Install
openclaw skills install pdf-to-word-docx

PDF to Word converts PDF to editable Word/DOCX with AI-powered layout analysis and table recognition. It is built on the ComPDF Conversion SDK to better preserve tables, multi-column layouts, lists, and images for downstream editing. It fits requests such as “pdf to word,” “convert pdf to docx,” “pdf to editable word,” “pdf to office,” “keep layout in word,” and “convert report to docx.” Example queries include “Convert this PDF contract to editable Word while keeping the tables intact,” “Turn this report into DOCX and preserve the multi-column layout,” and “Export this PDF to Word for further editing.”

- Wraps the ComPDFKitConversion Python SDK into a reusable local conversion workflow, supporting PDF / image to Word, PPT, Excel, HTML, RTF, Image, TXT, JSON, Markdown, and CSV (10 output formats in total).
- Skill directory: pdf-to-word-docx/.
- Main file: SKILL.md; helper scripts are placed in scripts/.
- Uses the $ARGUMENTS and ${CLAUDE_SKILL_DIR} conventions for distribution and execution in Claude Code / Agent Skills-compatible environments.
- The target format (word/excel/ppt/html/rtf/image/txt/json/markdown/csv), the PDF or image path, and the output path are passed via Skill arguments or the command line. An optional PDF password and conversion parameters may also be provided.
- Supported input: PDF files (.pdf).
- Supported input: image files (.jpg/.jpeg/.png/.bmp/.tif/.tiff/.webp/.jp2/.gif/.tga).
- Output: a converted file (.docx, .pptx, .xlsx, .html, .rtf, image, .txt, .json, .md, .csv), or a clear error message.

pip install ComPDFKitConversion
On first run, the script downloads license.xml from the ComPDF server and caches it in the scripts/ directory:
https://download.compdf.com/skills/license/license.xml

- The script reads the <key>...</key> field from license.xml and uses that key for LibraryManager.license_verify(...) authentication; it does not pass the XML file path directly to the SDK.
- To skip the download, place license.xml in the scripts/ directory; the script will use it directly without downloading.
- The SDK resource directory is always set to the directory containing pdf-to-word-docx.py, i.e., the scripts/ directory itself.

When --enable-ocr or --enable-ai-layout (enabled by default) is used, the Skill also requires scripts/documentai.model. If the file does not exist, the script will automatically download it from:
https://download.compdf.com/skills/model/documentai.model
export COMPDF_DOCUMENT_AI_MODEL="/path/to/documentai.model"
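The license handling described above (read the <key> field, then hand the key itself rather than the file path to LibraryManager.license_verify) can be sketched roughly as follows. This is an illustration, not the script's actual code: the function name extract_license_key is hypothetical, and the sketch assumes the key sits in an un-namespaced <key> element somewhere in the XML.

```python
import xml.etree.ElementTree as ET


def extract_license_key(xml_text: str) -> str:
    """Return the text of the first <key> element found in the license XML.

    Assumption: the element is named exactly "key" and has no XML namespace.
    Raises ValueError when the element is missing or empty, so the caller can
    stop before attempting SDK authentication.
    """
    root = ET.fromstring(xml_text)
    # root.iter() also visits the root itself, so a bare <key> document works.
    for node in root.iter("key"):
        key = (node.text or "").strip()
        if key:
            return key
        break  # first <key> is empty: treat as an invalid license file
    raise ValueError("license.xml has no non-empty <key> field")
```

A caller would read scripts/license.xml as text, pass it to this function, and forward the returned string to the SDK's license verification call.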
python -m pip show ComPDFKitConversion
The script downloads license.xml on first run; the scripts/ directory is used directly as the SDK resource path.

python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" ppt input.pdf output.pptx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" excel input.pdf output.xlsx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" excel input.pdf output.xlsx --page-ranges "1-3,5" --excel-all-content --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx --enable-ocr --page-layout-mode flow
- The script checks that scripts/license.xml exists (downloading it automatically from the ComPDF server if missing), reads the <key> field for SDK authentication, and uses the scripts/ directory as the resource path.
- If --enable-ocr or --enable-ai-layout (enabled by default) is active, the script checks whether scripts/documentai.model exists; if not, it downloads the file automatically before initializing the Document AI model.
- If the result is not SUCCESS, handle license, password, resource, model, or input file issues according to the error name.
- The Document AI model location is taken from the COMPDF_DOCUMENT_AI_MODEL environment variable or from scripts/documentai.model.
- The model is first downloaded to documentai.model.part and then atomically renamed to the final file upon success, preventing partial file corruption.
- Retry delays: 2s / 5s / 10s.

/pdf-to-word-docx word input.pdf output.docx
/pdf-to-word-docx excel input.pdf output.xlsx --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" $ARGUMENTS
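The model download behavior described earlier (write to documentai.model.part, rename atomically on success, wait 2s / 5s / 10s between attempts) could look roughly like the sketch below. The names download_with_retry and fetch_url are hypothetical, and treating the three delays as waits between four total attempts is an assumption; the real script may count attempts differently.

```python
import os
import time
import urllib.request
from pathlib import Path
from typing import Callable, Sequence


def download_with_retry(
    fetch: Callable[[Path], None],
    target: Path,
    delays: Sequence[float] = (2, 5, 10),
    sleep: Callable[[float], None] = time.sleep,
) -> Path:
    """Fetch into '<target>.part', then atomically rename it to target."""
    part = target.with_name(target.name + ".part")
    last_error = None
    # Assumption: one initial attempt plus one retry per configured delay.
    for attempt in range(len(delays) + 1):
        try:
            fetch(part)
            # os.replace is atomic when both paths are on the same filesystem,
            # so readers never observe a half-written target file.
            os.replace(part, target)
            return target
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
            if part.exists():
                part.unlink()  # drop the partial file before retrying
            if attempt < len(delays):
                sleep(delays[attempt])
    raise RuntimeError(
        f"download failed after {len(delays) + 1} attempts"
    ) from last_error


def fetch_url(url: str) -> Callable[[Path], None]:
    """Build a fetch callable that saves url to the given destination path."""
    def _fetch(dest: Path) -> None:
        urllib.request.urlretrieve(url, str(dest))
    return _fetch
```

Passing the fetch step in as a callable keeps the retry and rename logic testable without network access; a real call would use fetch_url with the model URL listed in this document.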
- word → calls CPDFConversion.start_pdf_to_word
- excel → calls CPDFConversion.start_pdf_to_excel
- ppt → calls CPDFConversion.start_pdf_to_ppt
- html → calls CPDFConversion.start_pdf_to_html
- rtf → calls CPDFConversion.start_pdf_to_rtf
- image → calls CPDFConversion.start_pdf_to_image
- txt → calls CPDFConversion.start_pdf_to_txt
- json → calls CPDFConversion.start_pdf_to_json
- markdown → calls CPDFConversion.start_pdf_to_markdown
- csv → reuses CPDFConversion.start_pdf_to_excel with table/Excel parameters to produce CSV-friendly output

The start_pdf_to_* interfaces natively accept image files with no pre-processing required. With --source-type auto, the input type follows the file extension:

- .pdf → pdf
- .png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga → image

python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.png output.docx --source-type image

image -> * and pdf -> * share the same set of CPDFConversion.start_pdf_to_* interfaces; only the input file type differs.

The script automatically adjusts certain parameters based on the input source and output format to reduce manual configuration:
| Trigger | Automatic Behavior | User-Overridable | Description |
|---|---|---|---|
| Input source is an image (auto-detected or explicit --source-type image) | Automatically enables --enable-ocr | No (--enable-ocr uses store_true; there is no --no-enable-ocr) | Text in images must be extracted via OCR; without OCR, output will contain only images and no text |
| Output format is HTML (format = html) | Automatically sets --page-layout-mode to box (box layout) | Yes: passing --page-layout-mode flow explicitly overrides this | Box layout better preserves the original formatting in HTML; specify flow explicitly if flow layout is needed |
When triggered, the script prints a notice to stderr, for example:
Auto-enabled OCR for image input.
Auto-set page layout mode to BOX for HTML output.
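The two automatic adjustments in the table above can be expressed as a small pure function. This is a sketch under assumptions, not the script's code: the function name apply_auto_adjustments is hypothetical, and an unset --page-layout-mode is assumed to be represented as None.

```python
from typing import List, Optional, Tuple


def apply_auto_adjustments(
    source_type: str,
    output_format: str,
    enable_ocr: bool,
    page_layout_mode: Optional[str],
) -> Tuple[bool, Optional[str], List[str]]:
    """Return the adjusted (enable_ocr, page_layout_mode) plus stderr notices."""
    notices: List[str] = []
    # Image input needs OCR to produce any text, so it is forced on.
    if source_type == "image" and not enable_ocr:
        enable_ocr = True
        notices.append("Auto-enabled OCR for image input.")
    # HTML output defaults to box layout only when the user gave no explicit mode.
    if output_format == "html" and page_layout_mode is None:
        page_layout_mode = "box"
        notices.append("Auto-set page layout mode to BOX for HTML output.")
    return enable_ocr, page_layout_mode, notices
```

Because an explicit --page-layout-mode flow arrives as a non-None value, it passes through unchanged, which matches the override behavior in the table.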
| Parameter | Description |
|---|---|
| format | Target format: word/excel/ppt/html/rtf/image/txt/json/markdown/csv |
| input_pdf | Input file path (PDF or image) |
| output_path | Output file path |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --source-type | Option | auto | Input source type: auto/pdf/image |
| --password | String | "" | PDF open password |
| --page-ranges | String | None | Page range, e.g. 1-3,5 |
| --font-name | String | "" | Output font name |
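With --source-type auto, the input type is resolved from the file extension. A minimal sketch of that lookup follows; the function name detect_source_type is hypothetical, and the case-insensitive match and the error raised for unknown extensions are assumptions rather than documented behavior.

```python
from pathlib import Path

# Image extensions listed for auto-detection in this document.
IMAGE_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif", ".webp", ".tga",
}


def detect_source_type(input_path: str, source_type: str = "auto") -> str:
    """Resolve 'auto' to 'pdf' or 'image'; explicit values pass through."""
    if source_type != "auto":
        return source_type
    suffix = Path(input_path).suffix.lower()  # assumption: case-insensitive
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    # Assumption: unknown extensions are rejected instead of guessed.
    raise ValueError(f"cannot detect source type for {input_path!r}")
```

For example, detect_source_type("scan.PNG") resolves to "image", while detect_source_type("scan.bin", "pdf") keeps the explicit "pdf".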
| Parameter | Type | Default | Description |
|---|---|---|---|
| --enable-ai-layout | Boolean | True | AI layout analysis (disable with --no-enable-ai-layout) |
| --page-layout-mode | Option | SDK default flow (auto-switched to box for HTML output) | Page layout: box (box layout) / flow (flow layout) |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --contain-image | Boolean | True | Retain images (disable with --no-contain-image) |
| --contain-annotation | Boolean | True | Retain annotations (disable with --no-contain-annotation) |
| --contain-page-background-image | Boolean | True | Retain page background images (disable with --no-contain-page-background-image) |
| --formula-to-image | Boolean | False | Convert formulas to image output |
| --transparent-text | Boolean | False | Preserve transparent text |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --output-document-per-page | Boolean | False | Split output into one document per page |
| --auto-create-folder | Boolean | True | Automatically create output directory (disable with --no-auto-create-folder) |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --enable-ocr | Boolean | False (auto-enabled for image input) | Enable OCR |
| --ocr-option | Option | SDK default all | OCR scope: invalid-character/scan-page/invalid-character-and-scan-page/all |
| --ocr-language | Multi-select | auto | OCR language(s); multiple languages can be specified simultaneously. Options: auto/chinese/chinese-tra/english/korean/japanese/latin/devanagari/cyrillic/arabic/tamil/telugu/kannada/thai/greek/eslav |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --excel-all-content | Boolean | False | Include all content in Excel output |
| --excel-csv-format | Boolean | False | Output Excel result in CSV format |
| --excel-worksheet-option | Option | SDK default for-table | Worksheet split strategy: for-table/for-page/for-document |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --json-contain-table | Boolean | True | Include table data in JSON output (disable with --no-json-contain-table) |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --txt-table-format | Boolean | True | Enable table formatting in TXT output (disable with --no-txt-table-format) |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --html-option | Option | SDK default single-page | HTML output mode: single-page/single-page-with-bookmark/multiple-page/multiple-page-with-bookmark |

| Parameter | Type | Default | Description |
|---|---|---|---|
| --image-type | Option | SDK default jpg | Image output format: jpg/jpeg/jpeg2000/png/bmp/tiff/tga/gif/webp |
| --image-color-mode | Option | SDK default color | Image color mode: color/gray/binary |
| --image-scaling | Float | 1.0 | Image scaling factor |
| --image-path-enhance | Boolean | False | Enable image path enhancement |
- Boolean parameters that default to True (--enable-ai-layout/--contain-image/--contain-annotation/--contain-page-background-image/--auto-create-folder/--json-contain-table/--txt-table-format) use BooleanOptionalAction; pass --no-xxx to disable.
- Boolean parameters that default to False (--enable-ocr/--formula-to-image/--transparent-text/--output-document-per-page/--excel-all-content/--excel-csv-format/--image-path-enhance) use store_true; passing the flag enables them.
- All defaults follow the SDK's ConvertOptions() defaults; omitting a parameter is equivalent to using the SDK's original default value.

python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx --no-enable-ai-layout --no-contain-image --page-layout-mode box
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx --output-document-per-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" excel input.pdf output.xlsx --excel-all-content --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" txt input.pdf output.txt
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" html input.pdf output_dir --html-option multiple-page-with-bookmark
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" image input.pdf output.png --image-type png --image-color-mode gray --image-scaling 2.0
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.png output.docx --ocr-language chinese
Note: For image input, the script automatically enables OCR, so there is no need to pass --enable-ocr manually. To specify an OCR language, --ocr-language can still be used.
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx --enable-ocr --ocr-language chinese english japanese
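The two Boolean flag styles and the multi-value --ocr-language option used in the commands above map onto standard argparse features. The sketch below covers only three of the documented flags and is not the script's actual parser; build_parser is a hypothetical name. Note that argparse.BooleanOptionalAction requires Python 3.9 or later.

```python
import argparse

OCR_LANGUAGES = [
    "auto", "chinese", "chinese-tra", "english", "korean", "japanese",
    "latin", "devanagari", "cyrillic", "arabic", "tamil", "telugu",
    "kannada", "thai", "greek", "eslav",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf-to-word-docx.py")
    # Default-True flag: BooleanOptionalAction also generates --no-enable-ai-layout.
    parser.add_argument(
        "--enable-ai-layout",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    # Default-False flag: store_true has no negative form.
    parser.add_argument("--enable-ocr", action="store_true")
    # nargs="+" collects one or more space-separated language names into a list.
    parser.add_argument(
        "--ocr-language",
        nargs="+",
        choices=OCR_LANGUAGES,
        default=["auto"],
    )
    return parser
```

Because nargs="+" consumes every following bare word, positional arguments should come before --ocr-language, as they do in the commands in this document.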
The scripts/license.xml auto-downloaded from the ComPDF server is a Trial License, allowing a maximum of 200 conversions. Usage is reported on stderr, for example:
Trial license: 5/200 conversions used, 195 remaining.
Error: Trial license usage limit reached (200 conversions). Please purchase a license at: https://www.compdf.com/contact-sales
After purchasing a license, place a license.xml containing the new <key> in scripts/ (overwriting the auto-downloaded trial file); no script modifications or counter file cleanup are required.

- ComPDFKitConversion has been successfully installed on the local machine.
- The SDK provides CPDFConversion.start_pdf_to_word/start_pdf_to_ppt/start_pdf_to_excel.
- LibraryManager provides initialize, license_verify, release, set_document_ai_model, and set_ocr_language.
- The start_pdf_to_* interfaces natively accept image file input (PNG → Word has been verified successfully).
- enable_ai_layout defaults to True in the SDK; set_document_ai_model() must be called first to load the model before use, otherwise a 0xC0000005 crash will occur.
- --ocr-language supports specifying multiple languages simultaneously (e.g. --ocr-language chinese english).

- The stated Python requirement is >=3.6, while the demo page states <3.11, but PyPI currently provides a cp314 wheel in practice; treat the locally installable wheel as the source of truth, but always verify installation in a new environment first.
- If the script cannot download license.xml from the server (network issue) and no manual file exists in scripts/, or the <key> field is empty, the script cannot complete SDK authentication and cannot perform any real conversions.
- documentai.model is a large file (approximately 525 MB); there will be a noticeable download delay the first time OCR / AI layout is enabled. Because --enable-ai-layout defaults to True, the model download will be triggered on the very first run.
- If the machine cannot reach https://download.compdf.com/skills/model/documentai.model, place documentai.model in the scripts/ directory in advance.

- LibraryManager / CPDFConversion API.
- License.txt
- scripts/pdf-to-word-docx.py
- scripts/license.xml (auto-downloaded from https://download.compdf.com/skills/license/license.xml if missing)
- <key> field in license.xml
- scripts/
- scripts/documentai.model (auto-downloaded if missing)
- https://www.compdf.com/contact-sales

- https://www.compdf.com/guides/conversion-sdk/python/overview
- https://www.compdf.com/guides/conversion-sdk/python/pdf-to-word
- https://www.compdf.com/guides/conversion-sdk/python/pdf-to-excel
- https://www.compdf.com/guides/conversion-sdk/python/pdf-to-ppt
- https://www.compdf.com/guides/conversion-sdk/python/apply-license

- python -m pip show ComPDFKitConversion shows the installed package
- python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" --help or an equivalent local command produces normal output
- The script auto-downloads scripts/license.xml if missing, then extracts the license key from the <key> field for authentication
- The script uses the scripts/ directory as the SDK resource path
- The script supports the output formats word/excel/ppt/html/rtf/image/txt/json/markdown/csv
- The script accepts image files (.png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga) as input
- When --enable-ocr or --enable-ai-layout (enabled by default) is active and documentai.model is missing, the script auto-downloads the model
- When license.xml cannot be obtained (download fails and no manual file exists) or authentication fails, a clear error is output rather than a silent failure
- Boolean parameters that default to True can be disabled with --no-xxx
- --ocr-language supports specifying multiple languages simultaneously
- When the input is an image and --enable-ocr is not passed, the script automatically enables OCR and prints a notice to stderr
- When the output format is HTML and --page-layout-mode is not passed, the script automatically uses box (box layout) and prints a notice to stderr
- Explicitly passing --page-layout-mode flow overrides the automatic box layout behavior

pdf-to-word-docx/
├── SKILL.md
├── License.txt
└── scripts/
    └── pdf-to-word-docx.py
license.xml is auto-downloaded at runtime; no need to include it in the distribution package.

- scripts/license.xml is missing and cannot be auto-downloaded (network unavailable or server error): the script will error out before authentication. If you are in an offline environment, place license.xml manually in the scripts/ directory.
- scripts/license.xml is missing the <key> field or its value is empty: the script will error out before authentication.
- Resources are missing from the scripts/ directory: conversion may fail after LibraryManager.initialize().
- A wrong or missing --password: this will trigger PDF_PASSWORD_ERROR.
- documentai.model is not present locally and the network is unavailable: the model download will fail; place the file in the scripts/ directory manually in advance.
- For Excel output, set --excel-worksheet-option explicitly to avoid unexpected result structures.
- If OCR results look wrong, check whether documentai.model is complete and whether the OCR language matches.
- When the trial limit is reached, purchase a license at https://www.compdf.com/contact-sales.

This Skill is built on top of the ComPDFKit Conversion SDK.
© 2014-2026 PDF Technologies, Inc., a KDAN Company. All Rights Reserved.
Important: Under the ComPDFKit Terms of Service, distributing the documentation, sample code, or source code of the ComPDFKit Conversion SDK to third parties is prohibited. Please ensure you have obtained a valid ComPDFKit License before using this Skill.