Install

openclaw skills install pdf-convert-compdf

PDF conversion toolkit featuring AI layout analysis and OCR. Converts PDFs to Word, Markdown, JSON, PPT, CSV, HTML, and XML for seamless LLM data processing.

This Skill wraps the ComPDFKitConversion Python SDK into a reusable local conversion workflow, supporting PDF / image to Word, PPT, Excel, HTML, RTF, Image, TXT, JSON, Markdown, and CSV (10 output formats in total).

- The Skill directory is pdf-convert-compdf/.
- The main file is SKILL.md; helper scripts are placed in scripts/.
- The Skill follows the $ARGUMENTS and ${CLAUDE_SKILL_DIR} conventions for distribution and execution in Claude Code / Agent Skills-compatible environments.

Input:
- The target format (word/excel/ppt/html/rtf/image/txt/json/markdown/csv), the PDF or image path, and the output path are passed via Skill arguments or the command line. An optional PDF password and conversion parameters may also be provided.
- PDF files (.pdf).
- Image files (.jpg/.jpeg/.png/.bmp/.tif/.tiff/.webp/.jp2/.gif/.tga).

Output:
- The converted file (.docx, .pptx, .xlsx, .html, .rtf, image, .txt, .json, .md, .csv), or a clear error message.

Install the SDK:

pip install ComPDFKitConversion
On first run, the script downloads license.xml from the ComPDF server and caches it in the scripts/ directory:
https://download.compdf.com/skills/license/license.xml

- The script reads the <key>...</key> field from license.xml and uses that key for LibraryManager.license_verify(...) authentication; it does not pass the XML file path directly to the SDK.
- You can also place license.xml manually in the scripts/ directory; the script will use it directly without downloading.
- The SDK resource directory is always set to the directory containing pdf-convert-compdf.py, i.e., the scripts/ directory itself.
- When --enable-ocr or --enable-ai-layout (enabled by default) is used, the Skill also requires scripts/documentai.model. If the file does not exist, the script will automatically download it from:
https://download.compdf.com/skills/model/documentai.model
To use a model file stored elsewhere, point COMPDF_DOCUMENT_AI_MODEL at it:

export COMPDF_DOCUMENT_AI_MODEL="/path/to/documentai.model"
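
The license handling described above (read the key out of license.xml, then hand the key itself to the SDK) can be illustrated with a short sketch. This is not the script's actual code: the function name read_license_key is hypothetical, and the only thing assumed about the file layout is that a key element exists somewhere in the XML tree.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_license_key(license_path: Path) -> str:
    """Return the text of the first <key> element found in license.xml.

    Raises ValueError when the element is missing or empty, so the caller
    can stop with a clear error before attempting SDK authentication.
    """
    root = ET.parse(str(license_path)).getroot()
    # Assumption: the exact XML layout is unknown, so check the root itself
    # and then search all descendants for a <key> element.
    node = root if root.tag == "key" else root.find(".//key")
    key = (node.text or "").strip() if node is not None else ""
    if not key:
        raise ValueError(f"No non-empty <key> field found in {license_path}")
    return key
```

The returned string is what would be passed to LibraryManager.license_verify(...); the XML path itself never reaches the SDK.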
Verify that the SDK is installed:

python -m pip show ComPDFKitConversion

The script downloads license.xml on first run; the scripts/ directory is used directly as the SDK resource path.

python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" ppt input.pdf output.pptx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" excel input.pdf output.xlsx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" excel input.pdf output.xlsx --page-ranges "1-3,5" --excel-all-content --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx --enable-ocr --page-layout-mode flow
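
When the converter is driven from another Python program rather than a shell, the same command lines can be assembled as argument lists. The helper below is a minimal sketch: build_command is a hypothetical name, and reading CLAUDE_SKILL_DIR from the environment is an assumption (the source only shows it as a substitution in command lines).

```python
import os
import sys
from pathlib import Path
from typing import List


def build_command(fmt: str, input_path: str, output_path: str,
                  *options: str, skill_dir: str = "") -> List[str]:
    """Build the argument list for one run of pdf-convert-compdf.py."""
    # Assumption: CLAUDE_SKILL_DIR is available as an environment variable.
    base = skill_dir or os.environ.get("CLAUDE_SKILL_DIR", ".")
    script = Path(base) / "scripts" / "pdf-convert-compdf.py"
    return [sys.executable, str(script), fmt, input_path, output_path, *options]


cmd = build_command("excel", "input.pdf", "output.xlsx",
                    "--page-ranges", "1-3,5", "--excel-all-content",
                    skill_dir="/path/to/pdf-convert-compdf")
# subprocess.run(cmd, check=True) would then execute the conversion.
```

Passing a list instead of a single string avoids shell quoting problems with paths that contain spaces.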
1. The script checks whether scripts/license.xml exists (downloading it automatically from the ComPDF server if missing), reads the <key> field for SDK authentication, and uses the scripts/ directory as the resource path.
2. When --enable-ocr or --enable-ai-layout (enabled by default) is active, the script checks whether scripts/documentai.model exists; if not, it downloads the file automatically before initializing the Document AI model.
3. If the conversion result is not SUCCESS, handle license, password, resource, model, or input file issues according to the error name.

Model file handling:
- A model path can be supplied through COMPDF_DOCUMENT_AI_MODEL.
- Otherwise the script uses scripts/documentai.model.
- A missing model is downloaded to documentai.model.part and then atomically renamed to the final file upon success, preventing partial file corruption.
- Failed downloads are retried at intervals of 2s / 5s / 10s.

/pdf-convert-compdf word input.pdf output.docx
/pdf-convert-compdf excel input.pdf output.xlsx --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" $ARGUMENTS
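
The model handling described earlier in this section (COMPDF_DOCUMENT_AI_MODEL, scripts/documentai.model, a documentai.model.part temporary file, atomic rename, 2s / 5s / 10s) can be sketched as follows. This is an illustration under stated assumptions, not the script's code: resolve_model and its fetch parameter are hypothetical, the lookup order (environment variable first, then local file, then download) is inferred, and the three delays are treated as waits between one initial attempt and up to three retries.

```python
import os
import time
from pathlib import Path
from typing import Callable, Sequence

MODEL_URL = "https://download.compdf.com/skills/model/documentai.model"
RETRY_DELAYS = (2, 5, 10)  # seconds between attempts (assumed meaning)


def resolve_model(scripts_dir: Path,
                  fetch: Callable[[str, Path], object],
                  delays: Sequence[float] = RETRY_DELAYS,
                  sleep: Callable[[float], None] = time.sleep) -> Path:
    """Return a usable model path, downloading the file if necessary."""
    # 1. An explicit override wins (assumed precedence).
    override = os.environ.get("COMPDF_DOCUMENT_AI_MODEL")
    if override:
        return Path(override)

    # 2. A model already present in scripts/ is used as-is.
    target = scripts_dir / "documentai.model"
    if target.exists():
        return target

    # 3. Download to a .part file, then rename atomically on success.
    part = scripts_dir / "documentai.model.part"
    last_error = None
    for attempt in range(len(delays) + 1):
        try:
            fetch(MODEL_URL, part)
            os.replace(part, target)  # atomic within one filesystem
            return target
        except OSError as exc:
            last_error = exc
            part.unlink(missing_ok=True)  # never leave a partial file behind
            if attempt < len(delays):
                sleep(delays[attempt])
    raise RuntimeError(f"Could not download {MODEL_URL}") from last_error
```

Any callable that writes the URL's content to the given path and raises OSError on failure can serve as fetch; urllib.request.urlretrieve is one candidate, since urllib.error.URLError is a subclass of OSError. Injecting fetch and sleep keeps the retry logic testable without network access.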
- word → calls CPDFConversion.start_pdf_to_word
- excel → calls CPDFConversion.start_pdf_to_excel
- ppt → calls CPDFConversion.start_pdf_to_ppt
- html → calls CPDFConversion.start_pdf_to_html
- rtf → calls CPDFConversion.start_pdf_to_rtf
- image → calls CPDFConversion.start_pdf_to_image
- txt → calls CPDFConversion.start_pdf_to_txt
- json → calls CPDFConversion.start_pdf_to_json
- markdown → calls CPDFConversion.start_pdf_to_markdown
- csv → reuses CPDFConversion.start_pdf_to_excel with table/Excel parameters to produce CSV-friendly output

The start_pdf_to_* interfaces natively accept image files with no pre-processing required.

With --source-type auto, the input type is detected from the file extension:
- .pdf → pdf
- .png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga → image

python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.png output.docx --source-type image
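
The format-to-interface mapping and the extension-based source detection listed above lend themselves to a lookup table and a small function. The sketch below only works with method names as strings, so it runs without the SDK installed; CONVERTERS, IMAGE_EXTENSIONS and detect_source_type are hypothetical names, and raising ValueError for an unknown extension is an assumption about how such a case might be handled.

```python
from pathlib import Path

# Format name -> CPDFConversion method name, as listed above.
CONVERTERS = {
    "word": "start_pdf_to_word",
    "excel": "start_pdf_to_excel",
    "ppt": "start_pdf_to_ppt",
    "html": "start_pdf_to_html",
    "rtf": "start_pdf_to_rtf",
    "image": "start_pdf_to_image",
    "txt": "start_pdf_to_txt",
    "json": "start_pdf_to_json",
    "markdown": "start_pdf_to_markdown",
    "csv": "start_pdf_to_excel",  # CSV reuses the Excel interface
}

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif",
                    ".tiff", ".gif", ".webp", ".tga"}


def detect_source_type(input_path: str, source_type: str = "auto") -> str:
    """Return 'pdf' or 'image'; an explicit --source-type value wins."""
    if source_type != "auto":
        return source_type
    suffix = Path(input_path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    # Assumed behaviour: reject anything that is neither PDF nor a known image.
    raise ValueError(f"Unsupported input extension: {suffix!r}")
```

Because both source types end up at the same start_pdf_to_* call, the detected type only needs to influence option defaults, not the dispatch itself.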
image -> * and pdf -> * share the same set of CPDFConversion.start_pdf_to_* interfaces; only the input file type differs.

The script automatically adjusts certain parameters based on the input source and output format to reduce manual configuration:
| Trigger | Automatic Behavior | User-Overridable | Description |
|---|---|---|---|
| Input source is an image (auto-detected or explicit --source-type image) | Automatically enables --enable-ocr | No (--enable-ocr uses store_true; there is no --no-enable-ocr) | Text in images must be extracted via OCR; without OCR, output will contain only images and no text |
| Output format is HTML (format = html) | Automatically sets --page-layout-mode to box (box layout) | Yes; passing --page-layout-mode flow explicitly overrides this | Box layout better preserves the original formatting in HTML; specify flow explicitly if flow layout is needed |

When triggered, the script prints a notice to stderr, for example:
Auto-enabled OCR for image input.
Auto-set page layout mode to BOX for HTML output.
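
The two automatic adjustments and their notices can be expressed as one small function. This is a sketch rather than the script's implementation: apply_auto_defaults is a hypothetical name, and representing "--page-layout-mode not passed" as None is an assumption.

```python
import sys
from typing import Optional, Tuple


def apply_auto_defaults(fmt: str, source_type: str, enable_ocr: bool,
                        page_layout_mode: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Apply the automatic OCR and layout adjustments described above."""
    # Image input always needs OCR to produce text.
    if source_type == "image" and not enable_ocr:
        enable_ocr = True
        print("Auto-enabled OCR for image input.", file=sys.stderr)
    # HTML output defaults to box layout unless the user chose a mode.
    if fmt == "html" and page_layout_mode is None:
        page_layout_mode = "box"
        print("Auto-set page layout mode to BOX for HTML output.", file=sys.stderr)
    return enable_ocr, page_layout_mode
```

Writing the notices to stderr keeps stdout free for any output that a calling tool might parse.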

Positional arguments

| Parameter | Description |
|---|---|
| format | Target format: word/excel/ppt/html/rtf/image/txt/json/markdown/csv |
| input_pdf | Input file path (PDF or image) |
| output_path | Output file path |

General options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --source-type | Option | auto | Input source type: auto/pdf/image |
| --password | String | "" | PDF open password |
| --page-ranges | String | None | Page range, e.g. 1-3,5 |
| --font-name | String | "" | Output font name |

Layout options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --enable-ai-layout | Boolean | True | AI layout analysis (disable with --no-enable-ai-layout) |
| --page-layout-mode | Option | SDK default flow (auto-switched to box for HTML output) | Page layout: box (box layout) / flow (flow layout) |

Content options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --contain-image | Boolean | True | Retain images (disable with --no-contain-image) |
| --contain-annotation | Boolean | True | Retain annotations (disable with --no-contain-annotation) |
| --contain-page-background-image | Boolean | True | Retain page background images (disable with --no-contain-page-background-image) |
| --formula-to-image | Boolean | False | Convert formulas to image output |
| --transparent-text | Boolean | False | Preserve transparent text |

Output options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --output-document-per-page | Boolean | False | Split output into one document per page |
| --auto-create-folder | Boolean | True | Automatically create output directory (disable with --no-auto-create-folder) |

OCR options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --enable-ocr | Boolean | False (auto-enabled for image input) | Enable OCR |
| --ocr-option | Option | SDK default all | OCR scope: invalid-character/scan-page/invalid-character-and-scan-page/all |
| --ocr-language | Multi-select | auto | OCR language(s); multiple languages can be specified simultaneously. Options: auto/chinese/chinese-tra/english/korean/japanese/latin/devanagari/cyrillic/arabic/tamil/telugu/kannada/thai/greek/eslav |

Excel options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --excel-all-content | Boolean | False | Include all content in Excel output |
| --excel-csv-format | Boolean | False | Output Excel result in CSV format |
| --excel-worksheet-option | Option | SDK default for-table | Worksheet split strategy: for-table/for-page/for-document |

JSON options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --json-contain-table | Boolean | True | Include table data in JSON output (disable with --no-json-contain-table) |

TXT options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --txt-table-format | Boolean | True | Enable table formatting in TXT output (disable with --no-txt-table-format) |

HTML options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --html-option | Option | SDK default single-page | HTML output mode: single-page/single-page-with-bookmark/multiple-page/multiple-page-with-bookmark |

Image options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --image-type | Option | SDK default jpg | Image output format: jpg/jpeg/jpeg2000/png/bmp/tiff/tga/gif/webp |
| --image-color-mode | Option | SDK default color | Image color mode: color/gray/binary |
| --image-scaling | Float | 1.0 | Image scaling factor |
| --image-path-enhance | Boolean | False | Enable image path enhancement |

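
The Boolean parameters in these tables come in two styles: those that can be switched off with a --no-xxx counterpart, and plain flags that are only ever switched on. A minimal argparse sketch of both styles follows, using three parameters from the tables. It is not the script's parser; in particular, nargs="+" for --ocr-language is an assumption about how multiple languages might be accepted. argparse.BooleanOptionalAction requires Python 3.9 or later.

```python
import argparse

parser = argparse.ArgumentParser(prog="pdf-convert-compdf.py")

# Default-True option: BooleanOptionalAction also generates --no-enable-ai-layout.
parser.add_argument("--enable-ai-layout",
                    action=argparse.BooleanOptionalAction, default=True)

# Default-False option: store_true, so the flag can only turn the feature on.
parser.add_argument("--enable-ocr", action="store_true")

# Assumption: one or more language names after a single flag.
parser.add_argument("--ocr-language", nargs="+", default=["auto"])

defaults = parser.parse_args([])
args = parser.parse_args(["--no-enable-ai-layout", "--enable-ocr",
                          "--ocr-language", "chinese", "english"])
```

With no arguments, enable_ai_layout stays True and enable_ocr stays False; the second call flips both and collects the two language names into a list.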
- Boolean parameters that default to True (--enable-ai-layout/--contain-image/--contain-annotation/--contain-page-background-image/--auto-create-folder/--json-contain-table/--txt-table-format) use BooleanOptionalAction; pass --no-xxx to disable.
- Boolean parameters that default to False (--enable-ocr/--formula-to-image/--transparent-text/--output-document-per-page/--excel-all-content/--excel-csv-format/--image-path-enhance) use store_true; passing the flag enables them.
- All other defaults follow the SDK's ConvertOptions() defaults; omitting a parameter is equivalent to using the SDK's original default value.

python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx --no-enable-ai-layout --no-contain-image --page-layout-mode box
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx --output-document-per-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" excel input.pdf output.xlsx --excel-all-content --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" txt input.pdf output.txt
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" html input.pdf output_dir --html-option multiple-page-with-bookmark
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" image input.pdf output.png --image-type png --image-color-mode gray --image-scaling 2.0
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.png output.docx --ocr-language chinese
Note: For image input, the script automatically enables OCR, so there is no need to pass --enable-ocr manually. To specify an OCR language, --ocr-language can still be used.
python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx --enable-ocr --ocr-language chinese english japanese
The scripts/license.xml auto-downloaded from the ComPDF server is a Trial License, allowing a maximum of 200 conversions.

The script reports trial usage on stderr, for example:
Trial license: 5/200 conversions used, 195 remaining.
Error: Trial license usage limit reached (200 conversions). Please purchase a license at: https://www.compdf.com/contact-sales
To switch to a purchased license, place the license.xml containing the new <key> in scripts/ (overwriting the auto-downloaded trial file); no script modifications or counter file cleanup are required.

Verified facts:
- ComPDFKitConversion 3.9.0 has been successfully installed on the local machine.
- The SDK exposes CPDFConversion.start_pdf_to_word/start_pdf_to_ppt/start_pdf_to_excel.
- LibraryManager provides initialize, license_verify, release, set_document_ai_model, and set_ocr_language.
- The start_pdf_to_* interfaces natively accept image file input (PNG → Word has been verified successfully).
- enable_ai_layout defaults to True in the SDK; set_document_ai_model() must be called first to load the model before use, otherwise a 0xC0000005 crash (a Windows access violation) will occur.
- --ocr-language supports specifying multiple languages simultaneously (e.g. --ocr-language chinese english).

Known limitations:
- The stated Python requirement is >=3.6, while the demo page states <3.11, but PyPI currently provides a cp314 wheel in practice; treat the locally installable wheel as the source of truth, but always verify installation in a new environment first.
- If the script cannot download license.xml from the server (network issue) and no manual file exists in scripts/, or the <key> field is empty, the script cannot complete SDK authentication and cannot perform any real conversions.
- documentai.model is a large file (approximately 525 MB); there will be a noticeable download delay the first time OCR / AI layout is enabled. Because --enable-ai-layout defaults to True, the model download will be triggered on the very first run.
- If the network cannot reach https://download.compdf.com/skills/model/documentai.model, place documentai.model in the scripts/ directory in advance.
- The script is built on the SDK's LibraryManager / CPDFConversion API.

Files and references:
- License.txt
- scripts/pdf-convert-compdf.py
- scripts/license.xml (auto-downloaded from https://download.compdf.com/skills/license/license.xml if missing)
- License key source: the <key> field in license.xml
- SDK resource directory: scripts/
- scripts/documentai.model (auto-downloaded if missing)
- https://www.compdf.com/contact-sales
- https://www.compdf.com/guides/conversion-sdk/python/overview
- https://www.compdf.com/guides/conversion-sdk/python/pdf-to-word
- https://www.compdf.com/guides/conversion-sdk/python/pdf-to-excel
- https://www.compdf.com/guides/conversion-sdk/python/pdf-to-ppt
- https://www.compdf.com/guides/conversion-sdk/python/apply-license

Checklist:
- python -m pip show ComPDFKitConversion shows the installed package
- python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" --help or an equivalent local command produces normal output
- The script auto-downloads scripts/license.xml if missing, then extracts the license key from the <key> field for authentication
- The script uses the scripts/ directory as the SDK resource path
- All 10 formats are supported: word/excel/ppt/html/rtf/image/txt/json/markdown/csv
- Image files (.png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga) are accepted as input
- When --enable-ocr or --enable-ai-layout (enabled by default) is active and documentai.model is missing, the script auto-downloads the model
- When license.xml cannot be obtained (download fails and no manual file exists) or authentication fails, a clear error is output rather than a silent failure
- Boolean parameters that default to True can be disabled with --no-xxx
- --ocr-language supports specifying multiple languages simultaneously
- When the input is an image and --enable-ocr is not passed, the script automatically enables OCR and prints a notice to stderr
- When the output format is HTML and --page-layout-mode is not passed, the script automatically uses box (box layout) and prints a notice to stderr
- Explicitly passing --page-layout-mode flow overrides the automatic box layout behavior

Distribution layout:

pdf-convert-compdf/
├── SKILL.md
├── License.txt
└── scripts/
    └── pdf-convert-compdf.py
license.xml is auto-downloaded at runtime; no need to include it in the distribution package.

Troubleshooting:
- scripts/license.xml is missing and cannot be auto-downloaded (network unavailable or server error): the script will error out before authentication. If you are in an offline environment, place license.xml manually in the scripts/ directory.
- scripts/license.xml is missing the <key> field or its value is empty: the script will error out before authentication.
- SDK resource files are missing from the scripts/ directory: conversion may fail after LibraryManager.initialize().
- An encrypted PDF is converted without the correct --password: this will trigger PDF_PASSWORD_ERROR.
- documentai.model is not present locally and the network is unavailable: the model download will fail; place the file in the scripts/ directory manually in advance.
- For Excel output, set --excel-worksheet-option explicitly to avoid unexpected result structures.
- If OCR results are poor, check whether documentai.model is complete and whether the OCR language matches.
- If the trial limit has been reached, purchase a license at https://www.compdf.com/contact-sales.

This Skill is built on top of the ComPDFKit Conversion SDK.
© 2014-2026 PDF Technologies, Inc., a KDAN Company. All Rights Reserved.
Important: Under the ComPDFKit Terms of Service, distributing the documentation, sample code, or source code of the ComPDFKit Conversion SDK to third parties is prohibited. Please ensure you have obtained a valid ComPDFKit License before using this Skill.