Install
openclaw skills install docling

Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI with GPU acceleration. Use INSTEAD of web_fetch for extracting content from specific URLs when you need clean, structured text. Use Brave (web_search) for searching/discovering pages. Use docling when you HAVE a URL and need its content parsed.

CLI tool for parsing documents and web pages into clean, structured text. Uses GPU acceleration for OCR and ML models.
The docling CLI must be installed (e.g., via pipx install docling).

docling "<URL>" --from html --to md
Output: creates a .md file in the current directory (or use --output to choose another location)
docling "<URL>" --from html --to text --output /tmp/docling_out
docling "/path/to/file.pdf" --ocr --device cuda --output /tmp/docling_out
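
The commands above can also be driven from a script. The following is a minimal sketch, not part of the skill itself: `convert_to_dir` and its `runner` parameter are hypothetical names, and it assumes only the `--output` flag shown above. It writes into a fresh temporary directory and reports whatever files docling leaves there, rather than guessing the output filename.

```python
import subprocess
import tempfile
from pathlib import Path


def convert_to_dir(source, extra_args=(), runner=subprocess.run):
    """Run docling on `source`, writing into a fresh temporary directory.

    `runner` is injectable so the function can be exercised without
    docling installed (hypothetical helper, for illustration only).
    Returns (output_dir, sorted list of files found in it afterwards).
    """
    out_dir = Path(tempfile.mkdtemp(prefix="docling_out_"))
    # Passing a list (not a shell string) avoids quoting problems in URLs.
    cmd = ["docling", source, *extra_args, "--output", str(out_dir)]
    # check=True makes subprocess.run raise if docling exits non-zero.
    runner(cmd, check=True)
    files = sorted(p for p in out_dir.rglob("*") if p.is_file())
    return out_dir, files
```

A call such as `convert_to_dir("/path/to/file.pdf", ["--ocr", "--device", "cuda"])` mirrors the PDF command above. Using a new directory per run keeps outputs from different sources from overwriting each other.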
| Option | Values | Description |
|---|---|---|
| `--from` | html, pdf, docx, pptx, image, md, csv, xlsx | Input format |
| `--to` | md, text, json, yaml, html | Output format |
| `--device` | auto, cuda, cpu | Accelerator (default: auto) |
| `--output` | path | Output directory (recommended: use controlled temp dir) |
| `--ocr` | flag | Enable OCR for images/scanned PDFs |
| `--tables` | flag | Extract tables (default: on) |
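
The option table can be turned into a small validation step before docling is invoked. This is a sketch under the assumption that the values listed in the table are the complete set you want to allow; `build_docling_args` is a hypothetical helper, not part of docling.

```python
# Allowed values copied from the option table in this document.
FROM_FORMATS = {"html", "pdf", "docx", "pptx", "image", "md", "csv", "xlsx"}
TO_FORMATS = {"md", "text", "json", "yaml", "html"}
DEVICES = {"auto", "cuda", "cpu"}


def build_docling_args(source, from_fmt, to_fmt, output_dir,
                       device="auto", ocr=False):
    """Return an argv list for docling, rejecting values not in the table."""
    if from_fmt not in FROM_FORMATS:
        raise ValueError(f"unsupported --from value: {from_fmt!r}")
    if to_fmt not in TO_FORMATS:
        raise ValueError(f"unsupported --to value: {to_fmt!r}")
    if device not in DEVICES:
        raise ValueError(f"unsupported --device value: {device!r}")
    args = ["docling", source, "--from", from_fmt, "--to", to_fmt,
            "--device", device, "--output", output_dir]
    if ocr:
        # --ocr is a bare flag, so it takes no value.
        args.append("--ocr")
    return args
```

The returned list can be handed directly to `subprocess.run`. Validating up front turns a typo such as `--to txt` into an immediate, readable error.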
⚠️ Avoid these flags unless you trust the source:
- `--enable-remote-services`: can send data to remote endpoints
- `--allow-external-plugins`: loads third-party code
- `--headers` with untrusted values: can redirect requests

docling "<URL>" --from html --to text --output /tmp/docling_out

Docling supports GPU acceleration via CUDA (NVIDIA). Verify CUDA is available:
python -c "import torch; print(torch.cuda.is_available())"
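
The same check can choose a `--device` value automatically. The following is a minimal sketch, assuming PyTorch may or may not be installed; `pick_device` is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable CUDA device, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch we cannot confirm CUDA, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The result can be passed as the value of `--device`. Falling back to `cpu` when the check cannot be performed is a conservative choice; docling's own `auto` setting remains an alternative.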
See references/cli-reference.md for complete option list.