Install
openclaw skills install paddleocr-vl-locally
Complex document parsing with PaddleOCR. Intelligently converts complex PDFs and document images into Markdown and JSON files that preserve the original structure.
Use Document Parsing for:
Use Text Recognition instead for:
Install Python dependencies before using this skill. From the skill directory (skills/paddleocr-doc-parsing):
pip install -r scripts/requirements.txt
Optional — for document optimization and split_pdf.py (page extraction):
pip install -r scripts/requirements-optimize.txt
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
python scripts/vl_caller.py
If the script execution fails (API not configured, network error, etc.):
Execute document parsing:
python scripts/vl_caller.py --file-url "URL provided by user" --pretty
Or for local files:
python scripts/vl_caller.py --file-path "file path" --pretty
Optional: explicitly set file type:
python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0 --pretty
- --file-type 0: PDF
- --file-type 1: image
Default behavior: save raw JSON to a temp file:
- If --output is omitted, the script saves automatically under the system temp directory
- Default path pattern: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
- If --output is provided, it overrides the default temp-file destination
- If --stdout is provided, JSON is printed to stdout and no file is saved
- In save mode, the script prints the saved path on stderr: Result saved to: /absolute/path/...
- Use --stdout only when you explicitly want to skip file persistence
The output JSON contains COMPLETE content with all document data:
Input type note:
Extract what the user needs from the output JSON using these fields:
- text
- result[n].markdown
- result[n].prunedResult
CRITICAL: You must display the COMPLETE extracted content to the user based on their needs.
text field
What this means:
text, result[n].markdown, and result[n].prunedResult
Example - Correct:
User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect:
User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"
The output JSON uses an envelope wrapping the raw API result:
{
  "ok": true,
  "text": "Full markdown/HTML text extracted from all pages",
  "result": { ... }, // raw provider response
  "error": null
}
Key fields:
- text: extracted markdown text from all pages (use this for quick text display)
- result: raw provider response object
- result[n].prunedResult: structured parsing output for each page (layout/content/confidence and related metadata)
- result[n].markdown: full rendered page output in markdown/HTML
Raw result location (default): the temp-file path printed by the script on stderr
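The envelope fields described above can also be read programmatically. The following is a minimal sketch, assuming the result was saved to a JSON file at the path the script reports; the helper names `load_envelope` and `extract_text` are hypothetical and not part of the skill's scripts. Only the keys `ok`, `text`, and `error` are taken from the envelope shown above.

```python
import json
from pathlib import Path


def load_envelope(result_path):
    """Read a saved result file and return the envelope as a dict."""
    return json.loads(Path(result_path).read_text(encoding="utf-8"))


def extract_text(envelope):
    """Return the complete `text` field, or raise if parsing failed.

    Hypothetical helper: it relies only on the documented envelope keys
    `ok`, `text`, and `error`.
    """
    if not envelope.get("ok"):
        raise RuntimeError(f"Document parsing failed: {envelope.get('error')}")
    # Return the whole field; nothing is truncated or summarized here.
    return envelope.get("text", "")
```

Typical use would be `extract_text(load_envelope(path))`, where `path` is the location printed after "Result saved to:".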
Example 1: Extract Full Document Text
python scripts/vl_caller.py \
--file-url "https://example.com/paper.pdf" \
--pretty
Then use:
- text for quick full-text output
- result[n].markdown when page-level output is needed
Example 2: Extract Structured Page Data
python scripts/vl_caller.py \
--file-path "./financial_report.pdf" \
--pretty
Then use:
- result[n].prunedResult for structured parsing data (layout/content/confidence)
- result[n].markdown for rendered page content
Example 3: Print JSON Without Saving
python scripts/vl_caller.py \
--file-url "URL" \
--stdout \
--pretty
Then return:
- text when user asks for full document content
- result[n].prunedResult and result[n].markdown when user needs complete structured page data
When API is not configured:
The error will show:
CONFIG_ERROR: PADDLEOCR_DOC_PARSING_API_URL not configured. Set it to your Triton endpoint, e.g.: http://10.0.0.1:8020/v2/models/layout-parsing/infer
Configuration workflow:
Show the exact error message to the user.
Guide the user to configure:
PADDLEOCR_DOC_PARSING_API_URL to the full Triton inference endpoint URL.
Format: http://<host>:<port>/v2/models/layout-parsing/infer
Example: http://10.0.133.33:8020/v2/models/layout-parsing/infer
PADDLEOCR_BASIC_AUTH_USER: nginx username (e.g. ocr_admin)
PADDLEOCR_BASIC_AUTH_PASSWORD: nginx password
PADDLEOCR_ACCESS_TOKEN is not required for local deployments. Leave it empty or omit it.
Optionally set PADDLEOCR_DOC_PARSING_TIMEOUT (default: 600 seconds).
Add the values to ~/.openclaw/openclaw.json:
{
  "skills": {
    "entries": {
      "paddleocr-doc-parsing": {
        "enabled": true,
        "env": {
          "PADDLEOCR_DOC_PARSING_API_URL": "http://10.0.133.33:8020/v2/models/layout-parsing/infer",
          "PADDLEOCR_BASIC_AUTH_USER": "ocr_admin",
          "PADDLEOCR_BASIC_AUTH_PASSWORD": "your_password"
        }
      }
    }
  }
}
Ask the user to confirm the environment is configured.
Retry only after confirmation:
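Before retrying, the configured values can be sanity-checked locally. This is a minimal sketch, not the skill's own validation: the function name `check_config` is hypothetical, accepting `https` as well as `http` is an assumption, and so is the rule that the basic-auth user and password should be set together. The URL shape it checks follows the documented format `http://<host>:<port>/v2/models/layout-parsing/infer`.

```python
import os
from urllib.parse import urlsplit


def check_config(env=None):
    """Return a list of problems found; an empty list means the config looks usable."""
    env = os.environ if env is None else env
    problems = []

    url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "").strip()
    if not url:
        problems.append("PADDLEOCR_DOC_PARSING_API_URL is not set")
    else:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("API URL must be a full URL including scheme and host")
        elif not (parts.path.startswith("/v2/models/") and parts.path.endswith("/infer")):
            problems.append("API URL should look like /v2/models/<model>/infer")

    # Assumption: nginx basic auth needs both values or neither.
    user = env.get("PADDLEOCR_BASIC_AUTH_USER", "")
    password = env.get("PADDLEOCR_BASIC_AUTH_PASSWORD", "")
    if bool(user) != bool(password):
        problems.append("Set both PADDLEOCR_BASIC_AUTH_USER and PADDLEOCR_BASIC_AUTH_PASSWORD, or neither")

    return problems
```

Calling `check_config()` with no argument reads the current process environment; passing a dict makes the check easy to try without touching real settings.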
There is no file size limit for the API. For PDFs, the maximum is 100 pages per request.
Tips for large files:
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
python scripts/vl_caller.py --file-url "https://your-server.com/large_file.pdf"
If you only need certain pages from a large PDF, extract them first:
# Extract pages 1-5
python scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"
# Mixed ranges are supported
python scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"
# Then process the smaller file
python scripts/vl_caller.py --file-path "pages_1_5.pdf"
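Because each request handles at most 100 pages, a longer PDF has to be processed in several passes. The sketch below computes page-range strings in the same syntax as the --pages examples above; the function name `page_ranges` is hypothetical, and whether split_pdf.py accepts every string it produces is an assumption to verify against the script.

```python
def page_ranges(total_pages, max_pages=100):
    """Split pages 1..total_pages into range strings of at most max_pages pages each."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be positive")
    ranges = []
    for start in range(1, total_pages + 1, max_pages):
        end = min(start + max_pages - 1, total_pages)
        # A single trailing page is written as "101", matching the "8" in "1-5,8,10-12".
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges
```

For a 250-page document this yields three ranges, each of which could be passed to --pages to produce a file small enough for one request.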
Service unreachable:
error: API request failed: ...
→ Check that the Triton service is running and PADDLEOCR_DOC_PARSING_API_URL is correct
Request timeout:
error: API request timed out after 600s
→ Increase PADDLEOCR_DOC_PARSING_TIMEOUT or check server load
Unsupported format:
error: Unsupported file format
→ File format not supported, convert to PDF/PNG/JPG
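The error-to-remedy pairs above can be expressed as a small lookup. This is an illustrative sketch only: `suggest_fix` and `HINTS` are hypothetical names, and the substring matching assumes the error messages keep the wording quoted in this document. The exact error message should still be shown to the user alongside any hint.

```python
# (marker substring, suggested remedy), checked in order.
HINTS = [
    ("CONFIG_ERROR", "Set PADDLEOCR_DOC_PARSING_API_URL to the full Triton inference endpoint URL."),
    ("timed out", "Increase PADDLEOCR_DOC_PARSING_TIMEOUT or check server load."),
    ("Unsupported file format", "Convert the file to PDF, PNG, or JPG."),
    ("API request failed", "Check that the Triton service is running and PADDLEOCR_DOC_PARSING_API_URL is correct."),
]


def suggest_fix(error_message):
    """Return a remedy for a known error message, or None if it is not recognized."""
    for marker, hint in HINTS:
        if marker in error_message:
            return hint
    return None
```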
references/output_schema.md - Output format specification
Note: Model version and capabilities are determined by your Triton deployment (PADDLEOCR_DOC_PARSING_API_URL).
Load these reference documents into context when:
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and optionally API connectivity.