Install
openclaw skills install paddleocr-doc-parsing-radeonFREE document parsing powered by AMD Radeon Cloud running PaddleOCR-VL 1.5. Extract structured Markdown/JSON from PDFs and document images — tables with cell-level precision, formulas as LaTeX, figures, seals, charts, headers/footers, multi-column layout and correct reading order. No API key required. Trigger terms: 文档解析, 版面分析, 版面还原, 表格提取, 公式识别, 多栏排版, 扫描件结构化, 发票, 财报, 复杂 PDF, PDF转Markdown, 图表, 阅读顺序; reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL, AMD Radeon, 免费OCR, free document parsing, Radeon Cloud.
openclaw skills install paddleocr-doc-parsing-radeonFREE PaddleOCR-VL 1.5 document parsing, powered by AMD Radeon Cloud. No API key required.
This skill extracts structured Markdown/JSON from PDFs and document images using PaddleOCR-VL 1.5 running on AMD Radeon Cloud — completely free, with no authentication or token needed.
--output to save to a file when needed. No temporary files are created unless you explicitly choose to save.Trigger keywords (routing): Bilingual trigger terms (Chinese and English) are listed in the YAML description above — use that field for discovery and routing.
Use this skill for:
Do not use for:
Scripts declare their dependencies inline (PEP 723). No separate install step is needed — uv resolves dependencies automatically:
uv run scripts/layout_caller.py --help
Working directory: All
uv run scripts/...commands below should be run from this skill's root directory (the directory containing this SKILL.md file).
Identify the input source:
--file-url parameter--file-path parameterExecute document parsing:
uv run scripts/layout_caller.py --file-url "URL provided by user" --pretty
Or for local files:
uv run scripts/layout_caller.py --file-path "file path" --pretty
Optional: explicitly set file type:
uv run scripts/layout_caller.py --file-url "URL provided by user" --file-type 0 --pretty
--file-type 0: PDF--file-type 1: image.pdf, .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp) is required; otherwise pass --file-type explicitly. For URLs with unrecognized extensions, the service attempts inference.Performance note: Parsing time scales with document complexity. Single-page images typically complete in 1-5 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.
Default behavior: output JSON to stdout:
--output FILE to save the result to a specific file pathParse JSON response:
ok field: true means success, false means errortext — full document text across all pagesresult.result.layoutParsingResults[n].markdown.text — page-level markdownresult.result.layoutParsingResults[n].prunedResult — structured layout data with positions and confidenceok is false, display error.messagePresent results to user:
Common next steps once you have the structured output:
text field to a .md file — tables, headings, and formulas are preservedresult.result.layoutParsingResults[n].prunedResult to access individual layout elements with position and confidence datatext field is structured markdown, ready for chunking and indexingDisplay the COMPLETE extracted content based on what the user asked for. The parsed output is only useful if the user receives all of it — truncation silently drops data.
text fieldExample - Correct:
User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect:
User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"
The script returns an envelope with ok, text, result, and error. Use text for the full document content; navigate result.result.layoutParsingResults[n] for per-page structured data.
For the complete schema and field-level details, see references/output_schema.md.
Example 1: Extract Full Document Text (stdout)
uv run scripts/layout_caller.py \
--file-url "https://example.com/paper.pdf" \
--pretty
Then use:
text for quick full-text outputresult.result.layoutParsingResults[n].markdown when page-level output is neededExample 2: Extract Structured Page Data
uv run scripts/layout_caller.py \
--file-path "./financial_report.pdf" \
--pretty
Then use:
result.result.layoutParsingResults[n].prunedResult for structured parsing data (layout/content/confidence)Example 3: Save result to a file
uv run scripts/layout_caller.py \
--file-url "URL" \
--output "./result.json" \
--pretty
By default the script prints JSON to stdout. Use --output to save to a file.
Set PADDLEOCR_DOC_PARSING_API_URL to the AMD Radeon Cloud endpoint URL:
export PADDLEOCR_DOC_PARSING_API_URL="http://134.199.132.159/layout-parsing"
No API key, no token, no sign-up needed. The AMD Radeon Cloud free PaddleOCR-VL 1.5 service requires no authentication.
Optional overrides:
PADDLEOCR_DOC_PARSING_TIMEOUT — Request timeout in seconds (default: 600)For PDFs, the maximum is 100 pages per request.
For large image files, compress before uploading — this reduces upload time and can improve processing stability:
uv run scripts/optimize_file.py input.png output.jpg --quality 85
uv run scripts/layout_caller.py --file-path "output.jpg" --pretty
--quality controls JPEG/WebP lossy compression (1-100, default 85); it has no effect on PNG output. Use --target-size (in MB, default 20) to set the max file size — the script iteratively downscales until the target is met.
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
uv run scripts/layout_caller.py --file-url "https://your-server.com/large_file.pdf"
If you only need certain pages from a large PDF, extract them first:
# Extract pages 1-5
uv run scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"
# Mixed ranges are supported
uv run scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"
# Then process the smaller file
uv run scripts/layout_caller.py --file-path "pages_1_5.pdf"
All errors return JSON with ok: false. Show the error message and stop — do not fall back to your own vision capabilities. Identify the issue from error.code and error.message:
API service error (5xx) — error.message contains "API service error"
Rate limit exceeded (429) — error.message contains "API rate limit exceeded"
Unsupported format — error.message contains "Unsupported file format"
No content detected:
text field is emptyIf parsing quality is poor:
optimize_file.py before parsing — oversized inputs can degrade layout detection:
uv run scripts/optimize_file.py input.png optimized.jpg --quality 85
result.result.layoutParsingResults[n].prunedResult includes confidence scores per layout element — low values indicate regions worth reviewingreferences/output_schema.md — Full output schema, field descriptions, and command examplesNote: This skill uses PaddleOCR-VL 1.5 on AMD Radeon Cloud. Model version and capabilities are determined by the AMD Radeon Cloud endpoint.
To verify the skill is working properly:
uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."
The first form tests configuration and API connectivity. --skip-api-test checks configuration only. --test-url overrides the default sample document URL.
This skill is a fork of paddleocr-doc-parsing, modified for AMD Radeon Cloud which provides free PaddleOCR-VL 1.5 document parsing inference. No API key or registration is required.