Install
openclaw skills install agent-paddleocr-vision
Multi-language document understanding with PaddleOCR
OCR with Agent Actions, powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.
# Install dependencies
pip3 install -r scripts/requirements.txt
# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token
# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf
python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out
See docs/README.zh.md for full JSON schema and integration guide.
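The two invocations above can also be assembled programmatically, for example when an agent drives the script. The following is a minimal sketch, not part of the skill itself: the helper names `build_single_file_command` and `build_batch_command` are hypothetical, and only the flags shown in the commands above are used. It assumes the script is run from the skill's root directory.

```python
import sys
from typing import List

# Script path as it appears in the commands above.
SCRIPT = "scripts/doc_vision.py"


def build_single_file_command(
    file_path: str, pretty: bool = True, searchable_pdf: bool = False
) -> List[str]:
    """Build the argument list for processing one file (hypothetical helper)."""
    # sys.executable stands in for `python3` so the current interpreter is reused.
    cmd = [sys.executable, SCRIPT, "--file-path", file_path]
    if pretty:
        cmd.append("--pretty")
    if searchable_pdf:
        cmd.append("--make-searchable-pdf")
    return cmd


def build_batch_command(batch_dir: str, output_dir: str) -> List[str]:
    """Build the argument list for processing a directory (hypothetical helper)."""
    return [sys.executable, SCRIPT, "--batch-dir", batch_dir, "--output-dir", output_dir]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_single_file_command("./invoice.jpg", searchable_pdf=True)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the environment variables are set. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.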
| Type | Actions |
|---|---|
| Invoice | create_expense, archive, tax_report |
| Business Card | add_contact, save_vcard |
| Receipt | create_expense, split_bill |
| Table | export_csv, analyze_data |
| Contract | summarize, extract_dates, flag_obligations |
| ID Card | extract_id_info, verify_age |
| Passport | store_passport_info, check_validity |
| Bank Statement | categorize_transactions, generate_report |
| Driver License | store_license_info, check_expiry |
| Tax Form | summarize_tax, suggest_deductions |
| General | summarize, translate, search_keywords |
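The table above can be mirrored as a lookup on the consuming side. This is a sketch under stated assumptions: the dictionary keys copy the table's display names, while the labels the skill actually emits in its JSON output may differ (see the docs for the schema). The `suggested_actions` helper and its fallback to the General row are hypothetical.

```python
from typing import Dict, List

# Document type -> suggested actions, copied from the table above.
# Assumption: keys use the table's display names, not necessarily the JSON labels.
ACTIONS_BY_TYPE: Dict[str, List[str]] = {
    "Invoice": ["create_expense", "archive", "tax_report"],
    "Business Card": ["add_contact", "save_vcard"],
    "Receipt": ["create_expense", "split_bill"],
    "Table": ["export_csv", "analyze_data"],
    "Contract": ["summarize", "extract_dates", "flag_obligations"],
    "ID Card": ["extract_id_info", "verify_age"],
    "Passport": ["store_passport_info", "check_validity"],
    "Bank Statement": ["categorize_transactions", "generate_report"],
    "Driver License": ["store_license_info", "check_expiry"],
    "Tax Form": ["summarize_tax", "suggest_deductions"],
    "General": ["summarize", "translate", "search_keywords"],
}


def suggested_actions(doc_type: str) -> List[str]:
    """Return the actions for a document type (hypothetical helper).

    Unknown types fall back to the General row; that fallback is an
    assumption of this sketch, not documented behavior.
    """
    # Return a copy so callers cannot mutate the shared table.
    return list(ACTIONS_BY_TYPE.get(doc_type, ACTIONS_BY_TYPE["General"]))
```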
Required environment variables:
PADDLEOCR_DOC_PARSING_API_URL: API endpoint ending in /layout-parsing
PADDLEOCR_ACCESS_TOKEN: Access token
Optional:
PADDLEOCR_DOC_PARSING_TIMEOUT: Default 600 seconds
With --make-searchable-pdf, the script embeds an OCR text layer aligned to the original layout using bounding boxes. This requires pdf2image + poppler (system) and reportlab, pypdf, pillow (Python).
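The environment variables described above could be validated before a run, so that a missing token or a malformed endpoint fails early. The following is a minimal sketch and not the script's actual loading code: `PaddleOCRConfig` and `load_config` are hypothetical names, and treating the timeout as a whole number of seconds is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_TIMEOUT_SECONDS = 600  # default stated in the section above


@dataclass(frozen=True)
class PaddleOCRConfig:
    """Validated settings (hypothetical container, not the skill's own type)."""
    api_url: str
    access_token: str
    timeout_seconds: int


def load_config(env: Mapping[str, str] = os.environ) -> PaddleOCRConfig:
    """Read and validate the three variables; raise ValueError on bad input."""
    api_url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "").strip()
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "").strip()
    if not api_url:
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL is not set")
    if not api_url.rstrip("/").endswith("/layout-parsing"):
        raise ValueError("PADDLEOCR_DOC_PARSING_API_URL must end in /layout-parsing")
    if not token:
        raise ValueError("PADDLEOCR_ACCESS_TOKEN is not set")

    # Assumption: the timeout is an integer number of seconds.
    raw_timeout = env.get("PADDLEOCR_DOC_PARSING_TIMEOUT", "").strip()
    timeout = int(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    if timeout <= 0:
        raise ValueError("PADDLEOCR_DOC_PARSING_TIMEOUT must be positive")
    return PaddleOCRConfig(api_url, token, timeout)
```

Accepting a mapping instead of reading `os.environ` directly keeps the function easy to exercise with a plain dictionary.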
Detailed usage, troubleshooting, and development guides are available in multiple languages under docs/:
docs/README.zh.md
docs/README.en.md
docs/README.es.md
docs/README.ar.md
License: MIT-0
Made for OpenClaw. Let your agent see and act.