Agent Paddleocr Vision

v1.1.0

Multi-language document understanding with PaddleOCR

by Allen Niu (@nhzallen)

Install

openclaw skills install agent-paddleocr-vision

Agent PaddleOCR Vision

OCR with Agent Actions — powered by PaddleOCR only. Automatically classifies documents and provides actionable prompts.

What It Does

  • OCR extraction via PaddleOCR cloud API (requires credentials)
  • 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general
  • Action suggestion with structured parameters
  • Batch processing
  • Searchable PDF generation (with bbox alignment)

Quick Start

# Install dependencies
pip3 install -r scripts/requirements.txt

# Configure PaddleOCR API
export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing
export PADDLEOCR_ACCESS_TOKEN=your_token

# Process a file
python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf

Batch

python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out
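When the script is driven from another Python program, the two invocations above can be assembled as argument lists. The following is a minimal sketch that uses only the flags shown in this README; the helper name build_command is hypothetical, and whether --pretty or --output-dir are accepted in every mode is an assumption to check against the docs.

```python
import sys


def build_command(file_path=None, batch_dir=None, output_dir=None,
                  pretty=False, searchable_pdf=False,
                  script="scripts/doc_vision.py"):
    """Build an argv list for doc_vision.py (hypothetical helper).

    Only flags that appear in this README are emitted. Which flag
    combinations the script accepts is an assumption.
    """
    # Exactly one input source: a single file or a batch directory.
    if (file_path is None) == (batch_dir is None):
        raise ValueError("pass exactly one of file_path or batch_dir")

    cmd = [sys.executable, script]
    if file_path is not None:
        cmd += ["--file-path", str(file_path)]
    else:
        cmd += ["--batch-dir", str(batch_dir)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if pretty:
        cmd.append("--pretty")
    if searchable_pdf:
        cmd.append("--make-searchable-pdf")
    return cmd


# Example: the batch invocation shown above.
batch_cmd = build_command(batch_dir="./inbox", output_dir="./out")
# To execute it: subprocess.run(batch_cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with file names that contain spaces.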

Output

See docs/README.zh.md for full JSON schema and integration guide.

Supported Types

Type              Actions
Invoice           create_expense, archive, tax_report
Business Card     add_contact, save_vcard
Receipt           create_expense, split_bill
Table             export_csv, analyze_data
Contract          summarize, extract_dates, flag_obligations
ID Card           extract_id_info, verify_age
Passport          store_passport_info, check_validity
Bank Statement    categorize_transactions, generate_report
Driver's License  store_license_info, check_expiry
Tax Form          summarize_tax, suggest_deductions
General           summarize, translate, search_keywords
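The type-to-action mapping above can be mirrored as a lookup table on the agent side. This is an illustrative sketch only: the action names come from the table, but the snake_case type keys, the SUGGESTED_ACTIONS name, and the fallback to the general type are assumptions. The identifiers the skill actually emits are defined in the JSON schema under docs/.

```python
# Action names are copied from the table above. The dictionary keys are
# hypothetical identifiers, not necessarily what the skill emits.
SUGGESTED_ACTIONS = {
    "invoice": ["create_expense", "archive", "tax_report"],
    "business_card": ["add_contact", "save_vcard"],
    "receipt": ["create_expense", "split_bill"],
    "table": ["export_csv", "analyze_data"],
    "contract": ["summarize", "extract_dates", "flag_obligations"],
    "id_card": ["extract_id_info", "verify_age"],
    "passport": ["store_passport_info", "check_validity"],
    "bank_statement": ["categorize_transactions", "generate_report"],
    "driver_license": ["store_license_info", "check_expiry"],
    "tax_form": ["summarize_tax", "suggest_deductions"],
    "general": ["summarize", "translate", "search_keywords"],
}


def suggested_actions(doc_type):
    """Return the actions for a document type.

    Unknown types fall back to "general" (an assumption of this sketch).
    """
    key = doc_type.strip().lower().replace(" ", "_")
    return SUGGESTED_ACTIONS.get(key, SUGGESTED_ACTIONS["general"])


print(suggested_actions("Business Card"))
```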

Configuration

Required environment variables:

  • PADDLEOCR_DOC_PARSING_API_URL — API endpoint ending in /layout-parsing
  • PADDLEOCR_ACCESS_TOKEN — Access token

Optional:

  • PADDLEOCR_DOC_PARSING_TIMEOUT — Default 600 seconds
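A caller may want to validate these variables before invoking the script. The sketch below assumes nothing about how doc_vision.py reads its configuration; OcrConfig and load_config are hypothetical names, and the only facts taken from this README are the variable names, the /layout-parsing suffix, and the 600-second default.

```python
import os
from dataclasses import dataclass

URL_VAR = "PADDLEOCR_DOC_PARSING_API_URL"
TOKEN_VAR = "PADDLEOCR_ACCESS_TOKEN"
TIMEOUT_VAR = "PADDLEOCR_DOC_PARSING_TIMEOUT"


@dataclass(frozen=True)
class OcrConfig:
    """Hypothetical container for the settings described above."""
    api_url: str
    access_token: str
    timeout: float = 600.0


def load_config(env=None):
    """Read and validate the PaddleOCR settings from a mapping.

    Defaults to os.environ. Passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env

    # Both required variables must be present and non-empty.
    missing = [name for name in (URL_VAR, TOKEN_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))

    api_url = env[URL_VAR]
    # The README states that the endpoint ends in /layout-parsing.
    if not api_url.rstrip("/").endswith("/layout-parsing"):
        raise ValueError(URL_VAR + " should end in /layout-parsing")

    # Unset or empty timeout falls back to the documented 600 seconds.
    timeout = float(env.get(TIMEOUT_VAR) or 600)
    return OcrConfig(api_url=api_url, access_token=env[TOKEN_VAR], timeout=timeout)
```

Failing early with a clear message is usually easier to debug than a rejected request deep inside a batch run.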

Searchable PDF

With --make-searchable-pdf, the tool embeds an OCR text layer aligned to the original layout using bounding boxes. This requires poppler (system package) and the Python packages pdf2image, reportlab, pypdf, and pillow.
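Aligning a text layer to bounding boxes generally means converting image pixel coordinates into PDF page coordinates. The sketch below shows that conversion in isolation. It is not taken from the skill's source: the function name and the (x0, y0, x1, y1) pixel bbox format with a top-left origin are assumptions. PDF user space, and reportlab's canvas by default, place the origin at the bottom-left, so the vertical axis is flipped.

```python
def bbox_to_pdf_rect(bbox, image_size, page_size):
    """Map a pixel bbox to a PDF rectangle (hypothetical helper).

    bbox:       (x0, y0, x1, y1) in image pixels, origin at top-left (assumed)
    image_size: (width, height) of the source image in pixels
    page_size:  (width, height) of the PDF page in points
    Returns (x, y, width, height) in points, origin at bottom-left.
    """
    x0, y0, x1, y1 = bbox
    img_w, img_h = image_size
    page_w, page_h = page_size

    # Scale factors from pixels to points on each axis.
    sx = page_w / img_w
    sy = page_h / img_h

    x = x0 * sx
    width = (x1 - x0) * sx
    height = (y1 - y0) * sy
    # Flip the vertical axis: the bottom edge of the box in image space
    # (y1) becomes the rectangle's lower edge measured from the page bottom.
    y = page_h - y1 * sy
    return x, y, width, height


rect = bbox_to_pdf_rect((100, 200, 300, 250), (1000, 2000), (500, 1000))
print(rect)  # (50.0, 875.0, 100.0, 25.0)
```

With a rectangle in page coordinates, invisible text can then be drawn at that position so that selection and search line up with the scanned image.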

Full Documentation

Detailed usage, troubleshooting, and development guides are available in multiple languages under docs/:

  • 中文: docs/README.zh.md
  • English: docs/README.en.md
  • Español: docs/README.es.md
  • العربية: docs/README.ar.md

License

MIT-0


Made for OpenClaw. Let your agent see and act.

Version tags

  • agent-actions: vk979g7wk52nc8hkkcf1vgjxrc982ygws
  • batch: vk979g7wk52nc8hkkcf1vgjxrc982ygws
  • document-understanding: vk979g7wk52nc8hkkcf1vgjxrc982ygws
  • invoice: vk979g7wk52nc8hkkcf1vgjxrc982ygws
  • latest: vk979g7wk52nc8hkkcf1vgjxrc982ygws
  • ocr: vk979g7wk52nc8hkkcf1vgjxrc982ygws
  • paddleocr: vk979g7wk52nc8hkkcf1vgjxrc982ygws
  • searchable-pdf: vk979g7wk52nc8hkkcf1vgjxrc982ygws

Runtime requirements

👁️ Clawdis
Bins: python
Env: PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TOKEN
Primary env: PADDLEOCR_ACCESS_TOKEN