Private Document AI with OpenVINO


Install

openclaw skills install local-document-ai-openvino

name: local-document-ai-openvino
description: Private document AI for Intel hardware. Parse PDFs, invoices, screenshots, and diagrams locally with OpenVINO, then turn them into structured data or executable notebook/code scaffolds with clear quick-start commands and example prompts.

Private Document AI with OpenVINO

Turn local PDFs, invoices, screenshots, and diagrams into one of two useful outcomes:

  1. to-data: classify the document and extract structured fields, tables, and JSON.
  2. to-code: turn screenshots, forms, and architecture diagrams into code or Jupyter notebook scaffolds.

Everything runs locally and is built for Intel CPU/GPU acceleration with OpenVINO.

Why install this skill

Install this when you want one local workflow for:

  • invoice and receipt extraction
  • private PDF understanding
  • table and key-value extraction
  • architecture diagram to notebook generation
  • screenshot to HTML/React scaffold generation

This skill is especially good for demos because it already includes:

  • medical invoice to-data flows
  • restaurant invoice to-data flows
  • architecture diagram to-code -> jupyter-notebook flows
  • local HTML reports for easy review and screenshots

30-second start

Check the environment:

python "{baseDir}/scripts/check_env.py"

Or run directly from the CLI:

python "{baseDir}/scripts/run_skill.py" --mode to-data --file "/absolute/path/to/invoice.pdf" --out "/absolute/path/to/artifacts/invoice_data" --extract "tables,entities,kv_pairs"

Example prompts

Use prompts like these in OpenClaw:

Use $local-document-ai-openvino to parse this local PDF and give me a structured report.
Use $local-document-ai-openvino to extract invoice fields, tables, and key-value pairs from this medical invoice.
Use $local-document-ai-openvino to classify this receipt and return normalized JSON.
Use $local-document-ai-openvino to turn this architecture diagram into a Jupyter notebook scaffold.
Use $local-document-ai-openvino to convert this UI screenshot into an HTML scaffold.

What you get

Typical outputs include:

  • parsed.json
  • parsed.md
  • result_report.html
  • task_output/structured_record.json
  • task_output/normalized.json
  • task_output/notebook.ipynb
  • code_preview.html

Best demo paths

If you are evaluating the skill for the first time, start here:

  1. to-data on an invoice PDF
  2. review result_report.html
  3. inspect structured_record.json
  4. then try to-code with a diagram image and target jupyter-notebook

Core pipeline

Use this skill as a local document-to-action pipeline:

  1. Parse the document into a canonical structured representation.
  2. Optionally continue into to-data or to-code.
  3. Save outputs into a predictable artifact folder with traceability.
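The parse-first flow above can be sketched as a thin wrapper around the published CLI entrypoint. This is a hypothetical illustration: `build_command` and `run_pipeline` are not part of the skill, and `{baseDir}` is kept as the skill's own placeholder for its install directory.

```python
import subprocess
import sys
from pathlib import Path

SKILL = Path("{baseDir}")  # placeholder resolved by the skill runtime


def build_command(mode, src, out, **flags):
    """Assemble one run_skill.py invocation for a pipeline stage."""
    cmd = [sys.executable, str(SKILL / "scripts" / "run_skill.py"),
           "--mode", mode, "--file", str(src), "--out", str(out)]
    for name, value in flags.items():
        cmd += [f"--{name.replace('_', '-')}", value]
    return cmd


def run_pipeline(src, artifacts):
    """Parse first, then derive structured data into the same artifact folder."""
    out = Path(artifacts) / Path(src).stem
    subprocess.run(build_command("parse", src, out), check=True)
    subprocess.run(build_command("to-data", src, out,
                                 extract="tables,entities,kv_pairs"), check=True)
    return out / "parsed.json"
```

Keeping the command builder separate from the subprocess calls makes each stage easy to log and retry.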

Read only if needed

Load these references when you need the schema or output contracts:

  • {baseDir}/references/schema.md
  • {baseDir}/references/mode_guide.md
  • {baseDir}/references/output_contracts.md

Primary entrypoint

Use this published entrypoint:

  • CLI orchestrator: {baseDir}/scripts/run_skill.py

Do not call these implementation scripts directly from the skill:

  • parse_document.py
  • transform_doc_to_data.py
  • transform_doc_to_code.py

Local readiness

Check the environment before processing real documents:

python "{baseDir}/scripts/check_env.py"

Install the base dependencies in a virtual environment:

python -m pip install -r "{baseDir}/requirements.txt"

Install the third-party paddleocr_vl_openvino package only after reviewing the source or wheel and only when you intend to run the real OCR pipeline. Prefer installing from a reviewed local wheel path inside a virtual environment.

Run a quick orchestration smoke test:

python "{baseDir}/scripts/smoke_test.py"

Model assets are discovered from:

  • PADDLEOCR_VL_OPENVINO_MODEL_DIR
  • PADDLEOCR_VL_LAYOUT_MODEL_DIR plus PADDLEOCR_VL_VLM_MODEL_DIR
  • {baseDir}/models/paddleocr-vl-1.5-openvino/
  • {baseDir}/models/paddleocr-vl-openvino/
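One way to picture the discovery order is a resolver that returns candidate locations in the documented precedence. This sketch is illustrative only: `resolve_model_dirs` is not a real helper in the skill, and the split layout/VLM pair is returned as a tuple purely for demonstration.

```python
import os
from pathlib import Path

BASE = Path("{baseDir}")  # placeholder resolved by the skill runtime


def resolve_model_dirs(env=os.environ):
    """Return candidate model locations in the documented precedence order."""
    candidates = []
    # 1. Single combined model directory from the environment.
    if env.get("PADDLEOCR_VL_OPENVINO_MODEL_DIR"):
        candidates.append(Path(env["PADDLEOCR_VL_OPENVINO_MODEL_DIR"]))
    # 2. Split layout + VLM directories, only when both are set.
    if env.get("PADDLEOCR_VL_LAYOUT_MODEL_DIR") and env.get("PADDLEOCR_VL_VLM_MODEL_DIR"):
        candidates.append((Path(env["PADDLEOCR_VL_LAYOUT_MODEL_DIR"]),
                           Path(env["PADDLEOCR_VL_VLM_MODEL_DIR"])))
    # 3. Bundled model folders, newest variant first.
    candidates.append(BASE / "models" / "paddleocr-vl-1.5-openvino")
    candidates.append(BASE / "models" / "paddleocr-vl-openvino")
    return candidates
```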

Allow model auto-download only when the user explicitly approves it.

Supported modes

parse

Use when the user wants the structured parse only.

Outputs:

  • parsed.json
  • parsed.md
  • result_report.html
  • extracted layout, tables, or figures when available

to-data

Use when the user wants structured extraction, normalization, or document classification.

Typical outputs under task_output/:

  • entities.json
  • kv_pairs.json
  • table_index.json
  • normalized.json
  • structured_record.json
  • traceability.json
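A downstream consumer of a to-data run might read the task_output files like this. The filenames follow the list above; the field names used in the example (`doc_type`, `key`, `value`) are assumptions for illustration, not a documented schema — see references/schema.md for the real one.

```python
import json
from pathlib import Path


def load_record(run_dir):
    """Read the structured record and key-value pairs from a to-data run."""
    task = Path(run_dir) / "task_output"
    record = json.loads((task / "structured_record.json").read_text(encoding="utf-8"))
    kv = json.loads((task / "kv_pairs.json").read_text(encoding="utf-8"))
    return record, kv
```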

to-code

Use when the user wants implementation-oriented output from the parse result.

Supported targets:

  • react
  • html-css
  • json-schema
  • jupyter-notebook

Typical outputs under task_output/:

  • component_map.json
  • field_schema.json
  • ui_blueprint.json
  • notes.md
  • traceability.json
  • target-specific artifacts such as app.jsx, index.html, styles.css, schema.json, notebook.ipynb, or notebook_plan.json
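Because generated notebooks are drafts, a quick structural inspection before running them can be done with plain `json` (a .ipynb file is JSON). `summarize_notebook` is an illustrative helper, not part of the skill.

```python
import json
from pathlib import Path


def summarize_notebook(path):
    """List cell types in a generated notebook before deciding to run it."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    return [cell.get("cell_type", "?") for cell in nb.get("cells", [])]
```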

Treat all generated code and notebooks as drafts. Review them before running, publishing, or connecting them to real systems.

Published package scope

The published ClawHub bundle is intentionally CLI-first.

  • main workflow: scripts/run_skill.py
  • diagnostics: scripts/check_env.py
  • smoke verification: scripts/smoke_test.py

Developer-only local UI helpers are kept out of the public release bundle.

Pipeline rules

Always follow these rules:

  1. Prefer local execution.
  2. Always parse first into parsed.json.
  3. Generate downstream artifacts from parsed.json, not raw OCR text alone.
  4. Preserve page numbers, reading order, block types, and source anchors when possible.
  5. Write traceability for downstream outputs.
  6. Mark low-confidence regions or assumptions explicitly.
  7. Do not silently drop tables, figures, formulas, charts, or key-value regions.
  8. Save outputs into one artifact folder per run.
  9. For confidential documents, prefer an explicit private --out directory and remove artifacts after review.
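Rules 4 through 6 can be made concrete with a sketch of one traceability record. The schema here (field, source page and block, confidence score, a 0.6 low-confidence threshold) is an assumption for illustration; consult references/output_contracts.md for the actual contract.

```python
import json
from pathlib import Path


def trace_entry(field, value, page, block_id, confidence):
    """One traceability record linking an output field to its source region."""
    return {
        "field": field,
        "value": value,
        "source": {"page": page, "block_id": block_id},
        "confidence": confidence,
        "low_confidence": confidence < 0.6,  # threshold is an assumption
    }


def write_traceability(run_dir, entries):
    """Persist traceability records alongside the other task outputs."""
    out = Path(run_dir) / "task_output" / "traceability.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return out
```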

Output contract

Default output folder:

./artifacts/<document_stem>/

Expected top-level outputs:

  • effective_config.json
  • run_report.json
  • parsed.json
  • parsed.md
  • result_report.html
  • task_output/

to-code runs may also emit:

  • code_preview.html
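A post-run verification step against the contract above could look like this illustrative checker, which reports which expected names are absent from a run folder. The `EXPECTED` list mirrors the top-level outputs documented here.

```python
from pathlib import Path

EXPECTED = ["effective_config.json", "run_report.json",
            "parsed.json", "parsed.md", "result_report.html", "task_output"]


def missing_outputs(run_dir):
    """Names from the output contract that the run folder does not contain."""
    root = Path(run_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```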

CLI examples

Parse

python "{baseDir}/scripts/run_skill.py" \
  --mode parse \
  --file "/absolute/path/to/report.pdf" \
  --out "/absolute/path/to/artifacts/report_parse"

To-data

python "{baseDir}/scripts/run_skill.py" \
  --mode to-data \
  --file "/absolute/path/to/invoice.pdf" \
  --out "/absolute/path/to/artifacts/invoice_data" \
  --extract "tables,entities,kv_pairs"

To-code

python "{baseDir}/scripts/run_skill.py" \
  --mode to-code \
  --file "/absolute/path/to/ui_mockup.png" \
  --out "/absolute/path/to/artifacts/ui_code" \
  --target "react" \
  --title "Generated App"

To-code notebook target

python "{baseDir}/scripts/run_skill.py" \
  --mode to-code \
  --file "/absolute/path/to/architecture_diagram.png" \
  --out "/absolute/path/to/artifacts/notebook_code" \
  --target "jupyter-notebook" \
  --title "OpenVINO Notebook"

Slash-command examples

/skill local-document-ai-openvino parse file=./docs/report.pdf
/skill local-document-ai-openvino to-data file=./docs/invoice.pdf extract=tables,entities,kv_pairs
/skill local-document-ai-openvino to-code file=./mockups/architecture.png target=jupyter-notebook

Optional local demo UI

Start the local UI when the user wants an interactive demo page:

python "{baseDir}/scripts/serve_skill_ui.py"

The UI lets the user:

  • preview a local file
  • choose parse, to-data, or to-code
  • choose the to-code target
  • run the pipeline and inspect the generated local HTML reports

The bundled UI only allows preview/run access for local files under the skill directory and common user content folders such as Downloads, Documents, Desktop, and Pictures.

Failure behavior

If a run fails:

  • state which stage failed
  • do not claim outputs were created if they were not
  • prefer writing error.json with failure details
  • recommend parse first when the downstream request is ambiguous
  • surface stderr or a concise failure summary when available
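The "prefer writing error.json" rule might be implemented along these lines; the payload fields shown are assumptions for illustration, not a documented format.

```python
import json
from pathlib import Path


def report_failure(run_dir, stage, message, stderr_tail=""):
    """Record a failed stage without claiming any outputs were created."""
    payload = {
        "failed_stage": stage,
        "message": message,
        "stderr_tail": stderr_tail[-2000:],  # keep only the tail of stderr
    }
    root = Path(run_dir)
    root.mkdir(parents=True, exist_ok=True)
    err = root / "error.json"
    err.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return err
```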

Safety notes

  • Use a virtual environment for dependency installation.
  • Review and approve model downloads only when you explicitly intend to.
  • Keep outputs in a private local folder when documents are sensitive.
  • Review generated code and notebooks before execution.
  • Delete artifacts when they are no longer needed.
  • The wrapper always uses the bundled local scripts and the current Python interpreter. It does not allow custom interpreter or script-directory overrides.

Short reminder

Present this skill as a local document-understanding workflow with downstream actions, not as a plain OCR wrapper.