Private Document AI with OpenVINO

Pass. Audited by ClawScan on May 11, 2026.

Overview

This appears to be a local document-processing skill; the main cautions are reviewing dependencies, protecting extracted artifacts, and checking generated code before running it.

Install and run the skill in a virtual environment, and review any optional PaddleOCR-VL/OpenVINO wheel or model download before enabling it. Process only documents you intend to parse, store artifacts in a protected local folder, and review generated code or notebooks before running them. Some source in the supplied review was truncated, so inspect the full package if you require high assurance.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If pointed at the wrong file or output folder, private document contents could be parsed and stored somewhere unintended.

Why it was flagged

The main workflow intentionally accepts local input and output paths. This is expected for document OCR, but it gives the agent broad file/path handling when the user supplies paths.

Skill content

python "{baseDir}/scripts/run_skill.py" --mode to-data --file "/absolute/path/to/invoice.pdf" --out "/absolute/path/to/artifacts/invoice_data"

Recommendation

Use explicit file paths, choose a dedicated local output folder, and avoid cloud-synced or shared folders for confidential documents.
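
One way to act on this recommendation is to check paths before handing them to the skill. The sketch below is not part of the skill: `check_paths`, `SYNCED_MARKERS`, and the idea of a single dedicated artifacts root are assumptions made for illustration, and the marker list is only a rough heuristic for cloud-synced folders.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical name fragments that often indicate a cloud-synced folder.
# Adjust this list for your own machine; it is a heuristic, not a guarantee.
SYNCED_MARKERS = ("dropbox", "onedrive", "google drive", "icloud")


def check_paths(input_file: str, output_dir: str, allowed_root: str) -> list[str]:
    """Return a list of problems; an empty list means the paths look acceptable."""
    problems = []
    src = Path(input_file)
    out = Path(output_dir)
    root = Path(allowed_root).resolve()

    # Explicit absolute paths avoid surprises from the current working directory.
    if not src.is_absolute():
        problems.append("input path is not absolute")
    if not out.is_absolute():
        problems.append("output path is not absolute")

    # The output folder should be the dedicated root or sit somewhere below it.
    resolved_out = out.resolve()
    if resolved_out != root and root not in resolved_out.parents:
        problems.append("output folder is outside the dedicated artifacts root")

    # Flag output folders that look like they are synced to a cloud service.
    lowered = str(resolved_out).lower()
    if any(marker in lowered for marker in SYNCED_MARKERS):
        problems.append("output folder looks cloud-synced")

    return problems
```

Calling `check_paths(...)` with the `--file` and `--out` values before running the command, and refusing to continue when the returned list is non-empty, keeps a mistyped path from silently sending parsed content to an unintended location.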

What this means

Installing unreviewed packages or model tooling could run third-party code in the same environment used for private documents.

Why it was flagged

The skill relies on external Python packages and an optional third-party OCR package. The artifacts disclose this and recommend review, but dependency provenance still matters.

Skill content

PyMuPDF>=1.24.0 ... openvino>=2026.0.0 ... The third-party paddleocr_vl_openvino package is intentionally NOT installed ... Review the source or wheel first

Recommendation

Install in a virtual environment, prefer reviewed or pinned packages, and only enable model downloads or install OCR wheels from trusted sources.
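
As a rough illustration of what "reviewed or pinned" can mean in practice, the sketch below compares a downloaded wheel against a SHA-256 value recorded when the wheel was reviewed. The helper names `sha256_of` and `matches_pinned_hash` are hypothetical, and where the expected hash comes from is left to the reader.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(wheel: Path, expected_sha256: str) -> bool:
    """Return True only if the wheel's hash equals the value pinned at review time."""
    return sha256_of(wheel) == expected_sha256.strip().lower()
```

If the hashes differ, the file is not the one that was reviewed and should not be installed. pip's hash-checking mode (`--require-hashes` with a requirements file) enforces the same idea at install time.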

What this means

Generated scaffolds may be incomplete or unsafe if run, deployed, or connected to real systems without review.

Why it was flagged

The skill can generate executable code or notebooks, but the artifacts explicitly frame them as drafts and do not show automatic execution of generated code.

Skill content

Typical outputs ... `task_output/notebook.ipynb` ... `app.jsx`, `index.html`, `styles.css` ... Treat all generated code and notebooks as drafts. Review them before running

Recommendation

Inspect generated code and notebooks before execution, publishing, or connecting them to real data or services.
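
A generated notebook can be inspected without opening it in a kernel, because an `.ipynb` file is plain JSON. The sketch below is an assumption-laden reading aid, not a security scanner: `code_cells` and `shell_escape_lines` are hypothetical helpers, and flagging lines that start with `!` or `%` only highlights IPython shell escapes and magics for a human to look at.

```python
from __future__ import annotations

import json
from pathlib import Path


def code_cells(notebook_path: Path) -> list[str]:
    """Return the source of each code cell as text, without executing anything."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        cells.append("".join(source) if isinstance(source, list) else source)
    return cells


def shell_escape_lines(cell_source: str) -> list[str]:
    """Return lines that begin with '!' or '%', which IPython treats specially."""
    return [
        line
        for line in cell_source.splitlines()
        if line.lstrip().startswith(("!", "%"))
    ]
```

Printing each cell returned by `code_cells` and paying particular attention to anything returned by `shell_escape_lines` gives a quick first pass before deciding whether the notebook is safe to run.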

What this means

Sharing artifact folders may reveal private document contents, filenames, local paths, and document fingerprints.

Why it was flagged

The generated parse output records source metadata, including the resolved local input path and file hash, alongside parsed document content.

Skill content

"source": { "input_path": str(config.file), "input_type": input_type, "filename": config.file.name, "sha256": file_hash }

Recommendation

Treat output folders as sensitive, redact artifacts before sharing, and delete outputs when they are no longer needed.
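
Redaction of the source metadata can be sketched as follows. The key names come from the excerpt above; everything else is an assumption, including the helper name `redact_source`, the `[redacted]` placeholder, and the premise that the parse output has been loaded into a Python dictionary with `source` at the top level.

```python
import copy

# Fields from the "source" block that reveal local paths, names, or fingerprints.
SENSITIVE_SOURCE_KEYS = ("input_path", "filename", "sha256")


def redact_source(parsed: dict) -> dict:
    """Return a copy of the parse output with identifying source fields masked."""
    redacted = copy.deepcopy(parsed)  # leave the original artifact untouched
    source = redacted.get("source")
    if isinstance(source, dict):
        for key in SENSITIVE_SOURCE_KEYS:
            if key in source:
                source[key] = "[redacted]"
    return redacted
```

This masks only the metadata block. The parsed document content itself is still present in the result and needs separate review before an artifact is shared.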