Pdf To Structured

Pass. Audited by VirusTotal on May 12, 2026.

Overview

Type: OpenClaw Skill
Name: pdf-to-structured
Version: 2.0.0

The skill is designed to extract structured data from PDFs using local libraries such as pdfplumber and pytesseract. All code examples and instructions align with this stated purpose. The `claw.json` file requests the 'filesystem' permission, which is necessary for reading PDFs and writing output files. Importantly, `instructions.md` explicitly states 'No network access required for basic extraction', and no network calls or data-exfiltration attempts were found in the provided code. There are no signs of prompt injection, obfuscation, or other malicious behavior.
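To illustrate what purely local extraction looks like, here is a minimal sketch. It assumes pdfplumber's `pdfplumber.open()` and `Page.extract_tables()` APIs and treats the first row of each table as its header. `extract_pdf_tables` and `tables_to_records` are hypothetical helper names, not functions taken from the skill.

```python
from typing import Any


def tables_to_records(tables: list[list[list[Any]]]) -> list[dict[str, Any]]:
    """Turn pdfplumber-style tables (lists of rows) into dicts keyed by each table's header row."""
    records: list[dict[str, Any]] = []
    for table in tables:
        if not table:
            continue
        # Assumption: the first row of each table holds the column headers;
        # missing header cells get a positional placeholder name.
        header = [
            str(cell).strip() if cell is not None else f"col_{i}"
            for i, cell in enumerate(table[0])
        ]
        for row in table[1:]:
            # Skip rows that are entirely empty (common with ruled layouts).
            if all(cell in (None, "") for cell in row):
                continue
            # zip() stops at the shorter of header/row, so ragged rows are truncated.
            records.append(dict(zip(header, row)))
    return records


def extract_pdf_tables(pdf_path: str) -> list[dict[str, Any]]:
    """Extract every table on every page with pdfplumber; this runs locally with no network access."""
    import pdfplumber  # imported lazily so tables_to_records works without pdfplumber installed

    tables: list[list[list[Any]]] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables_to_records(tables)
```

The resulting records can be passed to `pandas.DataFrame(records)` and written with `to_csv`, `to_excel` (which uses openpyxl for .xlsx files), or `to_json`.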

Findings (0)

This is an artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The agent may read PDF files and write extracted output files on the local filesystem.

Why it was flagged

The skill needs local file read/write access to process PDFs and create Excel/CSV/JSON outputs, which is expected but should stay limited to the user's intended files.

Skill content
Filesystem permission required for reading PDFs and writing output

Recommendation

Use explicit input and output paths, and review generated files before relying on or sharing them.
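One way to make "explicit input and output paths" concrete is to validate both paths before any extraction runs. The sketch below assumes a hypothetical policy in which input and output must sit inside a single workspace directory chosen by the user; `resolve_io_paths` and `ALLOWED_OUTPUT_SUFFIXES` are illustrative names, not part of the skill. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

# Output formats mentioned in this review: Excel, CSV, JSON.
ALLOWED_OUTPUT_SUFFIXES = {".xlsx", ".csv", ".json"}


def resolve_io_paths(input_pdf: str, output_file: str, workspace: str) -> tuple[Path, Path]:
    """Validate explicit input/output paths and keep both inside one workspace directory."""
    root = Path(workspace).resolve()
    src = Path(input_pdf).resolve()
    dst = Path(output_file).resolve()

    for path in (src, dst):
        # resolve() collapses ".." and follows symlinks, so escapes are caught here.
        if not path.is_relative_to(root):
            raise PermissionError(f"{path} is outside {root}")
    if src.suffix.lower() != ".pdf" or not src.is_file():
        raise ValueError(f"input must be an existing .pdf file: {src}")
    if dst.suffix.lower() not in ALLOWED_OUTPUT_SUFFIXES:
        raise ValueError(f"unsupported output type: {dst.suffix}")
    if dst.exists():
        # Refuse to overwrite silently; the user should review or move the old file first.
        raise FileExistsError(f"{dst} already exists")
    return src, dst
```

Refusing to overwrite existing outputs also supports the second half of the recommendation, since every generated file is new and can be reviewed before it is relied on or shared.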

What this means

Installing packages can add external code to the user's environment.

Why it was flagged

The skill documents installation of third-party Python packages without version pins; this is purpose-aligned for PDF/OCR processing but still a supply-chain consideration.

Skill content
pip install pdfplumber pandas openpyxl ... pip install pytesseract pdf2image ... pip install pypdf

Recommendation

Install in a virtual environment, verify package names, and pin trusted versions where possible.
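Pinning can be checked mechanically. The sketch below is a hypothetical helper, not something the skill provides: it flags requirement lines that are not pinned with `==` and compares exact pins against what is installed, using the standard-library `importlib.metadata.version`. It compares version strings literally, so "1.0" and "1.0.0" count as a mismatch.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def parse_pins(requirements: str) -> tuple[dict[str, str], list[str]]:
    """Split requirement lines into exact pins and lines that are not pinned with '=='."""
    pins: dict[str, str] = {}
    unpinned: list[str] = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # blank or comment-only line
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(3)
        else:
            unpinned.append(line)
    return pins, unpinned


def version_mismatches(pins: dict[str, str]) -> dict[str, Optional[str]]:
    """Return {package: installed_version_or_None} for every pin the environment does not satisfy."""
    bad: dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            bad[name] = installed
    return bad
```

Run inside the virtual environment, a non-empty `unpinned` list or mismatch dictionary for the packages named above (pdfplumber, pandas, openpyxl, pytesseract, pdf2image, pypdf) signals that the install does not match the versions the user chose to trust.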

What this means

Confidential construction documents could be exposed to a third-party OCR service if cloud OCR is used.

Why it was flagged

A cloud OCR option is disclosed and purpose-aligned, but using it would send document contents to an external provider, so it should be used only when the user explicitly chooses that path.

Skill content
For scanned PDFs: use OCR (Tesseract or cloud API) first, then parse

Recommendation

Prefer local Tesseract OCR for sensitive files, and require explicit user approval before using any cloud OCR provider.
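The recommendation can be enforced in code by making local OCR the default and refusing the cloud path without an explicit opt-in. This is a minimal sketch assuming the `pdf2image.convert_from_path` and `pytesseract.image_to_string` APIs for the local path. `ocr_pdf`, the `backend` strings, and the `cloud_ocr` callable are hypothetical placeholders, since this review does not name a specific cloud provider.

```python
from typing import Callable, Optional


def ocr_pdf(
    pdf_path: str,
    backend: str = "local",
    cloud_approved: bool = False,
    cloud_ocr: Optional[Callable[[str], str]] = None,
) -> str:
    """OCR a scanned PDF, defaulting to local Tesseract and gating any cloud provider behind approval."""
    if backend == "local":
        # Local path: rasterise pages with pdf2image, then run Tesseract via pytesseract.
        # Requires both packages plus the Tesseract and Poppler binaries.
        import pytesseract
        from pdf2image import convert_from_path

        pages = convert_from_path(pdf_path, dpi=300)
        return "\n".join(pytesseract.image_to_string(page) for page in pages)

    if backend == "cloud":
        # Document contents would leave the machine, so require an explicit opt-in.
        if not cloud_approved:
            raise PermissionError("cloud OCR needs explicit user approval for this document")
        if cloud_ocr is None:
            raise ValueError("no cloud OCR client was supplied")
        return cloud_ocr(pdf_path)

    raise ValueError(f"unknown OCR backend: {backend!r}")
```

With the defaults, a call such as `ocr_pdf("drawings.pdf")` never contacts a network service, and selecting `backend="cloud"` without `cloud_approved=True` fails with `PermissionError` before any document content is read.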