Pdf To Structured
Pass
Audited by ClawScan on May 1, 2026.
Overview
The skill appears aligned with PDF data extraction, with expected local file access and setup steps, but users should be careful about optional cloud OCR and third-party package installs.
This skill is reasonable for local PDF extraction. Before installing, use a virtual environment for the suggested Python packages, process only intended PDFs, choose output paths carefully, and avoid cloud OCR unless you are comfortable sending the document contents to that provider.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The agent may read PDF files and write extracted output files on the local filesystem.
The skill needs local file read/write access to process PDFs and create Excel/CSV/JSON outputs, which is expected but should stay limited to the user's intended files.
Filesystem permission required for reading PDFs and writing output
Use explicit input and output paths, and review generated files before relying on or sharing them.
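One way to keep reads and writes limited to intended files is to check every path against an approved working directory before touching it. The sketch below is illustrative only and is not part of the skill: the helper name `is_inside` and the directory names are hypothetical.

```python
from pathlib import Path


def is_inside(base_dir: str, candidate: str) -> bool:
    """Return True if candidate resolves to a location within base_dir.

    Hypothetical helper, not taken from the audited skill.
    """
    base = Path(base_dir).resolve()
    # Joining first means a relative candidate is interpreted against base,
    # while an absolute candidate replaces it and is then checked as-is.
    target = (base / candidate).resolve()
    return target == base or base in target.parents


if __name__ == "__main__":
    print(is_inside("work", "out/report.csv"))  # stays under work/
    print(is_inside("work", "../escape.pdf"))   # climbs out of work/
```

Resolving both paths before comparing them means `..` segments and symlinks are followed first, so the check reflects where a file would actually be written.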
Installing packages can add external code to the user's environment.
The skill documents installation of third-party Python packages without version pins; this is purpose-aligned for PDF/OCR processing but still a supply-chain consideration.
pip install pdfplumber pandas openpyxl ... pip install pytesseract pdf2image ... pip install pypdf
Install in a virtual environment, verify package names, and pin trusted versions where possible.
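As a rough illustration of that recommendation, the following sketch checks whether the interpreter is running inside a virtual environment and flags requirement lines that lack an exact version pin. The function names are hypothetical, the `==` test is a deliberately simple heuristic, and `X.Y.Z` is a placeholder rather than a recommended version.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def unpinned(requirements: list[str]) -> list[str]:
    """Return requirement specs that have no exact '==' pin.

    Simple heuristic for illustration; it ignores comments and blank lines
    and does not parse the full requirements syntax.
    """
    loose = []
    for line in requirements:
        spec = line.split("#", 1)[0].strip()
        if spec and "==" not in spec:
            loose.append(spec)
    return loose


if __name__ == "__main__":
    reqs = ["pdfplumber", "pandas==X.Y.Z  # placeholder pin", "openpyxl"]
    print("virtualenv:", in_virtualenv())
    print("unpinned:", unpinned(reqs))
```

Running a check like this before `pip install` shows which of the suggested packages still need a reviewed, pinned version.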
Confidential construction documents could be exposed to a third-party OCR service if cloud OCR is used.
A cloud OCR option is disclosed and purpose-aligned, but using it could send document contents to an external provider unless the user explicitly chooses that path.
For scanned PDFs: use OCR (Tesseract or cloud API) first, then parse
Prefer local Tesseract OCR for sensitive files, and require explicit user approval before using any cloud OCR provider.
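A minimal sketch of such an approval gate follows, assuming the caller records whether the user explicitly approved cloud OCR. The function name and the backend labels `"local"` and `"cloud"` are hypothetical and do not come from the skill.

```python
def select_ocr_backend(prefer_cloud: bool, user_approved: bool) -> str:
    """Pick an OCR backend, defaulting to local processing.

    Hypothetical gate: cloud OCR is chosen only when it was requested AND
    the user explicitly approved sending document contents off the machine.
    """
    if prefer_cloud and user_approved:
        return "cloud"
    return "local"


if __name__ == "__main__":
    # Cloud requested without approval falls back to local OCR.
    print(select_ocr_backend(prefer_cloud=True, user_approved=False))
    print(select_ocr_backend(prefer_cloud=True, user_approved=True))
```

Defaulting to the local backend means a missing or ambiguous approval never sends a document to an external provider.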
