Pdf To Structured

Pass. Audited by ClawScan on May 1, 2026.

Overview

The skill appears aligned with PDF data extraction, with expected local file access and setup steps, but users should be careful about optional cloud OCR and third-party package installs.

This skill is reasonable for local PDF extraction. Before installing, use a virtual environment for the suggested Python packages, process only intended PDFs, choose output paths carefully, and avoid cloud OCR unless you are comfortable sending the document contents to that provider.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Note (High Confidence)

ASI02: Tool Misuse and Exploitation

What this means

The agent may read PDF files and write extracted output files on the local filesystem.

Why it was flagged

The skill needs local file read/write access to process PDFs and create Excel/CSV/JSON outputs, which is expected but should stay limited to the user's intended files.

Skill content

Filesystem permission required for reading PDFs and writing output

Recommendation

Use explicit input and output paths, and review generated files before relying on or sharing them.
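
One way to apply this recommendation is to validate both paths before any extraction runs. The sketch below is illustrative only and is not taken from the skill: the function name `resolve_io_paths`, the `work_dir` parameter, and the rule against overwriting existing files are all assumptions.

```python
from pathlib import Path

# Output formats named in the finding (Excel/CSV/JSON).
ALLOWED_OUTPUT_SUFFIXES = {".xlsx", ".csv", ".json"}


def resolve_io_paths(input_pdf: str, output_file: str, work_dir: str) -> tuple[Path, Path]:
    """Hypothetical helper: return resolved (input, output) paths or raise ValueError."""
    root = Path(work_dir).resolve()
    src = Path(input_pdf).resolve()
    dst = Path(output_file).resolve()

    # Both paths must stay inside the directory the user chose.
    for label, path in (("input", src), ("output", dst)):
        if not path.is_relative_to(root):
            raise ValueError(f"{label} path {path} is outside {root}")

    if src.suffix.lower() != ".pdf":
        raise ValueError(f"input must be a .pdf file, got {src.name}")
    if dst.suffix.lower() not in ALLOWED_OUTPUT_SUFFIXES:
        raise ValueError(f"unsupported output type: {dst.suffix}")
    # Assumed policy: never silently replace an existing file.
    if dst.exists():
        raise ValueError(f"refusing to overwrite existing file {dst}")
    return src, dst
```

Resolving the paths first means that a relative segment such as `..` cannot move the input or output outside the chosen directory. `Path.is_relative_to` requires Python 3.9 or later.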

NoteHigh Confidence

ASI04: Agentic Supply Chain Vulnerabilities

What this means

Installing packages can add external code to the user's environment.

Why it was flagged

The skill documents installation of third-party Python packages without version pins; this is purpose-aligned for PDF/OCR processing but still a supply-chain consideration.

Skill content

pip install pdfplumber pandas openpyxl ... pip install pytesseract pdf2image ... pip install pypdf

Recommendation

Install in a virtual environment, verify package names, and pin trusted versions where possible.
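
As a rough illustration of the pinning advice, the sketch below scans the text of a requirements file and reports entries that lack an exact `==` pin. The helper name `find_unpinned` is hypothetical, the parsing is deliberately simplified (it does not handle every requirement-specifier form), and the version numbers in the usage example are placeholders rather than recommended versions.

```python
import re


def find_unpinned(requirements_text: str) -> list[str]:
    """Hypothetical helper: return package names that have no exact '==' pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            # Keep only the package name, before any specifier, extra, or marker.
            name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
            unpinned.append(name)
    return unpinned


# Placeholder versions, for illustration only.
example = "pdfplumber\npandas==1.2.3\nopenpyxl>=3.0\n"
print(find_unpinned(example))
```

A check like this can run before installation inside the virtual environment, so that unpinned entries are noticed and reviewed instead of being resolved to whatever version is newest at install time.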

NoteMedium Confidence

ASI07: Insecure Inter-Agent Communication

What this means

Confidential construction documents could be exposed to a third-party OCR service if cloud OCR is used.

Why it was flagged

A cloud OCR option is disclosed and purpose-aligned, but using it sends document contents to an external provider, so it should be used only when the user explicitly chooses that path.

Skill content

For scanned PDFs: use OCR (Tesseract or cloud API) first, then parse

Recommendation

Prefer local Tesseract OCR for sensitive files, and require explicit user approval before using any cloud OCR provider.
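
The approval requirement can be sketched as a small gate that defaults to local OCR and refuses the cloud path without an explicit opt-in. This is an assumption about how such a check might look, not the skill's own code: the function `choose_ocr_backend`, the backend labels `"tesseract"` and `"cloud"`, and the `cloud_approved` flag are all hypothetical.

```python
def choose_ocr_backend(requested: str = "tesseract", cloud_approved: bool = False) -> str:
    """Hypothetical gate: return the OCR backend to use, defaulting to local Tesseract."""
    if requested == "tesseract":
        # Local OCR keeps document contents on the user's machine.
        return "tesseract"
    if requested == "cloud":
        # Cloud OCR uploads document contents, so it needs an explicit opt-in.
        if not cloud_approved:
            raise PermissionError(
                "cloud OCR sends document contents to an external provider; "
                "explicit user approval is required"
            )
        return "cloud"
    raise ValueError(f"unknown OCR backend: {requested!r}")
```

Because the default is local and the cloud branch fails closed, a caller that forgets to ask the user cannot upload a document by accident.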