Pdf To Structured

v2.0.0

Extract structured data from construction PDFs. Convert specifications, BOMs, schedules, and reports from PDF to Excel/CSV/JSON. Use OCR for scanned documents and pdfplumber for native PDFs.

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description align with the instructions: extracting tables and text from construction PDFs using pdfplumber, pandas, and Tesseract OCR. The declared permission (filesystem) is appropriate for reading PDFs and writing outputs. Minor note: the human-facing instructions mention a 'cloud API' as an OCR option in places, but the skill metadata does not request network access or cloud credentials. This is a small inconsistency to be aware of: the skill defaults to local OCR and only suggests cloud OCR as an alternative.
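As an illustration of the local-first approach described above, the following is a minimal sketch of routing each page to either pdfplumber's text layer or Tesseract OCR. It is not taken from the skill itself. The `needs_ocr` helper, the 20-character threshold, and the 300 dpi setting are assumptions chosen for the example, and `pdf2image` additionally needs the Poppler utilities installed.

```python
def needs_ocr(text, min_chars=20):
    """Heuristic (assumed): treat a page as scanned when its text layer is nearly empty."""
    return text is None or len(text.strip()) < min_chars


def extract_page_texts(pdf_path, min_chars=20):
    """Return one text string per page, running OCR only where the text layer is thin."""
    # Third-party imports are local so the heuristic above is usable without them.
    import pdfplumber
    import pytesseract
    from pdf2image import convert_from_path

    texts = []
    with pdfplumber.open(pdf_path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text()
            if needs_ocr(text, min_chars):
                # Rasterize just this page, then run Tesseract on the image.
                image = convert_from_path(
                    pdf_path, dpi=300, first_page=number, last_page=number
                )[0]
                text = pytesseract.image_to_string(image)
            texts.append(text or "")
    return texts
```

A character-count threshold is a crude signal. Scanned pages that carry a poor embedded text layer would pass it, so spot-checking the output remains worthwhile.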
Instruction Scope
SKILL.md and instructions.md focus on loading PDFs, extracting tables/text, running OCR on scanned documents, cleaning data, and writing Excel/CSV/JSON outputs. The instructions do not direct the agent to read unrelated files or environment variables, nor to transmit extracted data to external endpoints. They do instruct the user to install and run local tools (Python packages and Tesseract) and to read/write filesystem paths — which is expected for this task.
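The extract, clean, and write steps listed above can be sketched roughly as follows. This is an illustrative outline rather than the skill's own code: `clean_cell`, `rows_to_records`, and `write_outputs` are hypothetical helper names, and the standard-library `csv` and `json` modules stand in for pandas to keep the example small (Excel output would need pandas plus an Excel writer engine).

```python
import csv
import json
from pathlib import Path


def clean_cell(cell):
    """Normalize one cell: None becomes '', runs of whitespace and newlines collapse."""
    return " ".join(str(cell).split()) if cell is not None else ""


def rows_to_records(table):
    """Turn a list-of-rows table (first row is the header) into a list of dicts."""
    rows = [[clean_cell(c) for c in row] for row in table]
    rows = [r for r in rows if any(r)]  # drop rows that are entirely empty
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Give blank header cells a placeholder name so no column is lost.
    header = [name or f"column_{i + 1}" for i, name in enumerate(header)]
    return [dict(zip(header, row)) for row in body]


def write_outputs(records, out_stem):
    """Write records to <out_stem>.csv and <out_stem>.json and return both paths."""
    csv_path = Path(f"{out_stem}.csv")
    json_path = Path(f"{out_stem}.json")
    fieldnames = list(records[0].keys()) if records else []
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return csv_path, json_path


def extract_tables(pdf_path):
    """Yield raw tables from a native PDF. Needs pdfplumber, imported lazily."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            yield from page.extract_tables()
```

Each table yielded by `extract_tables` is a list of rows whose cells are strings or `None`, which is the shape `rows_to_records` expects. Duplicate header names would overwrite one another in this sketch, so real BOMs and schedules may need a renaming step first.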
Install Mechanism
This is an instruction-only skill with no install spec. SKILL.md recommends pip installs (pdfplumber, pandas, pytesseract, pdf2image, pypdf, opencv) and installing the Tesseract binary. That is typical for Python-based extraction but carries the normal supply-chain caveat for pip packages and external binaries (user-installed). No automatic downloads or obscure URLs are included by the skill itself.
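Because the dependencies are installed by the user rather than by the skill, a quick local check before running anything can be useful. The sketch below is an assumption-laden example, not part of the skill: `missing_dependencies` is a hypothetical helper, and it assumes that "opencv" refers to the `cv2` import name and that the Tesseract executable is named `tesseract` on the PATH.

```python
import importlib.util
import shutil

# Import names (not pip names) for the packages the listing mentions.
# Assumption: "opencv" is imported as cv2.
REQUIRED_MODULES = ("pdfplumber", "pandas", "pytesseract", "pdf2image", "pypdf", "cv2")


def missing_dependencies(modules=REQUIRED_MODULES, binaries=("tesseract",)):
    """Return (missing_modules, missing_binaries) for the local environment."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_binaries = [b for b in binaries if shutil.which(b) is None]
    return missing_modules, missing_binaries


if __name__ == "__main__":
    mods, bins = missing_dependencies()
    print("Missing Python modules:", mods or "none")
    print("Missing executables:", bins or "none")
```

The check only reports what is absent. It does not verify versions or where a package came from, so installing from official sources is still up to the user.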
Credentials
The skill requests no environment variables or credentials and only requires filesystem access to read/write PDFs and outputs. The earlier mention that cloud OCR is an option implies that if a user chooses a cloud OCR path they may need to supply API keys, but the skill does not require or request those by default.
Persistence & Privilege
The skill is not always-enabled and does not request elevated or long-lived privileges beyond filesystem access. It does not indicate modifying other skills or system-wide settings. Autonomous invocation is permitted by platform default but is not combined with any extra privileges here.
Assessment
This skill appears coherent and focused on converting PDFs to Excel/CSV/JSON using local Python libraries and Tesseract OCR. Before installing or running it: (1) confirm you are comfortable granting the agent filesystem access to read the PDFs and write outputs; (2) install recommended dependencies yourself (pip packages, and the Tesseract binary) from official sources; (3) if you plan to use a cloud OCR option, be aware that will require network access and API keys — the skill does not declare or manage those keys; (4) avoid processing highly sensitive documents unless you trust the runtime environment, and inspect outputs (and intermediate text) for correctness. Overall the skill is internally consistent with its stated purpose.


latest: vk97afp8dz4ekzc5s2qh2qgz3r98127h0
