pdf

Pass
Audited by ClawScan on Apr 15, 2026.

Overview

The skill's files and runtime instructions are consistent with a PDF processing/fill toolkit; nothing in the bundle requests unrelated credentials or network access, though there are a few minor documentation gaps about system-level dependencies and one deliberate library monkeypatch to be aware of.

What to check before installing/using this skill:
- System dependencies: Several examples and scripts rely on system binaries that are not declared in metadata: poppler-utils (pdftoppm, pdftotext, pdfimages), which pdf2image depends on, and tesseract for pytesseract OCR. Install those on the host if you plan to use OCR or image conversion. Optional tools referenced: qpdf and pdftk.
- Python packages: The code expects common libraries (pypdf, pdfplumber, reportlab, pdf2image, pytesseract, pillow, pypdfium2, pandas). Install these in a virtualenv before running.
- Data sensitivity: These scripts read and write PDFs and annotation data. Only run them on PDFs you trust, or in a sandbox if the PDFs may contain sensitive personal data. The forms workflow is manual and requires you to produce fields.json and validate images; follow the forms.md validation steps carefully to avoid writing data into the wrong locations.
- Manual validation required: forms.md explicitly requires visual inspection of validation images and bounding-box checks; do not skip that. The check_bounding_boxes.py script enforces non-overlap and minimum heights but cannot replace human review.
- Monkeypatch: fill_fillable_fields.py monkeypatches a pypdf DictionaryObject method to work around a library bug. The patch applies only while the script runs and exists for a specific compatibility reason, but it does modify library internals at runtime, so review that code before relying on it.
- Safety practice: Run the code in an isolated environment (virtualenv/container) first, and review the scripts if you plan to process untrusted PDFs. The skill is not expected to exfiltrate data: there are no network endpoints or secret requests in the bundle. If you need networked or automated bulk processing, consider additional safeguards (rate limits, auditing, and scanning outputs for PII) before using it with sensitive documents.
- Licensing: LICENSE.txt is restrictive (Anthropic) and prohibits extracting or redistributing the skill materials outside the service. Ensure your intended use complies with that license.
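
The dependency items above can be verified before the skill is run. The following is a minimal preflight sketch, not part of the skill bundle: the function names (`missing_binaries`, `missing_modules`, `preflight`) are hypothetical, the required/optional split follows the wording of this report, and the import name `PIL` for the pillow package is the only mapping that differs from the package name.

```python
import importlib.util
import shutil

# Binaries named in the report. The required/optional split mirrors its wording.
REQUIRED_BINARIES = ["pdftoppm", "pdftotext", "pdfimages", "tesseract"]
OPTIONAL_BINARIES = ["qpdf", "pdftk"]

# Package name (as installed with pip) -> top-level module name to import.
PYTHON_MODULES = {
    "pypdf": "pypdf",
    "pdfplumber": "pdfplumber",
    "reportlab": "reportlab",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "pillow": "PIL",  # pillow is imported as PIL
    "pypdfium2": "pypdfium2",
    "pandas": "pandas",
}


def missing_binaries(names):
    """Return the binaries from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(modules):
    """Return the package names whose module cannot be located (nothing is imported)."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


def preflight():
    """Collect everything that appears to be missing on this host."""
    return {
        "required_binaries": missing_binaries(REQUIRED_BINARIES),
        "optional_binaries": missing_binaries(OPTIONAL_BINARIES),
        "python_packages": missing_modules(PYTHON_MODULES),
    }


if __name__ == "__main__":
    for group, missing in preflight().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

Running the script inside the virtualenv intended for the skill prints one line per group. A missing optional binary such as qpdf or pdftk only matters if you use the examples that call it.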