Pdfreader
PassAudited by ClawScan on May 1, 2026.
Overview
Pdfreader appears to be a straightforward local PDF text and metadata extractor, with routine cautions about installing PyMuPDF and handling extracted document contents.
This skill looks appropriate for local PDF text extraction. Before installing, use a trusted PyMuPDF package source, run the script in a controlled directory, pass an explicit page limit for sensitive or large PDFs, and protect or delete generated JSON files that contain document text.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The skill can create local JSON files containing extracted PDF content in the current workspace.
The script reads a user-supplied PDF path and can write extracted results to a user-supplied JSON path. This is the intended local filesystem capability and is bounded by extension/path checks, so it is a notice rather than a concern.
pdf_path = sys.argv[1] ... with open(output_file, 'w', encoding='utf-8') as f: json.dump(result, f, ensure_ascii=False, indent=2)
Run it only on intended PDFs in a trusted working directory and review the output path before writing JSON.
Installing an unpinned package may pull whatever current version is available from the configured package index.
The skill requires installing an external Python package without a pinned version or lockfile. PyMuPDF is central to the stated PDF extraction purpose, so this is a supply-chain hygiene note rather than suspicious behavior.
pip install pymupdf
Install PyMuPDF from a trusted package index and consider pinning a known-good version in controlled environments.
Private document text could be placed into a local JSON file and then read into an agent or model context.
The generated JSON can contain full PDF text and metadata intended for AI consumption. This is purpose-aligned, but PDF contents may be private or may contain untrusted instructions that should not be treated as authoritative.
Outputs JSON for AI reading
Use the skill only with documents you intend the agent to read, and treat extracted PDF text as untrusted content.
A user or agent that omits the page limit may process more of a PDF than expected.
When max_pages is omitted, the code extracts the entire PDF, which differs from SKILL.md wording that suggests a first-10-pages default. This could surprise users about how much content is extracted.
pages_to_extract = max_pages if max_pages else len(doc)
Pass an explicit page count when invoking the script, especially for large or sensitive PDFs.
