Pdfreader

PassAudited by ClawScan on May 1, 2026.

Overview

Pdfreader appears to be a straightforward local PDF text and metadata extractor, with routine cautions about installing PyMuPDF and handling extracted document contents.

This skill looks appropriate for local PDF text extraction. Before installing, use a trusted PyMuPDF package source, run the script in a controlled directory, pass an explicit page limit for sensitive or large PDFs, and protect or delete generated JSON files that contain document text.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The skill can create local JSON files containing extracted PDF content in the current workspace.

Why it was flagged

The script reads a user-supplied PDF path and can write extracted results to a user-supplied JSON path. This is the intended local filesystem capability and is bounded by extension/path checks, so it is a notice rather than a concern.

Skill content
pdf_path = sys.argv[1] ... with open(output_file, 'w', encoding='utf-8') as f: json.dump(result, f, ensure_ascii=False, indent=2)
Recommendation

Run it only on intended PDFs in a trusted working directory and review the output path before writing JSON.

What this means

Installing an unpinned package may pull whatever current version is available from the configured package index.

Why it was flagged

The skill requires installing an external Python package without a pinned version or lockfile. PyMuPDF is central to the stated PDF extraction purpose, so this is a supply-chain hygiene note rather than suspicious behavior.

Skill content
pip install pymupdf
Recommendation

Install PyMuPDF from a trusted package index and consider pinning a known-good version in controlled environments.

What this means

Private document text could be placed into a local JSON file and then read into an agent or model context.

Why it was flagged

The generated JSON can contain full PDF text and metadata intended for AI consumption. This is purpose-aligned, but PDF contents may be private or may contain untrusted instructions that should not be treated as authoritative.

Skill content
Outputs JSON for AI reading
Recommendation

Use the skill only with documents you intend the agent to read, and treat extracted PDF text as untrusted content.

What this means

A user or agent that omits the page limit may process more of a PDF than expected.

Why it was flagged

When max_pages is omitted, the code extracts the entire PDF, which differs from SKILL.md wording that suggests a first-10-pages default. This could surprise users about how much content is extracted.

Skill content
pages_to_extract = max_pages if max_pages else len(doc)
Recommendation

Pass an explicit page count when invoking the script, especially for large or sensitive PDFs.