Pdfreader

PassAudited by ClawScan on May 1, 2026.

Overview

Pdfreader appears to be a straightforward local PDF text and metadata extractor, with routine cautions about installing PyMuPDF and handling extracted document contents.

This skill looks appropriate for local PDF text extraction. Before installing, use a trusted PyMuPDF package source, run the script in a controlled directory, pass an explicit page limit for sensitive or large PDFs, and protect or delete generated JSON files that contain document text.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

NoteHigh Confidence

ASI02: Tool Misuse and Exploitation

What this means

The skill can create local JSON files containing extracted PDF content in the current workspace.

Why it was flagged

The script reads a user-supplied PDF path and can write extracted results to a user-supplied JSON path. This is the intended local filesystem capability and is bounded by extension/path checks, so it is a notice rather than a concern.

Skill content

pdf_path = sys.argv[1] ... with open(output_file, 'w', encoding='utf-8') as f: json.dump(result, f, ensure_ascii=False, indent=2)

Recommendation

Run it only on intended PDFs in a trusted working directory and review the output path before writing JSON.

NoteHigh Confidence

ASI04: Agentic Supply Chain Vulnerabilities

What this means

Installing an unpinned package may pull whatever current version is available from the configured package index.

Why it was flagged

The skill requires installing an external Python package without a pinned version or lockfile. PyMuPDF is central to the stated PDF extraction purpose, so this is a supply-chain hygiene note rather than suspicious behavior.

Skill content

pip install pymupdf

Recommendation

Install PyMuPDF from a trusted package index and consider pinning a known-good version in controlled environments.

NoteHigh Confidence

ASI06: Memory and Context Poisoning

What this means

Private document text could be placed into a local JSON file and then read into an agent or model context.

Why it was flagged

The generated JSON can contain full PDF text and metadata intended for AI consumption. This is purpose-aligned, but PDF contents may be private or may contain untrusted instructions that should not be treated as authoritative.

Skill content

Outputs JSON for AI reading

Recommendation

Use the skill only with documents you intend the agent to read, and treat extracted PDF text as untrusted content.

NoteHigh Confidence

ASI09: Human-Agent Trust Exploitation

What this means

A user or agent that omits the page limit may process more of a PDF than expected.

Why it was flagged

When max_pages is omitted, the code extracts the entire PDF, which differs from SKILL.md wording that suggests a first-10-pages default. This could surprise users about how much content is extracted.

Skill content

pages_to_extract = max_pages if max_pages else len(doc)

Recommendation

Pass an explicit page count when invoking the script, especially for large or sensitive PDFs.