PDFExtract Pull Text from PDFs

PassAudited by ClawScan on May 1, 2026.

Overview

This appears to be a local PDF text extractor with no external services, but users should notice that it reads document contents into the agent and may use local parser tools.

This skill looks coherent for local PDF extraction and does not show external data transmission. Before installing, be aware that any PDF you give it can be read into the agent context, and that it may rely on local PDF parsing tools or an optional npm package if available.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI06: Memory and Context Poisoning

What this means

PDF contents and metadata may become visible to the agent and could affect downstream reasoning if the document text is untrusted.

Why it was flagged

The skill is designed to put PDF contents into agent-readable context. This is expected, but sensitive PDFs or PDFs containing prompt-like instructions could influence the agent if treated as trusted instructions instead of document text.

Skill content

Extract clean readable text from PDF files into agent-ready markdown.

Recommendation

Only process PDFs you intend the agent to read, and treat extracted document text as untrusted content rather than instructions.

Low

#ASI05: Unexpected Code Execution

What this means

When pdf-parse is unavailable, the skill may invoke a local PDF conversion program with the selected file path.

Why it was flagged

The code may run the local pdftotext binary as a fallback. This is aligned with PDF extraction and uses an argument array plus timeout, but it is still local command execution.

Skill content

const text = _cp['execFileSync']('pdftotext', [pdfPath, '-'], { encoding: 'utf8', timeout: 30000 });

Recommendation

Ensure any local pdftotext binary is from a trusted source and avoid processing suspicious PDFs in sensitive environments.

Info

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

If you install the optional package, its version and source determine part of the skill's behavior.

Why it was flagged

The source references an optional npm dependency, but no package manifest or pinned version is included. This is a normal optional parser dependency, but provenance and version pinning are left to the user.

Skill content

For complex PDFs, install pdf-parse: npm install pdf-parse

Recommendation

Install optional dependencies from trusted registries and consider pinning known-good versions.