PDFExtract: Pull Text from PDFs
Pass. Audited by ClawScan on May 1, 2026.
Overview
This appears to be a local PDF text extractor that uses no external services. Users should note that it reads document contents into the agent context and may invoke local parser tools.
This skill looks coherent for local PDF extraction and does not show external data transmission. Before installing, be aware that any PDF you give it can be read into the agent context, and that it may rely on local PDF parsing tools or an optional npm package if available.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
PDF contents and metadata may become visible to the agent and could affect downstream reasoning if the document text is untrusted.
The skill is designed to put PDF contents into agent-readable context. This is expected, but sensitive PDFs or PDFs containing prompt-like instructions could influence the agent if treated as trusted instructions instead of document text.
Extract clean readable text from PDF files into agent-ready markdown.
Only process PDFs you intend the agent to read, and treat extracted document text as untrusted content rather than instructions.
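One way to act on this recommendation is to mark extracted text as data before it reaches the agent. The sketch below is an illustration only and is not part of the skill: `wrapUntrusted` and the marker strings are hypothetical names. Delimiters of this kind reduce, but do not eliminate, the chance that a model follows instructions embedded in a document.

```javascript
// Hypothetical markers; any strings unlikely to occur in real documents would do.
const BEGIN = '<<<UNTRUSTED_PDF_TEXT';
const END = 'UNTRUSTED_PDF_TEXT>>>';

// Wrap extracted PDF text so a downstream prompt can present it as document
// content rather than as instructions. (Assumed helper, not from the skill.)
function wrapUntrusted(text, source = 'unknown.pdf') {
  // Break up marker look-alikes inside the document so the text cannot
  // close the block early or open a nested one.
  const body = String(text)
    .split(BEGIN).join('<<< UNTRUSTED_PDF_TEXT')
    .split(END).join('UNTRUSTED_PDF_TEXT >>>');
  return [
    `${BEGIN} source=${JSON.stringify(source)}`,
    body,
    END,
  ].join('\n');
}
```

The caller would then tell the agent that everything between the two markers is quoted document text to be summarized or searched, never executed or obeyed.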
When pdf-parse is unavailable, the skill may invoke a local PDF conversion program with the selected file path.
The code may run the local pdftotext binary as a fallback. This is aligned with PDF extraction and uses an argument array plus timeout, but it is still local command execution.
const text = _cp['execFileSync']('pdftotext', [pdfPath, '-'], { encoding: 'utf8', timeout: 30000 });
Ensure any local pdftotext binary is from a trusted source and avoid processing suspicious PDFs in sensitive environments.
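To illustrate why the argument array and timeout matter, here is a minimal sketch of a slightly hardened version of the same fallback. It is not the skill's actual code: `extractWithPdftotext`, the injectable `run` parameter, and the `maxBuffer` value are assumptions made for the example.

```javascript
const { execFileSync } = require('node:child_process');
const path = require('node:path');

// Hypothetical wrapper around the fallback call. `run` is injectable so the
// function can be exercised without pdftotext installed.
function extractWithPdftotext(pdfPath, run = execFileSync) {
  if (typeof pdfPath !== 'string' || pdfPath.length === 0) {
    throw new TypeError('pdfPath must be a non-empty string');
  }
  // An absolute path never begins with "-", so it cannot be parsed as an option.
  const absolute = path.resolve(pdfPath);
  // Arguments go in an array and no shell is involved, so characters in the
  // file name are not interpreted as shell syntax.
  return run('pdftotext', [absolute, '-'], {
    encoding: 'utf8',
    timeout: 30000,              // same 30 s limit as the reviewed excerpt
    maxBuffer: 64 * 1024 * 1024, // assumed cap on captured output
  });
}
```

Resolving the path first is a design choice for this sketch: it keeps a file named something like `-layout.pdf` from being read as a command-line flag, while the timeout bounds how long a malformed PDF can keep the converter running.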
If you install the optional package, its version and source determine part of the skill's behavior.
The source references an optional npm dependency, but no package manifest or pinned version is included. This is a normal optional parser dependency, but provenance and version pinning are left to the user.
For complex PDFs, install pdf-parse: npm install pdf-parse
Install optional dependencies from trusted registries and consider pinning known-good versions.
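Because the skill ships no manifest, one way to enforce a pin is to check it at load time. The following is a sketch under stated assumptions, not the skill's behavior: `loadOptional`, `installedVersion`, and `loadPinned` are hypothetical helpers, and reading `package.json` this way may fail for packages that restrict their exports, in which case the sketch simply reports no version.

```javascript
// Hypothetical helper: return the module if it is installed, otherwise null.
function loadOptional(name) {
  try {
    return require(name);
  } catch (err) {
    if (err && err.code === 'MODULE_NOT_FOUND') return null;
    throw err; // installed but broken: surface the error instead of hiding it
  }
}

// Hypothetical helper: read the installed version, or null if it cannot be
// determined (package missing, or package.json not reachable).
function installedVersion(name) {
  try {
    return require(`${name}/package.json`).version;
  } catch {
    return null;
  }
}

// Use the optional parser only when its version matches the one you reviewed;
// otherwise return null so the caller can fall back to another method.
function loadPinned(name, expectedVersion) {
  const found = installedVersion(name);
  if (found === null || found !== expectedVersion) return null;
  return loadOptional(name);
}
```

A caller might write `loadPinned('pdf-parse', reviewedVersion)`, where `reviewedVersion` is whatever release the user has actually vetted, and fall back to another extraction path when the result is `null`.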
