PDF Text Extractor
PassAudited by ClawScan on May 1, 2026.
Overview
The skill does not show exfiltration, persistence, or destructive behavior; it mainly reads user-selected PDFs, but its dependency and OCR claims are inconsistent and extracted document text should be treated as sensitive.
This skill appears safe to use for its stated purpose if you intentionally choose the PDFs. Before installing, note that it is not truly dependency-free, OCR support appears overstated, and any extracted document text may be visible to the agent and should not be treated as instructions.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If the agent or user selects the wrong PDF, private document text and metadata could be exposed in the conversation.
The skill reads the file path supplied by the caller. This is expected for a PDF extractor, but it means any selected PDF's contents can be brought into the agent output.
const fileData = fs.readFileSync(pdfPath);
Use explicit, intended PDF paths and review batch inputs before extraction, especially for invoices, contracts, or other confidential documents.
A user expecting a dependency-free skill may need to install third-party npm code for the extractor to work.
The package declares an npm dependency even though the skill description says zero dependencies and the registry lists no install spec. The dependency is aligned with PDF parsing, but the dependency footprint is not consistently disclosed.
"dependencies": { "pdfjs-dist": "^3.11.174" }Install dependencies only from trusted sources, prefer the included lockfile or pinned versions, and treat the zero-dependency claim as inaccurate.
Private document contents may become part of the model context, and text inside PDFs should not be treated as trusted instructions.
The documented workflow explicitly sends extracted document text toward LLM analysis. PDF text is untrusted user/document content and may contain sensitive information or prompt-like instructions.
- Prepare content for LLM processing
Only process documents you are comfortable sharing with the agent, and instruct the agent to treat extracted PDF text as data rather than commands.
