Back to skill
v1.0.0

PDF Text Extractor

BenignClawScan verdict for this skill. Analyzed May 1, 2026, 5:21 AM.

Analysis

The skill does not show exfiltration, persistence, or destructive behavior; it mainly reads user-selected PDFs, but its dependency and OCR claims are inconsistent and extracted document text should be treated as sensitive.

GuidanceThis skill appears safe to use for its stated purpose if you intentionally choose the PDFs. Before installing, note that it is not truly dependency-free, OCR support appears overstated, and any extracted document text may be visible to the agent and should not be treated as instructions.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Abnormal behavior control

Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.

Tool Misuse and Exploitation
SeverityLowConfidenceHighStatusNote
index.js
const fileData = fs.readFileSync(pdfPath);

The skill reads the file path supplied by the caller. This is expected for a PDF extractor, but it means any selected PDF's contents can be brought into the agent output.

User impactIf the agent or user selects the wrong PDF, private document text and metadata could be exposed in the conversation.
RecommendationUse explicit, intended PDF paths and review batch inputs before extraction, especially for invoices, contracts, or other confidential documents.
Agentic Supply Chain Vulnerabilities
SeverityLowConfidenceHighStatusNote
package.json
"dependencies": { "pdfjs-dist": "^3.11.174" }

The package declares an npm dependency even though the skill description says zero dependencies and the registry lists no install spec. The dependency is aligned with PDF parsing, but the dependency footprint is not consistently disclosed.

User impactA user expecting a dependency-free skill may need to install third-party npm code for the extractor to work.
RecommendationInstall dependencies only from trusted sources, prefer the included lockfile or pinned versions, and treat the zero-dependency claim as inaccurate.
Sensitive data protection

Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.

Memory and Context Poisoning
SeverityLowConfidenceHighStatusNote
README.md
- Prepare content for LLM processing

The documented workflow explicitly sends extracted document text toward LLM analysis. PDF text is untrusted user/document content and may contain sensitive information or prompt-like instructions.

User impactPrivate document contents may become part of the model context, and text inside PDFs should not be treated as trusted instructions.
RecommendationOnly process documents you are comfortable sharing with the agent, and instruct the agent to treat extracted PDF text as data rather than commands.