v0.0.10

pdf-extract-skill

Benign
ClawScan verdict for this skill. Analyzed May 1, 2026, 8:24 AM.

Analysis

This skill is coherently focused on local PDF extraction, with reasonable cautions needed around installing the external PDF tool, handling sensitive PDF outputs, and keeping the optional hybrid backend local.

Guidance

Before installing, verify the OpenDataLoader PDF package source, use a pinned version in an isolated environment, process only intended PDF folders, keep generated outputs private, and run the hybrid backend on localhost only.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Abnormal behavior control

Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.

Agentic Supply Chain Vulnerabilities
Severity: Low; Confidence: High; Status: Note
metadata
Source: unknown; Homepage: none; No install spec — this is an instruction-only skill; Required binaries: java, python3, opendataloader-pdf

The skill depends on external local binaries/packages, but the registry metadata does not provide a source or install specification, so users need to verify the package before installing or running it.

User impact: Installing the wrong or unverified package could expose local PDFs or run unwanted code, even though the skill itself provides safety guidance.
Recommendation: Install only a verified, pinned version in a virtual environment or container, and confirm the package homepage/repository and maintainers before use.

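As an illustration of the pinned-version recommendation, the following minimal Python sketch rejects unpinned requirement strings and builds a `pip install` command that targets a dedicated virtual environment. It assumes the package is installed with pip under the name `opendataloader-pdf`, which the metadata above does not confirm. The helper names (`is_pinned`, `create_env`, `pip_install_command`) and the version `0.0.0` are placeholders, not values from the skill.

```python
from __future__ import annotations

import re
import sys
import venv
from pathlib import Path

# Accepts exact pins such as "name==1.2.3"; ranges and wildcards ("==1.*") are rejected.
_PIN_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[0-9][A-Za-z0-9.!+_-]*$")


def is_pinned(requirement: str) -> bool:
    """Return True only for an exact 'name==version' requirement."""
    return bool(_PIN_RE.match(requirement.strip()))


def create_env(env_dir: Path) -> None:
    """Create an isolated virtual environment that includes pip."""
    venv.create(env_dir, with_pip=True)


def pip_install_command(env_dir: Path, requirement: str) -> list[str]:
    """Build (but do not run) a pip command that uses the environment's own interpreter."""
    if not is_pinned(requirement):
        raise ValueError(f"refusing unpinned requirement: {requirement!r}")
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python = env_dir / bin_dir / "python"
    return [str(python), "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # "0.0.0" is a placeholder; substitute the version you have verified.
    print(pip_install_command(Path(".venv-pdf"), "opendataloader-pdf==0.0.0"))
```

The command is only constructed here, so it can be reviewed before anything is downloaded; running it (for example with `subprocess.run`) is left to the user once the package source has been verified.
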
Sensitive data protection

Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.

Insecure Inter-Agent Communication
Severity: Low; Confidence: High; Status: Note
docs/hybrid-mode-ocr.md
opendataloader-pdf-hybrid --port 5002

Hybrid/OCR mode starts a backend listener on a local port; this is disclosed and purpose-aligned, but sensitive PDFs should only be sent to a backend that is locally bound and trusted.

User impact: If the backend were exposed beyond localhost, other users or systems on the network might be able to interact with the PDF processing service.
Recommendation: Use localhost binding when supported, such as `--host 127.0.0.1 --port 5002`, verify active listeners, and stop the backend after processing.

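One way to act on the "localhost only" and "verify active listeners" advice is sketched below using only the Python standard library. The helper names are hypothetical. The sketch checks whether an intended bind address is a loopback IP literal and whether something is accepting TCP connections on a given port; it does not inspect how the hybrid backend itself was started, and it treats hostnames such as `localhost` as non-loopback because it does not resolve names.

```python
import ipaddress
import socket


def is_loopback_host(host: str) -> bool:
    """Return True if host is a loopback IP literal (e.g. 127.0.0.1 or ::1)."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname); treat as not verified.
        return False


def port_accepts_connections(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds (IPv4 only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 5002 is the port shown in docs/hybrid-mode-ocr.md.
    print("bind address is loopback:", is_loopback_host("127.0.0.1"))
    print("listener on 127.0.0.1:5002:", port_accepts_connections("127.0.0.1", 5002))
```

A listener that also answers on the machine's LAN address would suggest the backend is not bound to loopback only; running the second check again after processing confirms the backend has been stopped.
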
Memory and Context Poisoning
Severity: Low; Confidence: High; Status: Note
SKILL.md
RAG and LLM-ready outputs (json + markdown).

The skill is designed to transform PDF contents into reusable RAG/LLM context files, which may preserve sensitive document text and metadata.

User impact: Sensitive PDF content may be copied into JSON or Markdown outputs and later reused in retrieval or embedding workflows.
Recommendation: Choose output folders carefully, use sanitization when appropriate, and avoid adding confidential outputs to shared RAG indexes unless intended.
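
To illustrate what "choose output folders carefully" could look like in practice, here is a minimal Python sketch that creates an owner-only output directory and checks that no group or world permission bits are set. It assumes a POSIX filesystem; permission bits behave differently on Windows. The function names and the `pdf-out` folder are hypothetical, and the sketch covers only directory permissions, not content sanitization.

```python
import os
import stat
import tempfile
from pathlib import Path


def make_private_dir(path: Path) -> Path:
    """Create the directory if needed and restrict it to the owner (mode 0700)."""
    path.mkdir(parents=True, exist_ok=True)
    os.chmod(path, 0o700)
    return path


def is_private(path: Path) -> bool:
    """Return True if no group or other permission bits are set on path."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & 0o077 == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = make_private_dir(Path(tmp) / "pdf-out")
        print(out_dir, "private:", is_private(out_dir))
```

Running `is_private` on an existing output folder before indexing its contents is a cheap guard against accidentally exposing extracted JSON or Markdown to other local users.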