v1.0.3

pdf-ocr-extraction

Benign (ClawScan verdict for this skill). Analyzed May 1, 2026, 7:42 AM.

Analysis

This skill is a straightforward local OCR helper, with visible dependency installation and temporary file handling that users should understand before use.

Guidance: This appears safe for its stated purpose. Before installing, make sure the OCR dependencies come from trusted sources. Process sensitive PDFs in a trusted local environment, because pages are temporarily rendered to image files during extraction.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Abnormal behavior control

Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.

Agentic Supply Chain Vulnerabilities
Severity: Low | Confidence: High | Status: Note
install spec
uv | package: pypdfium2 pytesseract Pillow

The skill depends on external Python packages without pinned versions. These packages are purpose-aligned for PDF rendering and OCR, but users should rely on trusted package sources.

User impact: A dependency update or package-source issue could change what code is installed for the OCR workflow.
Recommendation: Install dependencies from trusted indexes, and consider pinning known-good versions in managed environments.
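Pinning itself belongs in the install spec or a lock file. A managed environment can also check that the packages actually installed match the reviewed pins. This is a minimal sketch that assumes you keep such a list. `check_pins` is a hypothetical helper, and the `X.Y.Z` values are placeholders, not recommended versions. It uses only the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional, Tuple


def check_pins(pins: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (package, expected, installed) for each package that misses its pin.

    `installed` is None when the package is not installed at all.
    """
    mismatches = []
    for package, expected in pins.items():
        try:
            installed = version(package)  # version of the installed distribution
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches.append((package, expected, installed))
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins: replace "X.Y.Z" with the versions you have reviewed.
    PINS = {"pypdfium2": "X.Y.Z", "pytesseract": "X.Y.Z", "Pillow": "X.Y.Z"}
    for package, expected, installed in check_pins(PINS):
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```

A non-empty result means the environment has drifted from the reviewed versions, or a package is missing. Investigate either case before running the OCR workflow.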
Unexpected Code Execution
Severity: Info | Confidence: High | Status: Note
SKILL.md
Create a Python script (e.g., `extract.py`) ... Then execute the script: `python3 extract.py /path/to/document.pdf`

The skill's workflow involves creating and running a local Python script. This is clearly disclosed and central to the OCR purpose.

User impact: Using the skill will run local Python code to process the selected PDF.
Recommendation: Review the generated script before running it, especially if you modify the example or process sensitive files.
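When the agent runs the generated script, it can pass the PDF path as a separate element of an argument list rather than inside a shell string. That way an unusual filename is never interpreted as shell syntax. This is a minimal sketch that assumes the script is named `extract.py`, as in the skill's example. `run_extract` and its 300-second default timeout are hypothetical choices, not part of the skill.

```python
import subprocess
import sys
from pathlib import Path


def run_extract(script: str, pdf_path: str, timeout: float = 300.0) -> str:
    """Run the OCR script on one PDF and return its standard output.

    The command is an argument list, so no shell parses `pdf_path`:
    spaces, quotes, or `;` in a filename reach the script unchanged.
    """
    pdf = Path(pdf_path)
    if not pdf.is_file():
        raise FileNotFoundError(f"not a file: {pdf}")
    result = subprocess.run(
        [sys.executable, script, str(pdf)],  # same interpreter as the caller
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `run_extract("extract.py", "/path/to/document.pdf")` returns whatever the script prints. The sketch uses `sys.executable` instead of a bare `python3` so the script runs with the same interpreter, and therefore the same installed packages, as the caller. This is a design choice of the sketch; the skill does not specify it.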
Tool Misuse and Exploitation
Severity: Low | Confidence: Medium | Status: Note
SKILL.md
tmp_img = f"/tmp/page_{i}.png" ... os.remove(tmp_img)

The example writes rendered PDF pages to predictable temporary image paths and then deletes them. This is purpose-aligned but creates short-lived copies of document pages.

User impact: Sensitive PDF pages may briefly exist as image files in /tmp while OCR is running.
Recommendation: For sensitive documents, run in a trusted environment and prefer unique temporary directories or secure temporary-file APIs.
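The recommendation can be sketched as a render-and-OCR loop that keeps page images in a private directory from `tempfile.TemporaryDirectory()` instead of at fixed `/tmp/page_{i}.png` paths. This is not the skill's own `extract.py`. The helper name `ocr_pages_securely`, the callback split, and the 300 DPI render scale are assumptions. `main` assumes the pypdfium2 4.x API (`PdfDocument`, `render(...).to_pil()`) and passes each saved file's path to `pytesseract.image_to_string`, which accepts a path as well as a PIL image.

```python
import os
import sys
import tempfile
from typing import Callable, List


def ocr_pages_securely(
    page_count: int,
    render_page: Callable[[int, str], None],
    ocr_file: Callable[[str], str],
) -> List[str]:
    """Render and OCR each page through a private temporary directory.

    render_page(i, path) must write page i as an image file at `path`;
    ocr_file(path) must return the recognized text for that image.
    """
    texts = []
    # TemporaryDirectory uses mkdtemp: an unpredictable name, owner-only
    # permissions on POSIX, and removal of the directory and its contents
    # when the block exits, including when an exception is raised.
    with tempfile.TemporaryDirectory(prefix="pdf-ocr-") as tmpdir:
        for i in range(page_count):
            img_path = os.path.join(tmpdir, f"page_{i}.png")
            render_page(i, img_path)
            texts.append(ocr_file(img_path))
            # Delete each page image as soon as it has been OCR'd.
            os.remove(img_path)
    return texts


def main(pdf_path: str) -> None:
    # Third-party imports stay local so the helper above needs only the stdlib.
    import pypdfium2 as pdfium
    import pytesseract

    pdf = pdfium.PdfDocument(pdf_path)

    def render_page(i: int, path: str) -> None:
        # scale is pixels per PDF point (1/72 in), so 300/72 is about 300 DPI.
        pdf[i].render(scale=300 / 72).to_pil().save(path)

    try:
        texts = ocr_pages_securely(len(pdf), render_page, pytesseract.image_to_string)
        for i, text in enumerate(texts):
            print(f"--- page {i + 1} ---")
            print(text)
    finally:
        pdf.close()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python3 extract.py /path/to/document.pdf", file=sys.stderr)
```

`TemporaryDirectory` deletes the directory and anything left in it when the `with` block exits. Page images are therefore cleaned up even if rendering or OCR fails partway through, which a bare `os.remove(tmp_img)` after each page does not guarantee on its own.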