PDF OCR Tool

Security checks across malware telemetry and agentic risk

Overview

This skill appears to do the advertised PDF/image OCR work, but its installer and OCR data handling need review before use with sensitive documents.

Review before installing. Keep Ollama on localhost for confidential documents, avoid remote HTTP Ollama endpoints unless you fully trust the server and network path, avoid copy-pasting `curl | sh` prerequisite installers without verification, and treat any generated or temporary images as containing the original document content.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Findings (19)

Lp3

Medium
Category
MCP Least Privilege
Confidence
88% confidence
Finding
The skill demonstrably uses file I/O, shell commands, and network access, but the metadata does not declare those capabilities. This undermines least-privilege review and can cause users or platforms to approve execution without understanding that installation and OCR processing interact with local services and system tools.

Tp4

High
Category
MCP Tool Poisoning
Confidence
84% confidence
Finding
The advertised purpose is document conversion, but the documented behavior also includes environment bootstrapping, dependency installation, external downloads, and interaction with local services. That mismatch increases the chance of unsafe approval because operators may not expect setup automation and external fetches from a skill described as simple OCR conversion.

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
The install hook can fetch `pyproject.toml` and `uv.lock` from a remote GitHub repository at install time if local files are missing, which creates a supply-chain risk and makes the installed dependency set depend on mutable network content. Because these files control what packages and versions are installed, a repository compromise, branch change, or man-in-the-middle of the distribution path could cause execution of attacker-chosen code during or after dependency installation.
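
One possible mitigation, sketched here under assumptions, is to pin a SHA-256 digest for each dependency file and reject any downloaded copy that does not match. The helper names `sha256_of` and `verify_pinned_file` and the `pinned` digest table are hypothetical and are not part of the skill; how the pinned digests would be distributed is left open.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_file(path: Path, pinned: dict[str, str]) -> bool:
    """True only if the file name has a pinned digest and the content matches it.

    `pinned` maps file names (e.g. "uv.lock") to expected hex digests; it is an
    assumption of this sketch, not something the skill ships.
    """
    expected = pinned.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

An installer following this pattern would call `verify_pinned_file` on the downloaded file before passing it to the dependency resolver, and abort on a mismatch or on a file with no pinned digest.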

Context-Inappropriate Capability

Medium
Confidence
90% confidence
Finding
The client accepts arbitrary host and port values and then sends OCR prompts and optionally full base64-encoded document images to that endpoint. In a skill intended for local OCR, this creates a real data exfiltration risk because sensitive document contents can be transmitted to a remote server without enforcing localhost-only destinations or requiring explicit consent.
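
A localhost-only guard of the kind this finding asks for might look like the following minimal sketch. The function names and the `allow_remote` opt-in flag are hypothetical, not taken from the skill's client. The sketch treats any DNS name other than `localhost` as non-local and does not resolve names, so it assumes `localhost` maps to a loopback address on the machine.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if host is 'localhost' or a literal loopback IP address."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat it as non-local.
        return False


def check_ollama_target(host: str, allow_remote: bool = False) -> None:
    """Raise unless the target is loopback or the caller explicitly opted in."""
    if not is_loopback_host(host) and not allow_remote:
        raise ValueError(
            f"refusing to send document data to non-local host {host!r}"
        )
```

Calling `check_ollama_target(host)` before any request is built would make remote destinations an explicit choice rather than a silent configuration outcome.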

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The README describes OCR processing through Ollama but does not warn users that document contents are sent to a local or remote Ollama service for inference. This can lead users to process sensitive PDFs without understanding the data-flow or trust boundary, especially because the host and port are configurable and may point to a non-local service.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The documentation does not prominently warn that input documents/images are sent to an Ollama service for processing. Even if the default target is localhost, OCR inputs may contain sensitive data, and users need an explicit privacy notice before routing documents to another service endpoint.

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The code writes cropped page regions to a temporary file on disk using `delete=False`, which can leave sensitive document fragments recoverable from local storage if cleanup fails or the process crashes. In an OCR tool, these cropped images may contain confidential document content, so undisclosed persistence creates a real privacy and data-handling risk.
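
As an illustration of a safer pattern, the sketch below scopes the cropped image to a private temporary directory that is removed when processing ends, including when the processing step raises. The helper `with_temp_crop` and its `process` callback are hypothetical, not the skill's code. A hard kill of the process can still leave files behind, so this reduces the exposure rather than eliminating it.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_crop(image_bytes: bytes, process: Callable[[Path], T]) -> T:
    """Write image_bytes to a private temp dir, call process(path), then clean up."""
    with tempfile.TemporaryDirectory(prefix="ocr_crop_") as tmp_dir:
        crop_path = Path(tmp_dir) / "crop.png"
        crop_path.write_bytes(image_bytes)
        # The directory and its contents are removed on exit from the
        # with-block, even if process() raises an exception.
        return process(crop_path)
```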

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The script performs silent network access via `curl` and then modifies the local environment by creating a virtualenv and running `uv sync`, but it does not explicitly obtain user consent for these actions. This is dangerous because users may execute the installer expecting only local setup, while it can reach out to the internet and install arbitrary transitive code from dependency sources.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The code posts OCR payloads, including base64 image data, over plain HTTP to the configured Ollama server. If the service is not strictly local, document contents and prompts can be intercepted or modified in transit, which is especially risky for confidential PDFs or images.

Missing User Warnings

Medium
Confidence
74% confidence
Finding
The fallback logic writes temporary PNGs into the source PDF's directory using a predictable prefix (`temp_count_`) and then deletes every matching file via glob. In a shared or attacker-influenced directory, this can remove unrelated files matching that prefix, causing unintended data loss and making the cleanup behavior unsafe.
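
One way to avoid the glob-based deletion is to create each temporary file under a unique name and then delete only the paths that the current run created. The sketch below assumes that approach; `make_temp_png` and `remove_created` are hypothetical helpers, and the `temp_count_` prefix is kept only to mirror the finding.

```python
import os
import tempfile
from pathlib import Path


def make_temp_png(directory: Path) -> Path:
    """Create an empty, uniquely named .png in directory and return its path."""
    fd, name = tempfile.mkstemp(prefix="temp_count_", suffix=".png", dir=directory)
    os.close(fd)  # The caller reopens the path to write image data.
    return Path(name)


def remove_created(paths: list[Path]) -> None:
    """Delete only the files this run created; never glob the directory."""
    for path in paths:
        path.unlink(missing_ok=True)
```

Because cleanup works from an explicit list, an unrelated file in the same directory that happens to start with `temp_count_` is left untouched.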

External Script Fetching

High
Category
Supply Chain
Content
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull glm-ocr:q8_0

# Install poppler-utils (for PDF to image conversion)
Confidence
98% confidence
Finding
curl -fsSL https://ollama.com/install.sh | sh

External Script Fetching

Low
Category
Supply Chain
Content
brew install poppler            # macOS

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### Install via ClawHub (Recommended)
Confidence
95% confidence
Finding
curl -LsSf https://astral.sh/uv/install.sh | sh

External Script Fetching

High
Category
Supply Chain
Content
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull glm-ocr:q8_0

# Install poppler-utils (for PDF to image conversion)
Confidence
97% confidence
Finding
curl -fsSL https://ollama.com/install.sh | sh

External Script Fetching

Low
Category
Supply Chain
Content
brew install poppler            # macOS

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Install with uv (Recommended)
Confidence
90% confidence
Finding
curl -LsSf https://astral.sh/uv/install.sh | sh

External Script Fetching

Low
Category
Supply Chain
Content
return 0
    else
        echo "⚠️  Local ${file} not found, trying GitHub..."
        if curl -sLf "${github_url}" -o "${SKILL_DIR}/.tmp_${file}"; then
            echo "✅ Downloaded ${file} from GitHub"
            return 0
        else
Confidence
90% confidence
Finding
curl -sLf "${github_url}" -o "${SKILL_DIR}/.tmp_${file}"; then
    echo "✅ Downloaded ${file} from GitHub"
    return 0
else
    echo "❌ Failed to get ${file} from both

Chaining Abuse

High
Category
Tool Misuse
Content
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull glm-ocr:q8_0

# Install poppler-utils (for PDF to image conversion)
Confidence
97% confidence
Finding
| sh

Chaining Abuse

High
Category
Tool Misuse
Content
brew install poppler            # macOS

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### Install via ClawHub (Recommended)
Confidence
95% confidence
Finding
| sh

Chaining Abuse

High
Category
Tool Misuse
Content
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull glm-ocr:q8_0

# Install poppler-utils (for PDF to image conversion)
Confidence
98% confidence
Finding
| sh

Chaining Abuse

High
Category
Tool Misuse
Content
brew install poppler            # macOS

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Install with uv (Recommended)
Confidence
93% confidence
Finding
| sh

VirusTotal

66/66 vendors flagged this skill as clean.
