PaddleOCR

Security checks across malware telemetry and agentic risk

Overview

This skill appears purpose-built for OCR, but it sends potentially sensitive documents to a configured remote OCR API, saves multiple local copies of them, and makes some network requests that are not clearly controlled.

Review this skill before installing it or using it on confidential material. Configure only an OCR endpoint you trust, prefer a local or otherwise controlled endpoint for privileged legal or medical records, use --pages to limit what is uploaded, use --no-archive when you do not need retention, and delete generated archives and image folders when finished.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (7)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill clearly describes capabilities to read local files, write Markdown/archive outputs, access environment-based credentials, and call a remote OCR API, yet it declares no permissions or equivalent user-facing capability disclosure. This is dangerous because users may invoke it without understanding that sensitive local documents and secrets will be accessed and that outputs will be persisted locally and transmitted off-system.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The script downloads image content from arbitrary HTTP(S) URLs returned in OCR output, even though the skill is presented as operating on local PDFs/images. This creates an unexpected network egress and SSRF-style primitive: a compromised or misbehaving upstream parser can cause requests to internal services or untrusted hosts, and the downloaded content is then written to disk without validation.
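
A download step like the one described could be guarded by checking each returned URL before fetching it. The sketch below is a hypothetical mitigation, not the skill's code: `is_safe_image_url` and its injectable `resolve` parameter are illustrative names. It accepts only http(s) URLs whose host resolves exclusively to globally routable addresses, which rejects loopback, private, and link-local targets such as cloud metadata endpoints.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def _system_resolve(host: str) -> list[str]:
    # Resolve a hostname to every address it maps to (IPv4 and IPv6).
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_safe_image_url(
    url: str,
    resolve: Callable[[str], Iterable[str]] = _system_resolve,
) -> bool:
    """Hypothetical guard: True only for http(s) URLs whose host resolves
    exclusively to public, globally routable addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = list(resolve(parts.hostname))
    except OSError:
        return False  # unresolvable hosts are rejected, not retried
    if not addresses:
        return False
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("%", 1)[0])  # drop any IPv6 zone id
        if not ip.is_global:
            # Covers loopback, RFC 1918 private ranges, link-local
            # (e.g. 169.254.169.254 metadata), and other non-public ranges.
            return False
    return True
```

Checking and then fetching in two separate steps still leaves a DNS-rebinding window, and HTTP redirects can point somewhere else, so a stricter version would connect to the already-validated address and either disable redirects or re-validate every hop.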

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The skill accepts arbitrary remote URLs via file_url and forwards them to the OCR provider, despite the skill description emphasizing local PDF/image processing. This creates a data-flow and trust-boundary issue: users may unknowingly cause the provider to fetch third-party resources, including sensitive internal URLs or untrusted content, which can enable SSRF-like access through the provider or unintended disclosure of requested URLs/documents.
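
One way to narrow this trust boundary, sketched here under assumptions, is to make remote input opt-in and restrict it to an explicit host allowlist. `check_file_url`, `RemoteURLRejected`, `allowed_hosts`, and `allow_remote` are hypothetical names; the skill's actual `file_url` handling may differ.

```python
from urllib.parse import urlsplit


class RemoteURLRejected(ValueError):
    """Raised when a file_url fails the (hypothetical) forwarding policy."""


def check_file_url(url: str, allowed_hosts: set[str], allow_remote: bool) -> str:
    """Return url unchanged if it may be forwarded to the OCR provider."""
    if not allow_remote:
        # Remote input is off unless the user explicitly enables it.
        raise RemoteURLRejected("remote file_url input is disabled; pass a local file")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise RemoteURLRejected(f"only https URLs are accepted, got {parts.scheme!r}")
    host = (parts.hostname or "").lower()
    if host not in allowed_hosts:
        raise RemoteURLRejected(f"host {host!r} is not on the allowlist")
    if parts.username or parts.password:
        # Do not leak embedded credentials to a third-party provider.
        raise RemoteURLRejected("credentials embedded in the URL are not forwarded")
    return url
```

An allowlist is deliberately stricter than a blocklist of internal ranges here, because the provider, not the local machine, performs the fetch, so local address checks say little about what the provider can reach.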

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill is intended for legal records, medical records, evidence scans, and similar highly sensitive documents, and it instructs use of a remote PaddleOCR API without prominently warning that document contents leave the local system. In this context, off-system transmission materially increases confidentiality, regulatory, and privilege risks because users may upload protected or privileged records unintentionally.
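
A minimal consent gate, assuming the skill can prompt on the terminal before calling the API, would name the destination host and the files and require an explicit answer. `confirm_upload` and its `ask`/`out` parameters are hypothetical; they are injectable so the prompt can be tested or replaced in non-interactive runs.

```python
import sys
from typing import Callable, Sequence, TextIO
from urllib.parse import urlsplit


def confirm_upload(
    endpoint: str,
    files: Sequence[str],
    ask: Callable[[str], str] = input,
    out: TextIO = sys.stderr,
) -> bool:
    """Tell the user where documents are going and require an explicit 'yes'."""
    host = urlsplit(endpoint).hostname or endpoint
    print(f"The following {len(files)} file(s) will be sent to {host} for OCR:", file=out)
    for path in files:
        print(f"  - {path}", file=out)
    print("Their contents will leave this machine.", file=out)
    answer = ask("Type 'yes' to continue: ")
    # Anything other than an exact 'yes' (e.g. 'y' or an empty line) declines.
    return answer.strip().lower() == "yes"
```

In non-interactive use, a caller could pass an `ask` function that always declines unless the user has set a deliberate opt-in, so that the default outcome is not to upload.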

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill states that it will, by default, retain source files, Markdown outputs, JSON results, extracted images, and metadata in an archive directory, but it does not prominently warn users about persistent local storage of sensitive case, medical, or evidence materials. This increases exposure by leaving multiple recoverable copies on disk that may be indexed, synced, backed up, or accessed by other users or processes.
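
Where intermediate files are needed only for the duration of one run, a scratch directory that is removed automatically limits how long copies stay on disk. This is a sketch, not the skill's archive logic; `ephemeral_workdir` is a hypothetical helper built on the standard `tempfile` and `shutil` modules.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_workdir(prefix: str = "ocr-") -> Iterator[Path]:
    """Yield a private scratch directory that is deleted on exit, even on error."""
    # mkdtemp creates a directory accessible only by the creating user.
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)
```

Deleting files this way does not securely erase them, and anything already picked up by sync or backup tools persists, so this complements rather than replaces the --no-archive option.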

Missing User Warnings

Medium

Confidence: 97%
Finding: The code sends document data to an external OCR/parsing provider via parse_document, while the skill is specifically marketed for legal records, medical records, and evidence scans—categories that are often highly sensitive. There is no explicit consent flow, warning, redaction step, or local-only mode in this file, so users may unknowingly exfiltrate confidential data to a third party.
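
A local-only mode could be enforced as a simple precondition on the configured endpoint. The sketch below assumes the endpoint is available as a URL string; `is_local_endpoint`, `require_local`, and the `local_only` switch are hypothetical. Only `localhost` and loopback IP literals count as local, so any other hostname is treated as remote even if it happens to resolve to this machine.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_endpoint(endpoint: str) -> bool:
    """True if the OCR endpoint points at this machine (loopback only)."""
    host = urlsplit(endpoint).hostname  # already lowercased, IPv6 brackets removed
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames could resolve anywhere; treat as remote


def require_local(endpoint: str, local_only: bool) -> None:
    """Refuse to proceed when local-only mode is on and the endpoint is remote."""
    if local_only and not is_local_endpoint(endpoint):
        raise PermissionError(f"local-only mode: refusing to send documents to {endpoint}")
```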

Missing User Warnings

Medium

Confidence: 90%
Finding: The code transmits full document content or a remote document reference to an external OCR API, but this file contains no disclosure, consent gate, or policy check around that transfer. In the context of legal PDFs, medical records, and evidence scans, this is more sensitive than usual because the data is likely confidential or regulated.
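
As a disclosure and audit aid, each outbound transfer could be recorded without copying the document itself. The sketch below is hypothetical (`record_transfer` and the JSON Lines log format are illustrative choices): it logs the endpoint host, file name, size, and a SHA-256 digest, so a reviewer can later confirm which documents left the system without the log becoming another copy of their contents.

```python
import hashlib
import json
import time
from pathlib import Path
from urllib.parse import urlsplit


def record_transfer(document: Path, endpoint: str, log_path: Path) -> dict:
    """Append one JSON line describing an outbound OCR transfer.

    Only a SHA-256 digest is logged, never the document contents.
    """
    digest = hashlib.sha256()
    with document.open("rb") as fh:
        # Hash in chunks so large scans are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "endpoint_host": urlsplit(endpoint).hostname,
        "file": document.name,
        "bytes": document.stat().st_size,
        "sha256": digest.hexdigest(),
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

File names can themselves be sensitive (for example, when they contain a patient or client name), so a stricter policy might log only the digest.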

VirusTotal

67/67 vendors flagged this skill as clean.
