PaddleOCR Document Parsing

Security checks across malware telemetry and agentic risk

Overview

This is a coherent document-parsing skill that sends chosen documents to a configured PaddleOCR API and saves results locally, with the main risk being ordinary privacy exposure from remote OCR processing.

Install only if you are comfortable sending selected documents or document URLs to the PaddleOCR-compatible API endpoint you configure. Avoid confidential or regulated documents unless that provider is approved for your data, and use --stdout or clean up the temp result files when extracted content should not remain on disk.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (4)

Missing User Warnings

Medium

Confidence: 97%
Finding: The usage instructions tell the agent to send user-provided documents or URLs to an external PaddleOCR API, but there is no prominent upfront warning that document contents leave the local environment. Users may provide confidential invoices, financial reports, or scans without realizing they are being transmitted to a third-party service, creating privacy, compliance, and data-handling risks. The document-parsing context makes this more dangerous because the skill is explicitly aimed at sensitive business and personal documents.

Missing User Warnings

Medium

Confidence: 84%
Finding: The documentation states that parsed output is automatically written to a temp file by default, but it does not clearly warn that extracted document contents may persist on disk. For a document-parsing skill that may process invoices, financial reports, or scanned sensitive records, this can lead to unintended local data exposure through temp-file retention, backup, or access by other local users/processes.
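
One way to limit the temp-file retention described in this finding is to keep extracted text in a short-lived, owner-only file and delete it as soon as it has been consumed. The sketch below is illustrative only and is not the skill's code: the helper name `ephemeral_result_file` and the `ocr_result_` prefix are hypothetical, and it assumes the parsed output is available as a plain string.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def ephemeral_result_file(text: str, suffix: str = ".md"):
    """Write text to a temp file and delete the file when the block exits.

    Hypothetical helper: the name and the file prefix are assumptions,
    not part of the skill.
    """
    # mkstemp creates the file readable and writable only by the owner
    # on POSIX systems.
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="ocr_result_")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        yield path
    finally:
        # Remove the file even if the caller's block raised an exception.
        if os.path.exists(path):
            os.remove(path)


# Usage: the extracted text exists on disk only inside the with-block.
with ephemeral_result_file("Invoice total: 120.00") as result_path:
    with open(result_path, encoding="utf-8") as handle:
        print(handle.read())
```

Deleting the file removes the directory entry but does not securely erase the underlying storage, so this reduces exposure to other local users, processes, and backups without eliminating it.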

Missing User Warnings

Medium

Confidence: 88%
Finding: The examples encourage URL-based document parsing but do not warn that document content will be fetched from the supplied URL and sent to an external OCR/layout provider. In this skill context, inputs are often sensitive business or personal documents, so omission of a transmission/privacy warning increases the risk of users unintentionally exposing confidential data to third-party services.

Missing User Warnings

Medium

Confidence: 94%
Finding: The library sends either a local document's full base64 contents or a user-supplied document URL to an external PaddleOCR endpoint, but the callable interface does not clearly surface that this is a network transfer of potentially sensitive document data. In a document-parsing skill, users may provide invoices, financial reports, scans, or PDFs containing PII, secrets, or regulated data, so the lack of explicit disclosure and consent materially increases privacy and compliance risk.
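
The disclosure gap described above could be closed by requiring explicit confirmation before any payload is built or sent. The following is a minimal sketch under stated assumptions, not the skill's actual interface: `build_payload`, `parse_with_consent`, the payload field names `file_url` and `file_base64`, and the example endpoint are all hypothetical. The network call is injected as `send`, so the sketch makes no claim about the real PaddleOCR request format.

```python
import base64
from pathlib import Path
from typing import Callable, Dict


def build_payload(source: str) -> Dict[str, str]:
    """Build a request body from a document URL or a local file path.

    The field names are assumptions made for illustration.
    """
    if source.startswith(("http://", "https://")):
        return {"file_url": source}
    data = Path(source).read_bytes()
    return {"file_base64": base64.b64encode(data).decode("ascii")}


def parse_with_consent(
    source: str,
    endpoint: str,
    send: Callable[[str, Dict[str, str]], dict],
    confirm: Callable[[str], bool],
) -> dict:
    """Send the document only after the caller confirms the transfer."""
    notice = f"The document '{source}' will be sent to {endpoint}. Continue?"
    if not confirm(notice):
        # Nothing is read from disk or transmitted when consent is refused.
        raise PermissionError("Upload declined; nothing was transmitted.")
    return send(endpoint, build_payload(source))


# Usage with a stand-in transport that records calls instead of using the network.
sent = []


def fake_send(endpoint: str, payload: Dict[str, str]) -> dict:
    sent.append((endpoint, payload))
    return {"status": "ok"}


result = parse_with_consent(
    "https://example.com/invoice.pdf",
    "https://ocr.example.invalid/parse",  # placeholder endpoint
    fake_send,
    confirm=lambda notice: True,
)
print(result)
```

Passing `confirm` as a callable keeps the consent step testable and lets an agent route the notice to the user rather than answering it silently.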

VirusTotal

65/65 vendors flagged this skill as clean.
