MinerU PDF Parser

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate MinerU cloud parser, but it needs review because it can send document contents to cloud services and an export path may include local files referenced from the generated Markdown.

Install only if you are comfortable sending selected documents to MinerU cloud services and, when using --to or mineru_parse_to, to the named third-party destination. Avoid confidential or regulated files unless your organization approves those services. Scope tokens narrowly, and check Markdown for local image paths before exporting to Linear or other sinks.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (16)

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: This sink is explicitly designed to transmit parsed document content to Feishu/Lark, which extends the skill from local parsing into third-party publication/storage. That creates a real data-exfiltration and scope-expansion risk, especially if users invoke the skill expecting only local conversion based on the manifest’s parsing-focused description.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The code adds a SaaS publishing/export pathway that is not necessary for document parsing itself and sends full Markdown content to an external service. In an agent setting, this broadens the skill’s authority and increases the chance that sensitive parsed material is uploaded outside the user’s expected trust boundary.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: This sink sends parsed document content, title, and potentially embedded images to Linear by creating a remote issue. That is a real data-exfiltration path and it exceeds the advertised document-parsing purpose in the provided skill metadata, making it security-relevant even if implemented as a legitimate integration feature.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code reads local image files referenced in Markdown and converts them into base64 data URIs that are then included in the outbound Linear issue. This can unintentionally exfiltrate local file content, especially if the Markdown references sensitive local images or if document content is attacker-influenced.
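
One way to narrow this path is to check image references before anything is embedded. The sketch below is illustrative only and is not taken from the skill's code: the function name `unsafe_local_image_refs`, the regular expression, and the idea of an `allowed_root` export directory are all assumptions. It lists Markdown image targets that would cause a local file outside the allowed directory to be read.

```python
import re
from pathlib import Path

# Matches Markdown image syntax: ![alt](target "optional title")
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def unsafe_local_image_refs(markdown: str, allowed_root: str) -> list[str]:
    """Return image targets that point at local files outside allowed_root.

    Hypothetical helper: allowed_root stands in for whatever directory the
    parser writes its own extracted images to.
    """
    root = Path(allowed_root).resolve()
    flagged = []
    for target in IMAGE_REF.findall(markdown):
        # Remote and already-inlined images do not trigger a local file read.
        if target.startswith(("http://", "https://", "data:")):
            continue
        # file: URLs are always treated as suspicious in this sketch.
        if target.startswith("file:"):
            flagged.append(target)
            continue
        candidate = Path(target)
        if not candidate.is_absolute():
            candidate = root / candidate
        # resolve() collapses ".." segments so traversal cannot escape root.
        if not candidate.resolve().is_relative_to(root):
            flagged.append(target)
    return flagged
```

A caller could refuse the export, or drop the flagged images, whenever the returned list is non-empty. Requires Python 3.9 or later for `Path.is_relative_to`.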

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: This sink adds outbound publication behavior to Microsoft OneNote, which goes beyond local document parsing and converts parsed document contents into data sent to a third-party cloud service. In a skill described primarily as parsing and conversion, this materially changes the data-flow and can cause unintended exfiltration of sensitive document contents if invoked without clear user understanding.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code retrieves an OAuth bearer token from the environment and sends the full rendered document to the Microsoft Graph API. Even if intended as an integration feature, using external network services and cloud credentials for a parser skill increases the attack surface and creates a real confidentiality risk if the capability is unexpected, insufficiently disclosed, or callable in broader agent workflows.

Missing User Warnings

Medium

Confidence: 92%
Finding: The quick-start prominently encourages users to submit documents to a cloud parsing API before presenting the privacy/confidentiality caveat later in the README. Because this skill handles potentially sensitive PDFs, Office files, and images, the missing upfront warning can lead users to unknowingly transmit confidential data to a third party.

Missing User Warnings

Medium

Confidence: 93%
Finding: The README strongly promotes uploading PDFs, Office files, images, and direct delivery to third-party tools before presenting a clear, prominent privacy warning. Because this skill is explicitly cloud-backed and intended for agent automation, users may expose sensitive document contents to remote services without realizing that every file is transmitted off-host.

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill explicitly instructs users to parse local files or URLs via remote Agent/Standard APIs, but it does not provide a clear privacy or data-transmission warning. In this context, users may upload sensitive PDFs, Office files, or images without understanding that document contents and metadata leave the local environment and may be sent to third-party services.

Missing User Warnings

Medium

Confidence: 88%
Finding: The document explicitly states that the skill is cloud-backed and later concedes that every file is uploaded to MinerU's cloud, but this privacy and data-handling warning is not surfaced prominently at the start where users decide whether to use the skill. In a document-parsing skill, that omission can cause users to submit confidential or regulated files under a mistaken assumption of local processing, creating privacy, compliance, and data-exposure risk.

Missing User Warnings

Medium

Confidence: 89%
Finding: The documentation explicitly encourages non-interactive fan-out of parsed document contents to multiple external services and local filesystem targets, but it omits a clear warning that doing so may transmit sensitive document data to third-party platforms. In an agent context, this increases the chance of accidental data exfiltration because automated workflows may deliver content without an informed human review step.

Missing User Warnings

Medium

Confidence: 92%
Finding: The integration matrix lists numerous API keys, tokens, webhook URLs, and filesystem destinations required through environment variables, but it does not advise users on secure secret handling. In practice, this can lead to credentials being exposed via shell history, logs, process environments, CI output, or improperly scoped agent runtime environments.
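
As a rough illustration of the kind of handling guidance that is missing, the sketch below reads a credential from the environment and masks it before any text reaches a log. Both helper names (`load_secret`, `redact`) and the variable name used in the test are hypothetical; nothing here reflects the skill's actual code or its real environment variable names.

```python
import os
from typing import Iterable, Mapping


def load_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Read a credential from the environment, failing loudly if it is unset."""
    value = environ.get(name, "")
    if not value:
        # The error names the variable but never echoes a value.
        raise RuntimeError(f"{name} is not set")
    return value


def redact(text: str, secrets: Iterable[str]) -> str:
    """Replace every occurrence of each secret with a fixed mask."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "***")
    return text
```

Passing log lines and error messages through `redact` before they are written reduces the chance of a token surfacing in CI output, though it does not protect against exposure through shell history or the process environment.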

Missing User Warnings

Medium

Confidence: 88%
Finding: The tools accept local files or URLs and can parse via cloud/standard APIs and deliver output to third-party sinks, but the interface provides no explicit warning or consent boundary about data leaving the local environment. In an agent context, this can cause users to unintentionally transmit sensitive document contents to remote services such as MinerU backends or configured sink platforms.
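
A consent boundary of the kind described could be as small as a check that runs before any remote delivery. The following is a minimal sketch under stated assumptions: the sink identifiers are placeholders based on the services named in this report, and `require_consent` and `ConsentRequired` are hypothetical names, not part of the skill.

```python
# Hypothetical sink identifiers; the skill's real names may differ.
REMOTE_SINKS = frozenset({"feishu", "linear", "onenote", "wecom"})


class ConsentRequired(Exception):
    """Raised when a remote delivery has not been explicitly approved."""


def require_consent(sink: str, approved_sinks: set) -> None:
    """Block delivery to a remote sink unless the user has approved it."""
    name = sink.strip().lower()
    if name in REMOTE_SINKS and name not in approved_sinks:
        raise ConsentRequired(
            f"Delivery to '{name}' sends document content off-host; "
            "explicit approval is required."
        )
```

In an agent workflow, `approved_sinks` would be filled only from an explicit user action, so an automated run cannot add a destination on its own.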

Missing User Warnings

Medium

Confidence: 83%
Finding: The code retrieves Feishu credentials and uses them to authenticate network requests, but there is no in-code mechanism ensuring users are warned that both credentials and document content will be sent to Feishu. In agent workflows, lack of clear disclosure can cause unintentional transfer of sensitive data to a third-party service.

Missing User Warnings

Medium

Confidence: 93%
Finding: This code accesses an OAuth access token and uploads parsed document content without any warning, confirmation, or sensitivity checks in the file itself. In agent environments, that can lead to silent disclosure of private documents to a connected SaaS account, especially when users expect only local parsing or markdown conversion.

Missing User Warnings

Medium

Confidence: 77%
Finding: The sink sends parsed document content and uses WeCom credentials to call external APIs, but this file provides no user-visible consent, classification check, or warning before transmitting potentially sensitive material. In an agent context, users may not realize that local document contents are being forwarded to a third-party service, increasing the risk of unintended data disclosure.

VirusTotal

65/65 vendors reported this skill as clean.
