MinerU PDF Extractor

Security checks across malware telemetry and agentic risk

Overview

This skill does what it claims: it sends chosen PDFs or PDF URLs to MinerU for conversion and downloads the extracted results back locally.

Install only if you are comfortable sending selected PDFs, PDF URLs, and a MinerU API token to MinerU-controlled services. Use it from a dedicated working folder, avoid confidential or regulated documents unless sharing them with MinerU has been approved, keep the default MinerU endpoint unless you trust an alternative, and inspect extracted files before using them.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (16)

Intent-Code Divergence

Medium
Confidence
83% confidence
Finding
The documentation claims input validation, sanitization, and directory traversal protection without showing any implementation or verifiable evidence in this file. Security assurances that cannot be substantiated are dangerous because operators may trust the skill with untrusted URLs, filenames, ZIP archives, or extraction paths under a false sense of safety.
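For illustration, the kind of checks the documentation claims could look roughly like the sketch below. The skill's real validation code is not shown in the scanned file, so the function names `validate_pdf_url` and `is_safe_pdf_name` and the exact rules are assumptions, not the author's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and rules are illustrative, not taken from the skill.

# Accept only https URLs with a plain hostname and no whitespace or control characters.
validate_pdf_url() {
  local url="$1"
  local re='^https://[A-Za-z0-9.-]+(:[0-9]+)?(/[^[:space:][:cntrl:]]*)?$'
  [[ "$url" =~ $re ]]
}

# Accept only relative-or-absolute paths that end in .pdf, do not start with "-"
# (which could be read as an option), and contain no ".." path component.
is_safe_pdf_name() {
  local path="$1"
  case "$path" in
    ""|-*|..|../*|*/..|*/../*) return 1 ;;
  esac
  case "$path" in
    *.pdf|*.PDF) return 0 ;;
  esac
  return 1
}
```

A caller would run these before building any request, for example `validate_pdf_url "$PDF_URL" || exit 1`. Checks like these only become verifiable evidence once they appear in the shipped script itself.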

Intent-Code Divergence

Medium
Confidence
97% confidence
Finding
The document claims 'path confinement' and presents the script as secure, but the extraction step uses `unzip -q` without inspecting archive entry names for `..`, absolute paths, or symlinks. A malicious or compromised ZIP from the remote service/CDN could perform Zip Slip-style writes outside the intended directory and overwrite user files despite the documentation's assurances.
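A minimal sketch of the missing entry-name inspection follows. It lists archive entries with `unzip -Z1` (zipinfo mode, one name per line) and refuses to extract if any name is absolute or contains a `..` component. The function names are hypothetical, and the sketch does not detect symlink entries or names containing newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the scanned skill.

# Return 0 only if an entry name is relative and has no ".." component.
entry_is_safe() {
  local name="$1"
  case "$name" in
    ""|/*|..|../*|*/..|*/../*) return 1 ;;
  esac
  return 0
}

# Read entry names on stdin (one per line); fail on the first unsafe one.
check_entries() {
  local name
  while IFS= read -r name; do
    entry_is_safe "$name" || { printf 'unsafe entry: %s\n' "$name" >&2; return 1; }
  done
}

# List names first, extract only if every name passes.
safe_unzip() {
  local zip="$1" dest="$2"
  unzip -Z1 "$zip" | check_entries || return 1
  unzip -q "$zip" -d "$dest"
}
```

Usage would be `safe_unzip "${OUTPUT_DIR}/result.zip" "$OUTPUT_DIR"` in place of the bare `unzip -q` call.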

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The skill instructs users to upload local PDFs and submit remote PDF URLs to a third-party API, but it does not prominently warn that document contents and URLs will be shared externally. This is a real privacy and data-handling risk, especially if users process confidential files and assume the skill operates locally.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The skill explicitly supports uploading local PDF files to a third-party MinerU service, but the documentation does not clearly warn users that document contents may leave the local environment and be processed or stored externally. This creates a real confidentiality and compliance risk, especially if users submit sensitive internal, regulated, or proprietary PDFs under the assumption that extraction is local-only.
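One way to make the external transfer explicit is a confirmation prompt before any upload. The sketch below is an assumption about how that could be done; `confirm_external_upload` is a hypothetical name and the wording of the warning is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the scanned skill.

# Warn that the file leaves the machine and require an explicit "y" to proceed.
# Anything else, including empty input or end of input, counts as "no".
confirm_external_upload() {
  local target="$1" reply
  printf 'WARNING: "%s" will be sent to MinerU and processed off this machine.\n' "$target" >&2
  printf 'Continue? [y/N] ' >&2
  IFS= read -r reply || return 1
  [[ "$reply" == "y" || "$reply" == "Y" ]]
}
```

A script would call `confirm_external_upload "$PDF_PATH" || exit 1` before the upload step. Defaulting to "no" means an unattended run cannot silently send a document.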

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The online URL parsing flow instructs users to submit arbitrary PDF URLs to MinerU, but it lacks a clear warning that the third-party service will fetch the supplied link and that the resulting content is then downloaded back into the user environment. This can expose sensitive URLs or resources the user wrongly assumed were private, and it creates unexpected data-sharing and trust-boundary issues between the user, the source host, and MinerU.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The guide instructs users to upload local PDFs and send a bearer token to a third-party remote API, but it does not clearly warn that document contents and metadata leave the local environment. This can cause unintentional disclosure of sensitive documents or credentials in environments where users assume the operation is local because the guide emphasizes 'local file parsing.'

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The documentation instructs users to submit a remote PDF URL to MinerU's external API but does not clearly disclose that the third-party service will fetch and process the document contents. This can cause unintentional data disclosure, especially if users supply sensitive or private document URLs under the assumption processing is local.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The document instructs users to upload local PDF files to an external MinerU service but does not clearly warn that document contents will leave the local environment and be processed by a third party. This can lead to inadvertent disclosure of sensitive or regulated data because users may treat the workflow as a normal local parsing operation.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The guide asks users to submit a document URL to a third-party API but does not clearly warn that MinerU servers will fetch that URL themselves. This can expose private, pre-signed, internal, or otherwise sensitive document locations to an external service and may surprise users who assume processing is local.
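A rough heuristic for catching the URL types named in this finding before they are submitted is sketched below. The parameter names and host patterns are assumptions chosen for illustration; the list is not exhaustive, and `url_looks_sensitive` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper; a heuristic only, not a complete classifier.

# Return 0 if the URL carries a signature/token-style query parameter
# or points at a loopback, private-range, or internal-looking host.
url_looks_sensitive() {
  local url="$1" lower host
  local re='[?&](x-amz-signature|signature|sig|token|access_token)='
  lower="$(printf '%s' "$url" | tr '[:upper:]' '[:lower:]')"

  # Query parameters commonly seen on pre-signed or tokenized links.
  [[ "$lower" =~ $re ]] && return 0

  # Isolate the host: drop scheme, path/query/fragment, userinfo, and port.
  host="${lower#*://}"
  host="${host%%[/?#]*}"
  host="${host##*@}"
  host="${host%%:*}"
  case "$host" in
    localhost|*.local|*.internal|127.*|10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
  esac
  return 1
}
```

A script could print an extra warning, or require confirmation, whenever `url_looks_sensitive "$PDF_URL"` succeeds.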

Natural-Language Policy Violations

Low
Confidence
79% confidence
Finding
The example request hard-codes the parsing language to Chinese without user choice. While not a direct security flaw, it can cause unintended processing behavior, data quality issues, and silent misclassification for non-Chinese documents.

Natural-Language Policy Violations

Low
Confidence
84% confidence
Finding
The embedded script builds requests with the language fixed to Chinese, which removes user control and may produce incorrect OCR or extraction results on other documents. This is primarily a quality and consent issue rather than a strong security bug, but it still represents unsafe default behavior in a parsing tool.
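The language could instead come from the user. The sketch below assumes a hypothetical `MINERU_LANG` environment variable, a default of `en`, and JSON field names `url` and `language`; none of these are confirmed by the scanned file, whose request body is truncated. The JSON builder is deliberately naive and rejects input it cannot safely embed.

```shell
#!/usr/bin/env bash
# Hypothetical helper. MINERU_LANG, the "en" default, and the field names
# "url" and "language" are assumptions for illustration.

build_payload() {
  local url="$1" lang="${MINERU_LANG:-en}"
  # Allow only simple language codes.
  [[ "$lang" =~ ^[A-Za-z_]+$ ]] || { echo "invalid language code: $lang" >&2; return 1; }
  # Refuse characters this naive builder cannot escape.
  case "$url" in
    *\"*|*\\*) echo "URL contains characters unsafe for this JSON builder" >&2; return 1 ;;
  esac
  printf '{"url":"%s","language":"%s"}\n' "$url" "$lang"
}
```

For example, `JSON_PAYLOAD="$(MINERU_LANG=en build_payload "$PDF_URL")"`. Where `jq` is available, building the payload with `jq -n --arg` would handle escaping more robustly.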

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The script downloads a ZIP from a remote service and extracts it automatically into a local directory without requiring explicit user confirmation or performing safe extraction checks on archive entry paths. Even though the URL is restricted to a specific host, the archive contents are still untrusted data from an external source, so a malicious or compromised upstream could overwrite files via crafted paths or place unexpected content on disk.
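One possible mitigation is to extract into a throwaway staging directory, inspect what arrived, and copy it into place only if the inspection passes. The sketch below checks only for symbolic links and lists the extracted files; the function names are hypothetical and the check is not a complete defense.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the scanned skill.

# Return 0 if the directory exists and contains no symbolic links.
staging_is_clean() {
  local dir="$1"
  [ -d "$dir" ] || return 1
  [ -z "$(find "$dir" -type l | head -n 1)" ]
}

# Extract to a temp dir, list the files, and copy to the final directory
# only when the staging check passes.
extract_via_staging() {
  local zip="$1" final_dir="$2" staging
  staging="$(mktemp -d)" || return 1
  if unzip -q "$zip" -d "$staging" && staging_is_clean "$staging"; then
    (cd "$staging" && find . -type f) >&2   # show the user what arrived
    mkdir -p "$final_dir" && cp -R "$staging"/. "$final_dir"/
  else
    echo "refusing to install extracted files" >&2
    rm -rf "$staging"
    return 1
  fi
  rm -rf "$staging"
}
```

Because the archive never unpacks directly into the output folder, a bad archive can be discarded without touching existing files there.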

External Transmission

Medium
Category
Data Exfiltration
Content
**Command:**
```bash
curl -X POST "${MINERU_BASE_URL}/extract/task" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${MINERU_TOKEN}" \
    -d '{
```
Confidence
97% confidence
Finding
curl -X POST "${MINERU_BASE_URL}/extract/task" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${MINERU_TOKEN}" \ -d

External Transmission

Medium
Category
Data Exfiltration
Content
**Command:**
```bash
# Download ZIP package
curl -L -o "result.zip" \
  "YOUR_FULL_ZIP_URL_FROM_STEP2"

# Extract to folder
```
Confidence
88% confidence
Finding
curl -L -o "result.zip" \ "YOUR_FULL_ZIP_URL_FROM_STEP2" # Extract to folder unzip -q "result.zip" -d

External Transmission

Medium
Category
Data Exfiltration
Content
fi

# Send request to MinerU API
STEP1_RESPONSE=$(curl -s -X POST "${MINERU_BASE_URL}/extract/task" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${MINERU_TOKEN}" \
    -d "$JSON_PAYLOAD")
Confidence
97% confidence
Finding
curl -s -X POST "${MINERU_BASE_URL}/extract/task" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${MINERU_TOKEN}" \ -d

External Transmission

Medium
Category
Data Exfiltration
Content
mkdir -p "$OUTPUT_DIR"

# Download result ZIP
curl -L -o "${OUTPUT_DIR}/result.zip" "$ZIP_URL"

# SECURITY: Validate ZIP file before extraction
if ! unzip -t "${OUTPUT_DIR}/result.zip" &>/dev/null; then
Confidence
91% confidence
Finding
curl -L -o "${OUTPUT_DIR}/result.zip" "$ZIP_URL" # SECURITY: Validate ZIP file before extraction if ! unzip -t "${OUTPUT_DIR}/result.zip" &>/dev/null; then echo "❌ Error: Invalid ZIP file" rm

VirusTotal

66/66 vendors flagged this skill as clean.
