document-parser

Security checks across malware telemetry and agentic risk

Overview

This document parser only acts when invoked, but it uploads user documents to a configurable remote HTTP service without clearly warning users about the privacy and transport risk.

Install only if you understand that files you parse may be uploaded to the configured parsing service. Use a trusted HTTPS endpoint, avoid confidential or regulated documents unless you control or trust the service and its retention policy, and protect any API key configured through environment variables or config.json.
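The advice above to use a trusted HTTPS endpoint can be enforced before any upload. The following is a minimal sketch, not the skill's actual code: the function name `is_trusted_endpoint` and the allow-list of hosts are hypothetical, and the example host is a placeholder.

```python
from typing import Iterable
from urllib.parse import urlparse


def is_trusted_endpoint(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Return True only for HTTPS URLs whose host is on the allow-list.

    `allowed_hosts` is a hypothetical, user-maintained list of parsing
    services the user has decided to trust.
    """
    parsed = urlparse(url)
    # Reject plain HTTP and anything without a recognizable host.
    if parsed.scheme != "https" or parsed.hostname is None:
        return False
    return parsed.hostname in set(allowed_hosts)
```

A caller would check the configured endpoint once at startup and refuse to parse documents if the check fails, rather than silently falling back to an HTTP default.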

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (13)

Tainted flow: 'config' from os.environ.get (line 21, credential/environment) → requests.get (network output)

Critical

Category: Data Flow
Content: return {"error": "未配置 API Key"} try: response = requests.get( f"{config['base_url']}/{task_id}", headers={"Authorization": f"Bearer {config['api_key']}"} )
Confidence: 93%
Finding: response = requests.get( f"{config['base_url']}/{task_id}", headers={"Authorization": f"Bearer {config['api_key']}"} )

Tainted flow: 'api_url' from os.environ.get (line 55, credential/environment) → requests.post (network output)

Critical

Category: Data Flow
Content: if page_range: data["page_range"] = page_range response = requests.post( api_url, headers=headers, files=files,
Confidence: 97%
Finding: response = requests.post( api_url, headers=headers, files=files, data=data, timeout=120 )

Missing User Warnings

Medium

Confidence: 95%
Finding: The README describes configuring an API key and even a custom HTTP parsing endpoint, which strongly implies documents are sent to an external service for processing, but it never clearly warns users that sensitive document contents may leave the local machine. This omission can cause users to upload confidential PDFs, scans, or Word files without informed consent, creating privacy, compliance, and data-handling risk.

Missing User Warnings

Medium

Confidence: 95%
Finding: The README shows configuration of an external API endpoint and API key, but does not clearly warn that uploaded document contents may be transmitted to a remote service. For a document parsing skill, this omission is security-relevant because users may process sensitive files such as contracts, IDs, financial reports, or internal documents without understanding the privacy and data-handling implications.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill explicitly configures a remote HTTP endpoint for document parsing, which strongly implies that user-supplied document contents may be transmitted off-host to an external service. Because the documentation does not warn users that potentially sensitive PDFs, images, or Word files may leave their environment, users could unknowingly expose confidential data; the use of plain HTTP further increases interception risk in transit.

Missing User Warnings

Medium

Confidence: 95%
Finding: The manifest defines a default remote HTTP endpoint for document parsing but does not disclose in the user-facing metadata that uploaded documents may be transmitted off-host to a third-party service. Because the skill handles potentially sensitive PDFs, images, and Word files, this omission can mislead users into exposing confidential content without informed consent, and the use of plain HTTP further increases interception risk.

Missing User Warnings

Medium

Confidence: 95%
Finding: The code transmits full document contents to a remote service without explicit user warning, and the default endpoint uses insecure HTTP. In a document parsing skill, this context makes the issue more dangerous because users may provide highly sensitive PDFs, scans, IDs, contracts, or financial records expecting local processing.
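One way to address the missing-warning findings is an explicit confirmation step before any document leaves the machine. The sketch below is illustrative only: `confirm_upload` and its `ask` parameter are hypothetical names, and the skill's real upload path is not shown in this report.

```python
from pathlib import Path
from typing import Callable


def confirm_upload(path: str, endpoint: str,
                   ask: Callable[[str], str] = input) -> bool:
    """Tell the user where the file is going and require an explicit yes.

    `ask` defaults to the built-in input(); it is injectable so the
    prompt can be replaced in non-interactive or test settings.
    """
    name = Path(path).name
    answer = ask(
        f"'{name}' will be uploaded to {endpoint} for parsing. "
        "Continue? [y/N] "
    )
    # Anything other than an explicit y/yes is treated as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" means that an empty answer or an unexpected one never results in an upload.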

Unpinned Dependencies

Low

Category: Supply Chain
Content: requests>=2.28.0 python-docx>=0.8.11 Pillow>=9.0.0
Confidence: 96%
Finding: requests>=2.28.0

Unpinned Dependencies

Low

Category: Supply Chain
Content: requests>=2.28.0 python-docx>=0.8.11 Pillow>=9.0.0
Confidence: 95%
Finding: python-docx>=0.8.11

Unpinned Dependencies

Low

Category: Supply Chain
Content: requests>=2.28.0 python-docx>=0.8.11 Pillow>=9.0.0
Confidence: 97%
Finding: Pillow>=9.0.0
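The three unpinned-dependency findings share one pattern: a `>=` lower bound instead of an exact `==` pin. The sketch below shows how such lines could be detected in a requirements file. The helper name `unpinned_requirements` and the regular expression are assumptions made for illustration, and the check is deliberately simplified (it does not cover every requirement-specifier form).

```python
import re
from typing import List

# A line counts as pinned only if it is "name==version", optionally with
# extras ("name[extra]==version") and an environment marker after ';'.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*"
    r"[0-9][0-9A-Za-z.!+_-]*\s*(;.*)?$"
)


def unpinned_requirements(text: str) -> List[str]:
    """Return the requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

Run against the requirements shown in the findings, this returns all three lines, because each uses `>=`. Which exact versions to pin to is a separate decision that should take the advisories listed for each package into account.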

Known Vulnerable Dependency: requests (10 advisories): CVE-2014-1830 (Exposure of Sensitive Information to an Unauthorized Actor in Requests); CVE-2024-47081 (Requests vulnerable to .netrc credentials leak via malicious URLs); CVE-2024-35195 (Requests `Session` object does not verify requests after making first request wi) +7 more

High

Category: Supply Chain
Confidence: 90%
Finding: requests

Known Vulnerable Dependency: python-docx (2 advisories): CVE-2016-5851 (Improper Restriction of XML External Entity Reference in python-docx); CVE-2016-5851 (python-docx before 0.8.6 allows context-dependent attackers to conduct XML Exter)

High

Category: Supply Chain
Confidence: 94%
Finding: python-docx

Known Vulnerable Dependency: Pillow (10 advisories): CVE-2016-2533 (Pillow buffer overflow in ImagingPcdDecode); CVE-2023-50447 (Arbitrary Code Execution in Pillow); CVE-2021-27922 (Pillow Uncontrolled Resource Consumption) +7 more

Critical

Category: Supply Chain
Confidence: 97%
Finding: Pillow

VirusTotal

66/66 vendors flagged this skill as clean.
