document-parser

Security checks across malware telemetry and agentic risk

Overview

This document parser only acts when invoked, but it uploads user documents to a configurable remote HTTP service without clearly warning users about the privacy and transport risk.

Install only if you understand that files you parse may be uploaded to the configured parsing service. Use a trusted HTTPS endpoint, avoid confidential or regulated documents unless you control or trust the service and its retention policy, and protect any API key configured through environment variables or config.json.
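
The advice above to use a trusted HTTPS endpoint can be enforced in code before any file leaves the machine. The following is a minimal sketch, not the skill's own code: `require_https` is a hypothetical helper and `parser.example.com` is a placeholder host.

```python
from urllib.parse import urlsplit


def require_https(endpoint: str) -> str:
    """Return the endpoint unchanged if it uses HTTPS, else raise ValueError.

    Hypothetical helper for illustration; not part of the scanned skill.
    """
    parts = urlsplit(endpoint)
    # Reject plain HTTP (and any other scheme) as well as URLs without a host.
    if parts.scheme.lower() != "https" or not parts.hostname:
        raise ValueError(f"refusing non-HTTPS parsing endpoint: {endpoint!r}")
    return endpoint


if __name__ == "__main__":
    # parser.example.com is a placeholder, not the skill's real endpoint.
    print(require_https("https://parser.example.com/v1/parse"))
```

Calling such a check once, where the configuration is loaded, would make a plain-HTTP endpoint fail early instead of silently carrying documents and the API key in cleartext.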

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (13)

Tainted flow: 'config' from os.environ.get (line 21, credential/environment) → requests.get (network output)

Critical
Category
Data Flow
Content
return {"error": "未配置 API Key"}  # i.e. "API Key not configured"
    
    try:
        response = requests.get(
            f"{config['base_url']}/{task_id}",
            headers={"Authorization": f"Bearer {config['api_key']}"}
        )
Confidence
93% confidence
Finding
response = requests.get( f"{config['base_url']}/{task_id}", headers={"Authorization": f"Bearer {config['api_key']}"} )

Tainted flow: 'api_url' from os.environ.get (line 55, credential/environment) → requests.post (network output)

Critical
Category
Data Flow
Content
if page_range:
                data["page_range"] = page_range
            
            response = requests.post(
                api_url,
                headers=headers,
                files=files,
Confidence
97% confidence
Finding
response = requests.post( api_url, headers=headers, files=files, data=data, timeout=120 )

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The README describes configuring an API key and even a custom HTTP parsing endpoint, which strongly implies documents are sent to an external service for processing, but it never clearly warns users that sensitive document contents may leave the local machine. This omission can cause users to upload confidential PDFs, scans, or Word files without informed consent, creating privacy, compliance, and data-handling risk.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The README shows configuration of an external API endpoint and API key, but does not clearly warn that uploaded document contents may be transmitted to a remote service. For a document parsing skill, this omission is security-relevant because users may process sensitive files such as contracts, IDs, financial reports, or internal documents without understanding the privacy and data-handling implications.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The skill explicitly configures a remote HTTP endpoint for document parsing, which strongly implies that user-supplied document contents may be transmitted off-host to an external service. Because the documentation does not warn users that potentially sensitive PDFs, images, or Word files may leave their environment, users could unknowingly expose confidential data; the use of plain HTTP further increases interception risk in transit.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The manifest defines a default remote HTTP endpoint for document parsing but does not disclose in the user-facing metadata that uploaded documents may be transmitted off-host to a third-party service. Because the skill handles potentially sensitive PDFs, images, and Word files, this omission can mislead users into exposing confidential content without informed consent, and the use of plain HTTP further increases interception risk.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The code transmits full document contents to a remote service without explicit user warning, and the default endpoint uses insecure HTTP. In a document parsing skill, this context makes the issue more dangerous because users may provide highly sensitive PDFs, scans, IDs, contracts, or financial records expecting local processing.
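
The missing-warning findings above all come down to the user never being told that a file is about to leave the machine. One possible mitigation is an explicit confirmation step before upload. The sketch below is an assumption about how that could look, not the skill's code: `confirm_upload` and its `ask` parameter are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def confirm_upload(path: str, endpoint: str, ask: Callable[[str], str] = input) -> bool:
    """Ask the user to approve sending a document to a remote parser.

    Hypothetical helper for illustration; `ask` is injectable so the prompt
    can be replaced in tests or non-interactive runs.
    """
    prompt = (
        f"'{Path(path).name}' will be uploaded to {endpoint} for parsing.\n"
        "Its full contents will leave this machine. Continue? [y/N] "
    )
    # Anything other than an explicit yes is treated as a refusal.
    return ask(prompt).strip().lower() in {"y", "yes"}
```

Defaulting to "no" means an empty answer or a stray keypress does not trigger an upload.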

Unpinned Dependencies

Low
Category
Supply Chain
Content
requests>=2.28.0
python-docx>=0.8.11
Pillow>=9.0.0
Confidence
96% confidence
Finding
requests>=2.28.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
requests>=2.28.0
python-docx>=0.8.11
Pillow>=9.0.0
Confidence
95% confidence
Finding
python-docx>=0.8.11

Unpinned Dependencies

Low
Category
Supply Chain
Content
requests>=2.28.0
python-docx>=0.8.11
Pillow>=9.0.0
Confidence
97% confidence
Finding
Pillow>=9.0.0
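
The three findings above flag the same pattern: each requirement sets only a lower bound with `>=`, so the installed version can change between installs. A rough sketch of how such lines could be detected follows. `find_unpinned` is a hypothetical helper, and the check is deliberately naive (it ignores extras, environment markers, and hashes).

```python
def find_unpinned(requirement_lines):
    """Return requirement lines that are not pinned with an exact '==' version.

    Naive check for illustration: skips blank lines and comments and does not
    parse extras, environment markers, or hashes.
    """
    unpinned = []
    for raw in requirement_lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line and "==" not in line:
            unpinned.append(line)
    return unpinned


# The three requirement lines quoted in the findings above.
requirements = ["requests>=2.28.0", "python-docx>=0.8.11", "Pillow>=9.0.0"]
print(find_unpinned(requirements))
```

Run against the lines quoted in the report, all three are returned as unpinned.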

Known Vulnerable Dependency: requests — 10 advisory(ies): CVE-2014-1830 (Exposure of Sensitive Information to an Unauthorized Actor in Requests); CVE-2024-47081 (Requests vulnerable to .netrc credentials leak via malicious URLs); CVE-2024-35195 (Requests `Session` object does not verify requests after making first request wi) +7 more

High
Category
Supply Chain
Confidence
90% confidence
Finding
requests

Known Vulnerable Dependency: python-docx — 2 advisory(ies): CVE-2016-5851 (Improper Restriction of XML External Entity Reference in python-docx); CVE-2016-5851 (python-docx before 0.8.6 allows context-dependent attackers to conduct XML Exter)

High
Category
Supply Chain
Confidence
94% confidence
Finding
python-docx
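
The advisory text quoted above says python-docx before 0.8.6 is affected, while the requirements shown earlier set a floor of 0.8.11. A rough sketch of comparing a version floor against a first fixed version follows. It assumes purely numeric dotted versions, and the helper names are hypothetical. A real check would more likely rely on a dedicated version parser.

```python
def version_tuple(version: str) -> tuple:
    """Convert a purely numeric dotted version such as '0.8.11' to (0, 8, 11)."""
    return tuple(int(part) for part in version.split("."))


def floor_is_patched(floor: str, first_fixed: str) -> bool:
    """True if the lowest allowed version is at or above the first fixed one."""
    return version_tuple(floor) >= version_tuple(first_fixed)


# Floor from the requirements file vs. the fixed version named in the advisory.
print(floor_is_patched("0.8.11", "0.8.6"))
```

The tuple conversion matters because comparing the raw strings would sort "0.8.11" before "0.8.6". This comparison covers only the one advisory quoted here and says nothing about the advisories listed for the other packages.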

Known Vulnerable Dependency: Pillow — 10 advisory(ies): CVE-2016-2533 (Pillow buffer overflow in ImagingPcdDecode); CVE-2023-50447 (Arbitrary Code Execution in Pillow); CVE-2021-27922 (Pillow Uncontrolled Resource Consumption) +7 more

Critical
Category
Supply Chain
Confidence
97% confidence
Finding
Pillow

VirusTotal

66/66 vendors rated this skill as clean.
