Security audit

Baidu Document Parsing (vlm-parser)

Security checks across malware telemetry and agentic risk

Overview

This is a coherent Baidu cloud document-parsing skill, but users should understand that selected documents, document URLs, API credentials, and returned result links are sensitive.

Install only if you are comfortable sending the documents or document URLs you choose to Baidu for cloud processing. Avoid confidential or regulated documents unless approved for your use case, treat returned markdown_url and parse_result_url values as sensitive for their 30-day lifetime, and store Baidu API keys in a secure place outside source control.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Tainted flow: 'parse_result_url' from requests.post (line 158, network input) → requests.get (network output)

Medium
Category
Data Flow
Content
if download_result:
    parse_result_url = result.get('result', {}).get('parse_result_url')
    if parse_result_url:
        parse_response = requests.get(parse_result_url)
        parse_response.encoding = 'utf-8'
        result['parse_result'] = parse_response.json()
return result
Confidence
95% confidence
Finding
parse_response = requests.get(parse_result_url)
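
One way to narrow this flow is to check the server-supplied URL before fetching it. The sketch below is a minimal illustration, not the skill's actual code: `is_trusted_result_url` and `ALLOWED_HOST_SUFFIXES` are hypothetical names, and the placeholder host suffix is an assumption that would need to be replaced with the result hosts Baidu actually documents.

```python
from urllib.parse import urlparse

# Hypothetical allowlist (assumption): replace with the result hosts
# that the provider's documentation actually names.
ALLOWED_HOST_SUFFIXES = ("example-results.invalid",)


def is_trusted_result_url(url, allowed_suffixes=ALLOWED_HOST_SUFFIXES):
    """Return True only for HTTPS URLs whose host matches an allowed suffix."""
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url)
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    # Match the exact host or a true subdomain, never a bare string suffix.
    return any(host == s or host.endswith("." + s) for s in allowed_suffixes)
```

A caller could skip the download when the check fails and otherwise pass an explicit `timeout` to `requests.get`, so that an unexpected host or a stalled connection does not block the parser.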

Lp3

Medium
Category
MCP Least Privilege
Confidence
88% confidence
Finding
The skill documentation clearly indicates use of environment variables for credentials and outbound network access to Baidu APIs, but no explicit permissions are declared. This creates a transparency and policy-enforcement gap: users or hosting platforms may not realize the skill can access secrets and transmit document data externally, increasing the risk of unintended data exposure.

Vague Triggers

Medium
Confidence
84% confidence
Finding
The trigger list includes broad terms such as '文档解析' (document parsing), 'PaddleOCR', and '多模态文档' (multimodal document), which could match many ordinary user requests and cause the skill to activate unexpectedly. Because this skill sends documents or URLs to an external OCR/VLM service, accidental invocation can lead to unintentional transmission of sensitive files or metadata.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The description explains features and API usage but does not prominently warn that document content, base64 file data, or public file URLs are transmitted to Baidu's external API. In a document-processing skill, this omission is significant because users may provide sensitive contracts, IDs, financial records, or internal PDFs without understanding that third-party transfer will occur.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The document instructs users to place long-lived API credentials in a persistent settings file under the home directory without any warning about file permissions, secret exposure, backup/sync leakage, or safer secret-storage alternatives. Even though this is common setup guidance, it increases the chance that sensitive credentials are stored insecurely and later exposed through local compromise, accidental commits, or support/log sharing.
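
As an illustration of the safer handling this finding asks for, the sketch below prefers an environment variable over a file and checks that a settings file is not readable by group or others. It assumes a POSIX system; the variable name `BAIDU_DOC_API_KEY` and both function names are hypothetical and do not come from the skill.

```python
import os
import stat


def settings_file_is_private(path):
    """Return True if group and others have no access (POSIX mode bits)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def load_api_key(env=None, var_name="BAIDU_DOC_API_KEY"):
    """Read the key from the environment; var_name is a hypothetical name."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key
```

If a settings file must be used, running `chmod 600` on it and keeping it out of synced or version-controlled directories addresses the exposure paths listed above.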

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The verification step directs users to transmit API credentials to a remote endpoint using curl but does not warn that secrets are being sent over the network, may appear in shell history, process listings, or copied terminal logs, and should only be sent to the official HTTPS endpoint. In a credential-setup guide, this is contextually expected, but the lack of handling guidance still creates avoidable exposure risk.
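
The handling guidance this finding calls for can be sketched in Python instead of a shell one-liner, which keeps the secret out of shell history and process listings. Everything specific here is an assumption: the `Authorization: Bearer` header scheme and the function names are hypothetical, and the real endpoint and authentication format must come from Baidu's official documentation.

```python
import urllib.request
from urllib.parse import urlparse


def build_verification_request(endpoint, api_key):
    """Build (but do not send) a POST request, refusing non-HTTPS endpoints."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("refusing to send credentials to a non-HTTPS endpoint")
    req = urllib.request.Request(endpoint, method="POST")
    # Hypothetical header scheme (assumption): use the provider's documented one.
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


def redact(text, secret):
    """Mask a secret before text is pasted into logs or support tickets."""
    return text.replace(secret, "***") if secret else text
```

The key itself could be read interactively with `getpass.getpass()` so that it never appears on the command line.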

VirusTotal

66/66 vendors flagged this skill as clean.

Static analysis

Detected: suspicious.exposed_secret_literal

File appears to expose a hardcoded API secret or token.

Critical
Code
suspicious.exposed_secret_literal
Location
scripts/baidu_doc_vlm_parser.py:34