Doubao Speech Synthesis 2.0

Security checks across malware telemetry and agentic risk

Overview

This is a coherent text-to-speech skill that uses a disclosed remote provider, but users should treat their token and synthesized text as sensitive.

Install only if you trust this skill and Volcengine/ByteDance with the text you synthesize and your TTS access token. Prefer environment variables or a secret manager over plaintext JSON config, avoid sharing debug output that includes VOLCANO_ACCESS_TOKEN, and be cautious with playback on Windows when filenames or speaker IDs come from untrusted input.
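
The credential advice above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the variable names VOLCANO_APP_ID and VOLCANO_ACCESS_TOKEN come from the report, while the helper names load_credentials and redact are hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Read the app ID and access token from the environment, not a JSON file."""
    app_id = env.get("VOLCANO_APP_ID")
    token = env.get("VOLCANO_ACCESS_TOKEN")
    if not app_id or not token:
        raise RuntimeError(
            "Set VOLCANO_APP_ID and VOLCANO_ACCESS_TOKEN in the environment"
        )
    return app_id, token


def redact(secret, keep=4):
    """Mask a secret for debug output, keeping only the last few characters."""
    if len(secret) <= keep:
        return "*" * len(secret)
    return "*" * (len(secret) - keep) + secret[-keep:]
```

Logging redact(token) instead of the raw token keeps debug output safe to share while still letting you tell two tokens apart.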

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (5)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    elif sys.platform == 'linux':
        subprocess.run(['aplay', output], check=True)
    elif sys.platform == 'win32':
        subprocess.run(['start', output], shell=True, check=True)
    else:
        # Message: "Unknown platform, cannot auto-play: {output}"
        print(f"⚠️ 未知平台，无法自动播放：{output}", file=sys.stderr)
        return False
Confidence: 98%
Finding: subprocess.run(['start', output], shell=True, check=True)
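
One way to avoid the shell on Windows is sketched below. It is an illustration under stated assumptions, not a patch from the skill's author: the function names and the ALLOWED_SUFFIXES set are hypothetical, and only the Linux and Windows branches shown in the finding are reproduced. On Windows it uses os.startfile, which opens the file with its associated application without going through cmd.exe.

```python
import os
import subprocess
import sys
from pathlib import Path

# Assumed set of output formats; adjust to whatever the skill actually writes.
ALLOWED_SUFFIXES = {".mp3", ".wav", ".ogg"}


def validate_audio_path(output):
    """Resolve the path and reject anything that is not an existing audio file."""
    path = Path(output).resolve()
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected file type: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(str(path))
    return path


def play(output, platform=sys.platform):
    """Play an audio file without passing its name through a shell."""
    path = validate_audio_path(output)
    if platform == "win32":
        os.startfile(str(path))  # Windows only; no cmd.exe parsing involved
    elif platform == "linux":
        subprocess.run(["aplay", str(path)], check=True)  # argv list, shell=False
    else:
        print(f"Unknown platform, cannot auto-play: {path}", file=sys.stderr)
        return False
    return True
```

Because the filename never reaches a command interpreter, characters such as & or | in an untrusted name are treated as part of the path and are not interpreted as shell syntax.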

Lp3

Medium

Category: MCP Least Privilege
Confidence: 83%
Finding: The skill documentation indicates capabilities to read environment variables, access local files/configuration, invoke shell commands, and communicate over the network, yet no permissions are declared. This creates a transparency and consent gap: users may install a seemingly simple TTS skill without being explicitly told it will access credentials, local config, external services, and local playback tooling.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 88%
Finding: The documented behavior does not cleanly match the stated purpose: it relies on an external remote TTS service, reads credentials, can play audio locally, and claims features such as emotion control, voice commands, and a full voice library that are not clearly substantiated. This mismatch is dangerous because users cannot make informed trust decisions when data transmission, local execution, and feature scope are understated or overstated.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill instructs users to provide credentials and send text to a remote TTS provider, but it does not present a clear privacy or data-transfer warning. Users may unknowingly transmit sensitive prompts or regulated content to a third-party service along with authentication material, increasing confidentiality and compliance risk.
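
A data-transfer notice of the kind this finding asks for might look like the sketch below. The function name, its parameters, and the wording of the notice are all hypothetical; the report does not say how the skill should present the warning.

```python
def confirm_remote_synthesis(text, provider="Volcengine/ByteDance", ask=input):
    """Show a data-transfer notice and return True only on explicit consent."""
    notice = (
        f"The following {len(text)} characters will be sent to {provider} "
        "for synthesis, together with your access token. Continue? [y/N] "
    )
    answer = ask(notice)
    # Anything other than an explicit yes is treated as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" means that an empty answer or a non-interactive run never transmits text by accident.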

Credential Access

High

Category: Privilege Escalation
Content: "seedtts2": { "env": { "VOLCANO_APP_ID": "your APP ID", "VOLCANO_ACCESS_TOKEN": "your Access Token" } } }
Confidence: 80%
Finding: Access Token (the example configuration places VOLCANO_ACCESS_TOKEN in plaintext JSON)

VirusTotal

66/66 vendors flagged this skill as clean.
