Doubao Speech Synthesis 2.0 (豆包语音合成 2.0)

Security checks across malware telemetry and agentic risk

Overview

This is a coherent text-to-speech skill that uses a disclosed remote provider, but users should treat their token and synthesized text as sensitive.

Install only if you trust this skill and Volcengine/ByteDance with the text you synthesize and your TTS access token. Prefer environment variables or a secret manager over plaintext JSON config, avoid sharing debug output that includes VOLCANO_ACCESS_TOKEN, and be cautious with playback on Windows when filenames or speaker IDs come from untrusted input.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (5)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
elif sys.platform == 'linux':
                subprocess.run(['aplay', output], check=True)
            elif sys.platform == 'win32':
                subprocess.run(['start', output], shell=True, check=True)
            else:
                print(f"⚠️  未知平台,无法自动播放:{output}", file=sys.stderr)
                return False
Confidence
98% confidence
Finding
subprocess.run(['start', output], shell=True, check=True)
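
One way to address this finding is to keep the output path away from a shell entirely. With shell=True on Windows, the command is handed to cmd.exe, which parses the filename as part of a command line. The sketch below is illustrative only and is not the skill's actual code: the function names playback_command and play are hypothetical, the macOS afplay branch is an assumption (the excerpt starts mid-chain and does not show it), and os.startfile is used as a Windows-only standard-library alternative to `start`.

```python
import os
import subprocess
import sys


def playback_command(path, platform=sys.platform):
    """Return an argv list for a command-line player, or None if none is known.

    The path is made absolute so a filename beginning with '-' cannot be
    mistaken for a command-line option by the player.
    """
    full_path = os.path.abspath(path)
    if platform == "darwin":
        # Assumption: the excerpt does not show the macOS branch.
        return ["afplay", full_path]
    if platform == "linux":
        return ["aplay", full_path]
    return None


def play(path, platform=sys.platform):
    """Play an audio file without passing the path through a shell."""
    if platform == "win32":
        # os.startfile exists only on Windows; it opens the file with its
        # associated application and does not involve cmd.exe parsing.
        os.startfile(os.path.abspath(path))
        return True
    argv = playback_command(path, platform)
    if argv is None:
        print(f"Unknown platform, cannot auto-play: {path}", file=sys.stderr)
        return False
    # A list argv with the default shell=False is executed directly.
    subprocess.run(argv, check=True)
    return True
```

Because the argument list is passed directly to the player, shell metacharacters in the filename are treated as literal characters rather than as command syntax.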

Lp3

Medium
Category
MCP Least Privilege
Confidence
83% confidence
Finding
The skill documentation indicates capabilities to read environment variables, access local files/configuration, invoke shell commands, and communicate over the network, yet no permissions are declared. This creates a transparency and consent gap: users may install a seemingly simple TTS skill without being explicitly told it will access credentials, local config, external services, and local playback tooling.

Tp4

High
Category
MCP Tool Poisoning
Confidence
88% confidence
Finding
The documented behavior does not cleanly match the stated purpose: it relies on an external remote TTS service, reads credentials, can play audio locally, and claims features such as emotion control, voice commands, and a full voice library that are not clearly substantiated. This mismatch is dangerous because users cannot make informed trust decisions when data transmission, local execution, and feature scope are understated or overstated.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The skill instructs users to provide credentials and send text to a remote TTS provider, but it does not present a clear privacy or data-transfer warning. Users may unknowingly transmit sensitive prompts or regulated content to a third-party service along with authentication material, increasing confidentiality and compliance risk.
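
A warning of the kind this finding asks for could be as small as a confirmation step before any text leaves the machine. The following is a minimal sketch under stated assumptions: confirm_remote_send is a hypothetical helper, not part of the skill, and the prompt wording is illustrative.

```python
def confirm_remote_send(text, ask=input):
    """Ask the user before text is sent to the remote TTS provider.

    `ask` defaults to the built-in input() and can be replaced in tests
    or non-interactive runs. Anything other than y/yes counts as "no".
    """
    preview = text if len(text) <= 60 else text[:57] + "..."
    answer = ask(
        f"Send {len(text)} characters to the remote TTS provider? "
        f"Preview: {preview!r} [y/N] "
    )
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "no" means an empty or unexpected reply never results in transmission.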

Credential Access

High
Category
Privilege Escalation
Content
"seedtts2": {
        "env": {
          "VOLCANO_APP_ID": "你的 APP ID",
          "VOLCANO_ACCESS_TOKEN": "你的 Access Token"
        }
      }
    }
Confidence
80% confidence
Finding
Access Token
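
As an alternative to storing the token in plaintext JSON, the credentials can be read from the process environment and masked before they appear in any debug output. The sketch below assumes only the two variable names shown in the excerpt (VOLCANO_APP_ID and VOLCANO_ACCESS_TOKEN); load_credentials and redact are hypothetical helper names, not functions from the skill.

```python
import os


def load_credentials(env=os.environ):
    """Read the TTS credentials from environment variables.

    Raises RuntimeError naming the missing variables, without ever
    echoing the values that are present.
    """
    app_id = env.get("VOLCANO_APP_ID")
    token = env.get("VOLCANO_ACCESS_TOKEN")
    pairs = (("VOLCANO_APP_ID", app_id), ("VOLCANO_ACCESS_TOKEN", token))
    missing = [name for name, value in pairs if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return app_id, token


def redact(secret, visible=4):
    """Mask a secret for logs, keeping only the last `visible` characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging redact(token) instead of the raw token keeps debug output shareable while still letting a user tell two tokens apart.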

VirusTotal

66/66 vendors flagged this skill as clean.
