doubao-tts

Security checks across malware telemetry and agentic risk

Overview

This is a cloud text-to-speech skill with disclosed credential and network use, but users should handle API keys, private text, and cloned voices carefully.

Install if you are comfortable sending the text you synthesize to Doubao/Volcengine and using a provider access key in this environment. Keep the access key out of logs, repositories, screenshots, and command history where possible, and use cloned voice IDs only for voices you are authorized to use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (4)

Missing User Warnings

Medium

Confidence: 93%
Finding: The README promotes personal voice cloning and use of `S_` voice IDs without any warning about consent, impersonation risk, or privacy obligations. In a TTS skill, that omission is security-relevant because users may infer that cloning and using another person's voice is routine or acceptable, increasing the chance of misuse, social engineering, or non-consensual biometric voice processing.
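One way to act on this finding is to gate cloned voices behind an explicit allowlist. The sketch below is illustrative only: it assumes that cloned voices are identifiable by the `S_` prefix mentioned in the README, and the names `AUTHORIZED_CLONED_VOICES` and `check_voice_allowed` are hypothetical and not part of the skill.

```python
# Hypothetical allowlist of cloned voice IDs for which the operator holds
# documented consent. The ID below is a placeholder, not a real voice.
AUTHORIZED_CLONED_VOICES = frozenset({"S_example_own_voice"})


def is_cloned_voice(voice_id: str) -> bool:
    """Assumption: IDs starting with `S_` denote cloned voices."""
    return voice_id.startswith("S_")


def check_voice_allowed(voice_id: str, authorized=AUTHORIZED_CLONED_VOICES) -> None:
    """Raise PermissionError for a cloned voice that is not on the allowlist."""
    if is_cloned_voice(voice_id) and voice_id not in authorized:
        raise PermissionError(
            f"Cloned voice {voice_id!r} is not on the consent allowlist; "
            "confirm you are authorized to use this voice before synthesis."
        )
```

Calling `check_voice_allowed` before each synthesis request makes the consent decision explicit instead of leaving it to the user's inference.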

Missing User Warnings

Low

Confidence: 87%
Finding: The README instructs users to export `DOUBAO_ACCESS_KEY` but does not state that this value is a sensitive secret that must not be committed, logged, or shared. While environment variables are a normal configuration mechanism, omitting basic secret-handling guidance can lead to accidental exposure through shell history, screenshots, CI logs, or checked-in `.env` files.
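The secret-handling guidance this finding asks for can be sketched as follows. This is a minimal illustration, assuming the key is read from the `DOUBAO_ACCESS_KEY` environment variable named in the README; the helper names `load_access_key` and `mask_secret` are hypothetical and not part of the skill.

```python
import os


def load_access_key(env=os.environ) -> str:
    """Read the key from the environment instead of hard-coding it."""
    key = env.get("DOUBAO_ACCESS_KEY", "")
    if not key:
        raise RuntimeError("DOUBAO_ACCESS_KEY is not set")
    return key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a log-safe form that shows only the last `visible` characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging `mask_secret(key)` rather than the key itself keeps the full value out of CI logs and screenshots. It does not protect shell history or a checked-in `.env` file, which still need separate care.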

Vague Triggers

High

Confidence: 93%
Finding: The description mandates using this skill for very broad phrases such as '帮我把这段话读出来' ("read this passage aloud for me") or '生成一段音频' ("generate an audio clip"). These phrases can match ordinary requests that do not clearly imply consent to send content to a third-party TTS provider. The result can be unintended invocation and disclosure of arbitrary user-provided text, which is a particular concern when the text contains sensitive information.

Missing User Warnings

Medium

Confidence: 97%
Finding: The instructions explain how to supply app credentials and text for synthesis but do not include a clear user-facing warning that both the entered text and authentication material are used with a third-party cloud service. This creates a transparency and privacy risk because users may unknowingly send sensitive text off-platform for processing.
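A user-facing disclosure of this kind could take the form of a confirmation step before any text leaves the machine. The sketch below is an assumption about how such a gate might look, not the skill's actual behavior; `NOTICE` and `confirm_send` are hypothetical names, and the `ask` parameter exists only so the prompt can be replaced in tests.

```python
from typing import Callable

# Hypothetical wording for the disclosure shown before each request.
NOTICE = (
    "This text will be sent to Doubao/Volcengine, a third-party cloud "
    "service, for speech synthesis."
)


def confirm_send(text: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly agrees to send `text`."""
    # Show a short preview so the user can see what is about to be sent.
    preview = text if len(text) <= 60 else text[:57] + "..."
    prompt = f"{NOTICE}\nPreview: {preview!r}\nContinue? [y/N] "
    answer = ask(prompt)
    # Anything other than an explicit yes counts as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

Because the default answer is "no", a broadly matched request does not transmit text unless the user opts in, which also limits the impact of an unintended invocation.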

VirusTotal

VirusTotal findings are pending for this skill version.
