Volcengine TTS to TOS Agent

Security checks across malware telemetry and agentic risk

Overview

This skill does what it advertises: it turns text into speech, uploads the audio to Volcengine storage, and returns a temporary shareable link.

Install only if you are comfortable sending the input text to Volcengine and storing the generated audio in TOS. Use least-privilege credentials limited to the intended bucket, keep URL expiry short, avoid secrets or regulated personal data, and use `--keep-local` only when you want a local audio copy.
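The guidance above could be enforced with a small pre-flight check before any text leaves the machine. The following is a minimal sketch, not part of the skill: `preflight`, `MAX_EXPIRY_SECONDS`, and the regex patterns are hypothetical, and the patterns catch only a few obvious secret formats rather than all sensitive or regulated data.

```python
import re

# Hypothetical ceiling for the presigned-URL lifetime; choose a value that suits your use case.
MAX_EXPIRY_SECONDS = 15 * 60

# Illustrative patterns only: they catch a few obvious secret shapes, not all sensitive data.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # AWS-style access key ID
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*\S+"),
]


def preflight(text: str, requested_expiry: int) -> int:
    """Reject text that looks like it contains a secret and clamp the URL expiry.

    Returns the expiry, in seconds, that should actually be used.
    """
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            raise ValueError("input text appears to contain a secret; refusing to send it")
    if requested_expiry <= 0:
        raise ValueError("expiry must be a positive number of seconds")
    # Never allow a link to live longer than the configured ceiling.
    return min(requested_expiry, MAX_EXPIRY_SECONDS)
```

A caller would run `preflight(text, expiry)` first and pass the returned value on as the link lifetime. A pattern list like this is a safety net only; it does not replace keeping sensitive text out of the workflow.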

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (5)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill clearly describes capabilities that require environment variable access, local file handling, and outbound network access, but it does not declare permissions for those operations. This creates a transparency and governance gap: users or orchestration systems may approve or run the skill without understanding that it sends data to external services and writes temporary files locally.

Description-Behavior Mismatch

Low

Confidence: 84%
Finding: The skill can optionally persist synthesized audio to local disk via --keep-local, which expands data retention beyond the stated shareable-link workflow. If the synthesized text contains sensitive content, this creates an unexpected local artifact that may remain accessible to other users or processes on the host.
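One way to limit the exposure described in this finding is to create the retained file with owner-only permissions. The sketch below is an assumption about how that could be done on a POSIX host, not the skill's actual code; `save_private` is a hypothetical helper.

```python
import os


def save_private(path: str, audio: bytes) -> None:
    """Write audio bytes to a new file that only the current user can read or write."""
    # O_EXCL makes the call fail rather than overwrite an existing file.
    # Mode 0o600 grants read/write to the owner only; the process umask can
    # remove bits from this mode but never add any.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(audio)
```

Because the permissions are set at creation time, there is no window in which the file exists with broader access. Deleting the file once it is no longer needed remains the user's responsibility.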

Missing User Warnings

Medium

Confidence: 96%
Finding: The workflow sends user-provided text to an external TTS API, uploads the generated audio to remote object storage, and returns a presigned URL that can be shared during its validity window, but the skill does not prominently warn users about these data flows. This can lead to unintended disclosure of sensitive text or generated audio, especially if users assume processing is local or do not understand that the link grants temporary remote access.

Missing User Warnings

Medium

Confidence: 90%
Finding: User-provided text is transmitted to an external TTS provider without any explicit notice that input leaves the local environment. In agent contexts, users may supply secrets or regulated data, so silent third-party transmission can cause privacy, compliance, or confidentiality issues.
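An explicit notice of the kind this finding asks for could look like the following. This is a hedged sketch only: `confirm_transmission` and its wording are hypothetical and do not reflect how the skill behaves today.

```python
from typing import Callable


def confirm_transmission(text: str, ask: Callable[[str], str] = input) -> bool:
    """Tell the user that the text leaves the local environment and ask for consent.

    `ask` defaults to the built-in input(); it is injectable so that an agent
    runtime or a test can supply the answer without a terminal.
    """
    notice = (
        f"About to send {len(text)} characters of text to an external TTS provider.\n"
        "The generated audio will be stored in remote object storage.\n"
        "Do not continue if the text contains secrets or regulated data.\n"
        "Proceed? [y/N] "
    )
    answer = ask(notice).strip().lower()
    # Anything other than an explicit yes is treated as a refusal.
    return answer in {"y", "yes"}
```

Defaulting to refusal means that an empty or unexpected answer never results in data being sent.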

Missing User Warnings

Medium

Confidence: 91%
Finding: The script uploads generated audio to remote object storage and returns a presigned URL, effectively making the content shareable for the duration of the link. Without an explicit warning, users may not realize their content is being stored remotely and exposed through bearer-style URL access.
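To show why a presigned URL behaves like a bearer token, here is a generic HMAC-based illustration. It is not Volcengine's actual signing scheme; `SECRET`, `sign_url`, `is_valid`, and the query parameter names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical key held only by the storage service


def _signature(base_url: str, expires_at: int) -> str:
    message = f"{base_url}\n{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, expires_at: int) -> str:
    """Append an expiry timestamp and a signature over (URL, expiry) to the URL."""
    query = urlencode({"expires": expires_at, "signature": _signature(base_url, expires_at)})
    return f"{base_url}?{query}"


def is_valid(url: str, now: int) -> bool:
    """Accept the request if the signature matches and the link has not expired.

    Note what is NOT checked: the identity of the caller. Possession of the
    URL is the only credential.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(base_url, expires_at)
    return hmac.compare_digest(signature, expected) and now < expires_at
```

In this model, anyone who obtains the link (through a chat log, a browser history, or a forwarded message) can fetch the object until `expires_at` passes, which is why a short expiry is the main mitigation.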

VirusTotal

65/65 vendors flagged this skill as clean.
