Byted Text To Speech

Security checks across malware telemetry and agentic risk

Overview

This text-to-speech skill mostly matches its stated purpose, but it can also use broader cloud credentials to list or create API keys and save them locally without clear user-facing disclosure.

Review before installing. Use this only if you are comfortable sending synthesis text to Volcengine/ByteDance. Prefer manually setting MODEL_SPEECH_API_KEY, do not expose ARK_SKILL_API_KEY or ARK_SKILL_API_BASE to this skill unless you intentionally want it to manage API keys, and check scripts/.env or rotate keys if one was created unexpectedly.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (8)

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: This TTS skill performs secret discovery, remote API-key listing, and API-key creation in addition to speech synthesis. Expanding a user-facing media skill into credential-management behavior increases the attack surface and violates least-privilege expectations, especially because it can silently provision reusable credentials.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The code can call a remote CreateAPIKey action and persist the returned key locally, which grants the skill a privileged capability unrelated to basic text-to-speech. If the upstream bearer token is exposed or misused, the skill can mint additional credentials that survive beyond the current session and broaden compromise.
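One way to keep key creation out of the default path is to require an explicit opt-in before the skill ever falls back to credential management. The sketch below is illustrative only: `MODEL_SPEECH_API_KEY` comes from this report, while the `TTS_ALLOW_KEY_MANAGEMENT` flag and the `resolve_speech_key` helper are hypothetical names that do not exist in the skill.

```python
import os


def resolve_speech_key(env=None):
    """Return a manually supplied speech key, or None if the caller may
    proceed to an explicitly enabled key-management path.

    Raises RuntimeError when no key is set and the user has not opted in.
    TTS_ALLOW_KEY_MANAGEMENT is a hypothetical flag used for illustration.
    """
    env = os.environ if env is None else env

    # Prefer the key the user set by hand.
    key = env.get("MODEL_SPEECH_API_KEY", "").strip()
    if key:
        return key

    # Never list or create keys unless the user asked for it.
    flag = env.get("TTS_ALLOW_KEY_MANAGEMENT", "").strip().lower()
    if flag not in {"1", "true", "yes"}:
        raise RuntimeError(
            "MODEL_SPEECH_API_KEY is not set and key management is not enabled"
        )
    return None
```

With this shape, a missing key fails loudly instead of silently minting a new credential.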

Vague Triggers

Medium

Confidence: 82%
Finding: The activation guidance is broad enough to trigger on common requests such as '读给我听' ("read it to me") or any task that '本质上需要将文本内容转为可播放的音频' ("essentially requires converting text content into playable audio"). In an agent environment, this can cause over-invocation, sending user text to an external TTS service when a simple local readout or non-tool response would have sufficed, creating unnecessary data exposure and user-surprise risk.

Vague Triggers

Medium

Confidence: 88%
Finding: The usage section lists many positive triggers but provides no boundary conditions or negative examples, so the agent has little guidance on when not to activate the skill. This increases the chance of applying the tool to sensitive, copyrighted, or merely conversational content and transmitting that content to a third-party API without sufficient necessity.

Missing User Warnings

Medium

Confidence: 90%
Finding: The guide instructs users to place a live API key into a workspace-local `.env` file but does not warn that the credential is sensitive, should not be committed to source control, and should be access-restricted. In a shared workspace, repo, or agent environment, this increases the risk of accidental disclosure through commits, logs, backups, or other skills/tools reading local files.
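A quick pre-commit sanity check can catch the most common mistake, an unignored `.env` file. The sketch below is a deliberately simplified assumption: it only looks for a literal `.env` line in the top-level `.gitignore` and does not implement full gitignore semantics (globs, negation, nested ignore files). The helper name is hypothetical.

```python
from pathlib import Path


def env_file_probably_ignored(repo_root, env_name=".env"):
    """Return True if the top-level .gitignore contains a literal line
    equal to env_name. A slash-free pattern like '.env' matches at any
    directory depth, so it would also cover a nested scripts/.env.
    """
    gitignore = Path(repo_root) / ".gitignore"
    if not gitignore.is_file():
        return False

    for raw in gitignore.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if line == env_name:
            return True
    return False
```

A `False` result here is a prompt to look closer, not proof that the file is tracked.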

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill stores the recovered or newly created speech API key in a local .env file without any user disclosure or consent. Persisting secrets to disk creates a longer-lived target for theft through local file access, backups, logs, or later unintended reuse by other processes.
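If a key must be persisted at all, it could be written only after explicit consent and with owner-only permissions. The following is a minimal sketch under those assumptions; `save_key_to_env` and its `consent` parameter are hypothetical and are not taken from the skill's code. Note that this version overwrites the target file rather than merging with existing entries.

```python
import os
from pathlib import Path


def save_key_to_env(path, name, value, consent=False):
    """Write NAME=value to a .env file, but only with explicit consent.

    The file is created with mode 0o600 so that only the owner can read
    it on POSIX systems. Existing contents are replaced.
    """
    if not consent:
        raise PermissionError(
            "refusing to persist an API key without explicit user consent"
        )

    path = Path(path)
    # Create (or truncate) with restrictive permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")

    # Tighten permissions in case the file already existed with a wider mode.
    os.chmod(path, 0o600)
    return path
```

Keeping the consent check inside the write helper means no caller can persist a secret by accident.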

Missing User Warnings

Medium

Confidence: 82%
Finding: The code sends a bearer credential in outbound requests to a configurable base URL, and there is no validation that the endpoint is trusted or HTTPS-only. If ARK_SKILL_API_BASE is misconfigured or attacker-controlled, the authorization token could be disclosed to an unintended host.
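The missing endpoint validation could look roughly like the sketch below, which rejects non-HTTPS URLs, embedded credentials, and hosts outside an allowlist before any bearer token is attached. The allowlist entry `api.example.com` is a placeholder, since the report does not name the real trusted hosts, and `validate_api_base` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Placeholder allowlist; the actual trusted hosts are not given in this report.
TRUSTED_HOSTS = frozenset({"api.example.com"})


def validate_api_base(base_url, trusted_hosts=TRUSTED_HOSTS):
    """Return a normalized base URL, or raise ValueError if it is unsafe
    to send a bearer token to it."""
    parsed = urlparse(base_url)

    if parsed.scheme != "https":
        raise ValueError("API base must use https")

    # Reject user:pass@host forms, which can disguise the real host.
    if parsed.username or parsed.password:
        raise ValueError("API base must not embed credentials")

    host = (parsed.hostname or "").lower()
    if host not in trusted_hosts:
        raise ValueError(f"untrusted API host: {host!r}")

    return base_url.rstrip("/")
```

Calling this on the value of `ARK_SKILL_API_BASE` before building any request would stop a misconfigured or attacker-controlled base URL from ever receiving the token.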

Missing User Warnings

Medium

Confidence: 91%
Finding: The function sends arbitrary input text to a third-party TTS endpoint over the network, but the code does not provide any user-consent check, sensitivity filtering, or clear warning at the point where data leaves the local environment. In a skill context, users may provide confidential or personal text for conversion, so silent exfiltration to an external vendor is a real privacy and compliance risk even though the transmission uses HTTPS.
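A consent check at the point where text leaves the local environment might be as small as the sketch below. It is an assumption about one possible design, not the skill's behavior: `confirm_external_send` is a hypothetical helper, and the `ask` parameter defaults to the built-in `input` so that an agent runtime could substitute its own prompt mechanism.

```python
def confirm_external_send(text, vendor, ask=input):
    """Ask the user before sending text to an external service.

    Returns True only on an explicit 'y' or 'yes'; anything else,
    including an empty answer, is treated as a refusal.
    """
    # Show a short preview so the user can see what would be transmitted.
    preview = text if len(text) <= 60 else text[:57] + "..."
    answer = ask(
        f"Send {len(text)} characters to {vendor}? "
        f"Preview: {preview!r} [y/N] "
    )
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to refusal keeps an unattended or misrouted prompt from turning into a silent upload.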

VirusTotal

63/63 vendors flagged this skill as clean.
