Byted Text To Speech

Security checks across malware telemetry and agentic risk

Overview

This text-to-speech skill mostly matches its stated purpose, but it can also use broader cloud credentials to list or create API keys and save them locally without clear user-facing disclosure.

Review before installing. Use this skill only if you are comfortable sending synthesis text to Volcengine/ByteDance. Prefer setting MODEL_SPEECH_API_KEY manually. Do not expose ARK_SKILL_API_KEY or ARK_SKILL_API_BASE to this skill unless you intentionally want it to manage API keys. Check scripts/.env, and rotate keys if one was created unexpectedly.
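The recommendation to rely on a manually set MODEL_SPEECH_API_KEY can be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper name `resolve_speech_key` and the exception class are hypothetical, and the only assumption taken from the report is the environment variable name.

```python
import os


class MissingSpeechKeyError(RuntimeError):
    """Raised when no manually supplied speech key is available."""


def resolve_speech_key(env=None):
    # Hypothetical helper: read only the manually set key. It never falls
    # back to ARK_SKILL_API_KEY and never calls a remote key-management API.
    env = os.environ if env is None else env
    key = env.get("MODEL_SPEECH_API_KEY", "").strip()
    if not key:
        raise MissingSpeechKeyError(
            "MODEL_SPEECH_API_KEY is not set; refusing to discover or create a key"
        )
    return key
```

Failing closed here means a missing key surfaces as an error the user can act on, rather than triggering key listing or creation in the background.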

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (8)

Description-Behavior Mismatch

Severity: Medium
Confidence: 93%
Finding
This TTS skill performs secret discovery, remote API-key listing, and API-key creation in addition to speech synthesis. Expanding a user-facing media skill into credential-management behavior increases the attack surface and violates least-privilege expectations, especially because it can silently provision reusable credentials.

Context-Inappropriate Capability

Severity: High
Confidence: 97%
Finding
The code can call a remote CreateAPIKey action and persist the returned key locally, which grants the skill a privileged capability unrelated to basic text-to-speech. If the upstream bearer token is exposed or misused, the skill can mint additional credentials that survive beyond the current session and broaden compromise.

Vague Triggers

Severity: Medium
Confidence: 82%
Finding
The activation guidance is broad enough to trigger on common requests such as '读给我听' ("read it to me") or any task that '本质上需要将文本内容转为可播放的音频' ("essentially requires converting text content into playable audio"). In an agent environment, this can cause over-invocation: user text is sent to an external TTS service when a simple local readout or a non-tool response would have sufficed, which creates unnecessary data exposure and risks surprising the user.

Vague Triggers

Severity: Medium
Confidence: 88%
Finding
The usage section lists many positive triggers but provides no boundary conditions or negative examples, so the agent has little guidance on when not to activate the skill. This increases the chance of applying the tool to sensitive, copyrighted, or merely conversational content and transmitting that content to a third-party API without sufficient necessity.

Missing User Warnings

Severity: Medium
Confidence: 90%
Finding
The guide instructs users to place a live API key into a workspace-local `.env` file but does not warn that the credential is sensitive, should not be committed to source control, and should be access-restricted. In a shared workspace, repo, or agent environment, this increases the risk of accidental disclosure through commits, logs, backups, or other skills/tools reading local files.
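One way to act on this finding is to audit the local `.env` file before using it. The sketch below is an assumption-laden illustration: `audit_env_file` is a hypothetical helper, the permission check assumes a POSIX file system, and the `.gitignore` check is deliberately naive (it only looks for a literal rule naming the file, not glob patterns).

```python
import stat
from pathlib import Path


def audit_env_file(env_path, gitignore_path):
    """Return warnings about a local .env file that holds a live API key."""
    env_path = Path(env_path)
    gitignore_path = Path(gitignore_path)
    warnings = []
    if not env_path.exists():
        return warnings

    # Any group/other permission bit means other local users can reach the key.
    mode = stat.S_IMODE(env_path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        warnings.append(f"{env_path.name} mode is {mode:o}; expected 600")

    # Naive check: look for a literal ignore rule naming the file.
    rules = set()
    if gitignore_path.exists():
        rules = {line.strip() for line in gitignore_path.read_text().splitlines()}
    if not ({env_path.name, "/" + env_path.name} & rules):
        warnings.append(f"{env_path.name} is not listed in {gitignore_path.name}")
    return warnings
```

An empty list means neither of the two checks found a problem; it does not prove the key is safe from logs, backups, or other tools reading the workspace.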

Missing User Warnings

Severity: Medium
Confidence: 95%
Finding
The skill stores the recovered or newly created speech API key in a local .env file without any user disclosure or consent. Persisting secrets to disk creates a longer-lived target for theft through local file access, backups, logs, or later unintended reuse by other processes.
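A disclosure-and-consent step before persisting a key might look like the sketch below. It is not the skill's implementation: `save_key_with_consent` and the `confirm` callback are hypothetical, and the `MODEL_SPEECH_API_KEY=` line format is an assumption about how the `.env` file is laid out.

```python
import os


def save_key_with_consent(path, key, confirm):
    """Write the key to path only if confirm() approves; file is owner-only."""
    # confirm is a hypothetical callback that shows the prompt to the user
    # and returns True only on explicit approval.
    if not confirm(f"Save the speech API key to {path}?"):
        return False

    # Create the file with mode 0600 so the key is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"MODEL_SPEECH_API_KEY={key}\n")
    # Re-apply the mode in case the file already existed with looser bits.
    os.chmod(path, 0o600)
    return True
```

If the user declines, nothing is written and the key lives only for the current process.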

Missing User Warnings

Severity: Medium
Confidence: 82%
Finding
The code sends a bearer credential in outbound requests to a configurable base URL, and there is no validation that the endpoint is trusted or HTTPS-only. If ARK_SKILL_API_BASE is misconfigured or attacker-controlled, the authorization token could be disclosed to an unintended host.
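The missing endpoint validation could be approximated as shown here, under stated assumptions: `validate_api_base` is a hypothetical helper, and the allowlist is supplied by the caller because the report does not name the legitimate host. The hosts in the usage lines are placeholders.

```python
from urllib.parse import urlsplit


def validate_api_base(base_url, allowed_hosts):
    """Return a normalized base URL, or raise ValueError if it is untrusted."""
    parts = urlsplit(base_url)
    # Reject plain HTTP so the bearer token is never sent in cleartext.
    if parts.scheme != "https":
        raise ValueError("API base must use https")
    # Reject URLs that smuggle credentials in the authority component.
    if parts.username or parts.password:
        raise ValueError("credentials embedded in the URL are not allowed")
    # Only send the token to hosts the operator has explicitly approved.
    host = (parts.hostname or "").lower()
    if host not in allowed_hosts:
        raise ValueError(f"host {host!r} is not in the allowlist")
    return base_url.rstrip("/")


# Usage with a placeholder allowlist (hypothetical host names):
ALLOWED = {"api.example.com"}
safe_base = validate_api_base("https://api.example.com/v1/", ALLOWED)
```

Running this check before attaching the Authorization header means a misconfigured or attacker-supplied ARK_SKILL_API_BASE fails fast instead of receiving the token.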

Missing User Warnings

Severity: Medium
Confidence: 91%
Finding
The function sends arbitrary input text to a third-party TTS endpoint over the network, but the code does not provide any user-consent check, sensitivity filtering, or clear warning at the point where data leaves the local environment. In a skill context, users may provide confidential or personal text for conversion, so silent exfiltration to an external vendor is a real privacy and compliance risk even though the transmission uses HTTPS.
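A consent check at the point where text leaves the local environment can be sketched in a few lines. The names `synthesize_with_consent`, `send`, and `confirm` are hypothetical; `send` stands in for whatever function performs the real TTS request, and no sensitivity filtering is attempted here.

```python
def synthesize_with_consent(text, send, confirm, preview_chars=80):
    """Call send(text) only after confirm() approves the outbound transfer."""
    # Show the user how much text would leave the machine and a short preview.
    preview = text[:preview_chars]
    prompt = (
        f"Send {len(text)} characters to the external TTS service? "
        f"Preview: {preview!r}"
    )
    if not confirm(prompt):
        return None  # nothing is transmitted without explicit approval
    return send(text)
```

A caller would pass an interactive prompt as `confirm` in a terminal, or an agent-level approval hook in an automated setting.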

VirusTotal

63 of 63 vendors reported this skill as clean.
