voice-output

Security checks across malware telemetry and agentic risk

Overview

This voice-output skill appears to perform its stated TTS function, but it sends response text to ByteDance/Doubao using a bundled access token, and its triggers are broad enough to activate it unintentionally.

Review before installing. Use only if you are comfortable with spoken response text being sent to ByteDance/Doubao and played aloud locally. Replace the bundled token with your own secured credential, rotate the exposed token if it is real, and narrow the triggers to explicit voice-reply commands.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (8)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill documentation directs execution of a local Python script that uses shell execution, network access to an external TTS service, and local file writing, yet no permissions are declared. This creates a transparency and policy-enforcement gap: a caller or platform may treat the skill as low-risk while it can exfiltrate response text to a third party and invoke local system capabilities.
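To make the declaration gap concrete, a reviewer could compare what a script imports and calls against what the skill declares. The sketch below is a rough static check, assuming hypothetical capability labels (`shell_exec`, `network`, `file_write`) and a hypothetical module-to-capability mapping. Real skill platforms define their own permission schema, and an import-based scan will miss dynamic behavior.

```python
import ast

# Hypothetical mapping from top-level modules to coarse capability labels.
# Real platforms use their own permission names; this is illustrative only.
MODULE_CAPABILITIES = {
    "subprocess": "shell_exec",
    "requests": "network",
    "urllib": "network",
    "http": "network",
    "socket": "network",
    "websocket": "network",
}

# Any of these characters in an open() mode means the file may be written.
WRITE_MODE_CHARS = set("wax+")


def _open_mode(call: ast.Call):
    """Return the literal mode passed to open(), or None if absent or non-literal."""
    if len(call.args) >= 2 and isinstance(call.args[1], ast.Constant):
        return call.args[1].value
    for kw in call.keywords:
        if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
            return kw.value.value
    return None


def observed_capabilities(source: str) -> set:
    """Approximate the capabilities a script uses from its imports and open() calls."""
    caps = set()
    for node in ast.walk(ast.parse(source)):
        modules = []
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        for name in modules:
            label = MODULE_CAPABILITIES.get(name.split(".")[0])
            if label:
                caps.add(label)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            mode = _open_mode(node)
            if isinstance(mode, str) and WRITE_MODE_CHARS & set(mode):
                caps.add("file_write")
    return caps


def undeclared_capabilities(source: str, declared: set) -> set:
    """Capabilities the script appears to use but the skill does not declare."""
    return observed_capabilities(source) - declared
```

For example, a script that imports `subprocess` and `requests` and opens an audio file with mode `"wb"` yields all three labels when checked against an empty declaration, which is the situation this finding describes.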

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The documented behavior is narrower than the actual capability: the skill does not just 'speak aloud' locally, it sends content to a remote Doubao/ByteDance service, uses hardcoded credentials, writes audio files locally, and relies on afplay rather than explicitly controlling a target device. This mismatch is dangerous because it hides external data transmission and secret handling from users and reviewers, increasing the chance that sensitive assistant responses are disclosed unexpectedly.

Context-Inappropriate Capability

Medium

Confidence: 99%
Finding: The script embeds a live APPID and access token directly in source code. Hardcoded secrets are easily exposed through source control, logs, backups, or local file access, allowing unauthorized use of the TTS account and possible billing abuse or service impersonation.
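One common remediation is to read the APPID and token from the environment at run time and refuse to make any request if they are missing. A minimal sketch, assuming hypothetical variable names `DOUBAO_APPID` and `DOUBAO_ACCESS_TOKEN`; the actual script's configuration keys are not shown in this report.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical variable names, not taken from the skill's source.
APPID_VAR = "DOUBAO_APPID"
TOKEN_VAR = "DOUBAO_ACCESS_TOKEN"


class MissingCredentialError(RuntimeError):
    """Raised when a required TTS credential is not configured."""


def load_tts_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Read the APPID and access token from the environment instead of source code.

    Fails closed: if either value is missing or blank, raise instead of
    falling back to a bundled default.
    """
    env = os.environ if env is None else env
    values = {name: env.get(name, "").strip() for name in (APPID_VAR, TOKEN_VAR)}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise MissingCredentialError(
            "set " + ", ".join(missing) + " before enabling voice output"
        )
    return values[APPID_VAR], values[TOKEN_VAR]
```

Moving the secret out of the source only prevents future exposure; a token that has already shipped in the bundle still needs to be rotated if it is real.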

Vague Triggers

Medium

Confidence: 89%
Finding: The trigger condition includes broad language such as 'Or any similar request to hear the response,' which can cause unintended activation. In this skill's context, accidental invocation is more dangerous because activation sends generated response text to an external TTS provider and plays it aloud, potentially exposing private or sensitive content.

Vague Triggers

Medium

Confidence: 92%
Finding: The single-word trigger '语音' ("voice") is too vague and can easily appear in normal conversation, making accidental activation likely. Because this skill performs external transmission and audible playback, false activations can leak assistant responses to a third-party service and nearby listeners.
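Both trigger findings come down to the difference between firing on any message that contains a keyword and firing only on explicit commands. A minimal sketch, assuming a hypothetical allow-list of commands (the skill's real trigger list is not reproduced in this report) and that matching runs on the whole user message:

```python
import re
import unicodedata

# Hypothetical allow-list of explicit voice-reply commands.
EXPLICIT_COMMANDS = {
    "read it aloud",
    "speak your reply",
    "语音回复",    # "reply by voice"
    "用语音回答",  # "answer using voice"
}

# Trailing whitespace and ASCII/CJK punctuation to ignore when matching.
_TRAILING_PUNCT = re.compile(r"[\s.!?。！？,，]+$")


def _normalize(message: str) -> str:
    """Fold width and case, collapse whitespace, and drop trailing punctuation."""
    text = unicodedata.normalize("NFKC", message).lower()
    text = " ".join(text.split())
    return _TRAILING_PUNCT.sub("", text)


def broad_trigger(message: str) -> bool:
    """Substring match on a single keyword: the pattern the findings flag."""
    return "语音" in message


def explicit_trigger(message: str) -> bool:
    """Fire only when the whole message is one of the explicit commands."""
    return _normalize(message) in EXPLICIT_COMMANDS
```

A question such as "这个语音识别模型准确吗？" ("Is this speech-recognition model accurate?") activates the broad trigger but not the explicit one. Exact matching deliberately drops "any similar request" phrasing; the cost is that users need to be told which commands enable spoken replies.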

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill description omits any warning that response text is transmitted to an external TTS provider. This is a real privacy vulnerability because users may request spoken output assuming it is purely local, while potentially sensitive content is sent off-device without informed consent.

Missing User Warnings

High

Confidence: 99%
Finding: A hardcoded access token is actively used in outbound API requests, which compounds the exposure: anyone with code access can reuse it to call the external service. In a voice skill context this is more dangerous because the token enables unattended third-party processing and potential cost abuse.

Missing User Warnings

Medium

Confidence: 91%
Finding: User-provided text is sent to an external TTS provider over the network without any explicit disclosure or consent mechanism. This can leak sensitive content spoken by the assistant, including private prompts, personal data, or confidential information, to a third-party service.
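The disclosure findings describe a missing consent step before text leaves the device. A minimal sketch of such a gate, assuming hypothetical `ask_user` and `send_to_tts` callables supplied by the host (neither is part of the skill as reported) and session-scoped consent:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical disclosure wording; adapt to the platform's own prompts.
DISCLOSURE = (
    "Voice replies are synthesized by an external service (ByteDance/Doubao). "
    "The reply text will be sent off-device. Continue? [y/N]"
)


@dataclass
class SessionConsent:
    """Tracks whether the user has agreed to external TTS in this session."""
    external_tts_allowed: bool = False


def speak_reply(
    text: str,
    consent: SessionConsent,
    ask_user: Callable[[str], bool],
    send_to_tts: Callable[[str], None],
) -> bool:
    """Send text to the external TTS provider only after explicit consent.

    Returns True if the text was sent, False if the user declined.
    """
    if not consent.external_tts_allowed:
        if not ask_user(DISCLOSURE):
            return False  # declined: nothing is transmitted
        consent.external_tts_allowed = True  # remember for the rest of the session
    send_to_tts(text)
    return True
```

Remembering consent per session avoids prompting on every reply while still making the first transmission an informed choice; a stricter variant could ask each time.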

VirusTotal

All 66 vendors (66/66) reported this skill as clean.
