Speech Synthesis

Security checks across malware telemetry and agentic risk

Overview

This speech synthesis skill is not clearly malicious, but it asks for an API key, stores it in a local plaintext .env file, and sends speech-generation data through a third-party service with weak disclosure.

Install only if you are comfortable giving this skill a XiaoBenYang API key and sending speech text/audio-generation requests to its remote service. Treat the key as a real secret: it may be stored in a local .env file in plaintext, so avoid shared or versioned workspaces unless you can control that file.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 87%
Finding: The skill requires capabilities to read/write local configuration, access environment variables, and make network requests, but it does not declare these permissions to users. That undermines informed consent and makes sensitive actions like credential persistence and outbound transmission harder to audit.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The documented purpose is speech synthesis, but the skill behavior includes storing API keys locally, using an external xiaobenyang.com API instead of directly integrating Edge TTS, and exposing more generic remote-calling patterns than advertised. This mismatch prevents users from accurately assessing trust boundaries and increases the risk of secret exposure or unintended remote actions.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The workflow example tells the model to call an unrelated school-search function, indicating the document was likely copied from another skill without proper review. Such instruction drift can cause misrouting, accidental invocation of unintended tools, or unsafe handling of user input in the wrong execution path.

Intent-Code Divergence

Low

Confidence: 84%
Finding: Requiring the model to directly display raw API response data can expose internal metadata, request details, URLs, identifiers, or echoed user content that should not be shown verbatim. For a media-generation skill, raw backend output is often not the safest or most appropriate user-facing format.
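One way to avoid showing raw backend output is to pass the response through an allow-list before anything reaches the user. The sketch below is illustrative only: the response schema of the remote service is not documented in this report, so the field names in `SAFE_FIELDS` and the function name `summarize_response` are assumptions.

```python
from typing import Any

# Hypothetical allow-list. The real response schema is not known from this
# report, so these field names are placeholders.
SAFE_FIELDS = ("status", "audio_file", "duration_seconds")


def summarize_response(raw: dict[str, Any]) -> dict[str, Any]:
    """Return only the allow-listed top-level fields of a backend response.

    Everything else (identifiers, echoed text, internal error details) is
    dropped instead of being displayed verbatim.
    """
    return {name: raw[name] for name in SAFE_FIELDS if name in raw}


if __name__ == "__main__":
    example = {
        "status": "ok",
        "audio_file": "speech.mp3",
        "request_id": "internal-123",   # dropped
        "echoed_text": "user content",  # dropped
    }
    print(summarize_response(example))
```

An allow-list fails closed: a field the backend adds later stays hidden until someone decides it is safe to show.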

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: This module persists an API key into a local .env file and mirrors it into the process environment, creating a durable local secret store inside the skill. For a speech-synthesis skill, storing credentials in project-local plaintext is not inherently required and increases the chance of accidental disclosure through source packaging, backups, logs, or misconfigured file permissions.
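As a rough illustration of a lower-risk pattern, the sketch below reads the key from the process environment without writing anything to disk and, where persistence cannot be avoided, creates the `.env` file with owner-only permissions. It is not the skill's actual code: the variable name `SPEECH_API_KEY` and both function names are hypothetical, and the permission bits only have this effect on POSIX systems.

```python
import os
from pathlib import Path

KEY_NAME = "SPEECH_API_KEY"  # hypothetical name; the skill's real variable is not given


def get_api_key(environ=os.environ) -> str:
    """Read the key from the process environment; nothing is written to disk."""
    key = environ.get(KEY_NAME)
    if not key:
        raise RuntimeError(f"{KEY_NAME} is not set")
    return key


def write_private_env(path: Path, key: str) -> None:
    """Persist the key in a file that only the owner can read or write (POSIX)."""
    # Create the file with mode 0600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"{KEY_NAME}={key}\n")
    # os.open does not change the mode of a file that already existed,
    # so tighten it explicitly as well.
    os.chmod(path, 0o600)
```

Restrictive permissions reduce exposure to other local users, but they do not stop the file from being committed or archived; excluding it from version control is still a separate step.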

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill instructs users to provide an API key and then save it locally, but does not disclose persistence details, storage location, retention, or security implications. Users may unknowingly place a secret into a shared filesystem, workspace, or versioned environment file.

Missing User Warnings

Medium

Confidence: 86%
Finding: The skill advertises cloud storage and remote synthesis but gives no warning that user text and generated audio may be transmitted to third-party services. This creates a privacy and compliance risk, especially if users submit sensitive content believing processing is local or limited to Edge TTS.

Missing User Warnings

Medium

Confidence: 89%
Finding: The function silently writes a supplied API key to .env without any user-facing disclosure that the secret will be stored on disk. This can lead users to provide credentials expecting transient use, while the skill creates a persistent plaintext copy that may later be exposed to other local users, tooling, or archived artifacts.
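The missing disclosure could be addressed by telling the user where the key will be stored and asking for confirmation before writing. The following is a minimal sketch under assumptions: `save_key_with_consent`, the `confirm` callback, and the variable name `SPEECH_API_KEY` are all hypothetical and do not come from the skill.

```python
from pathlib import Path
from typing import Callable


def save_key_with_consent(
    key: str,
    env_path: Path,
    confirm: Callable[[str], bool],
) -> bool:
    """Write the key to env_path only after the user accepts a clear notice.

    `confirm` receives the notice text and returns True to proceed.
    Returns True if the key was written, False if the user declined.
    """
    notice = (
        f"The API key will be stored in plaintext at {env_path}. "
        "Anyone who can read that file can use the key. Continue?"
    )
    if not confirm(notice):
        return False
    # SPEECH_API_KEY is a placeholder variable name.
    env_path.write_text(f"SPEECH_API_KEY={key}\n", encoding="utf-8")
    return True
```

Passing the confirmation step in as a callback keeps the function usable from both an interactive prompt and an agent workflow, where the notice would be relayed to the user by the model.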

Ssd 3

Medium

Confidence: 94%
Finding: Displaying raw API return data can leak sensitive values such as service-side identifiers, URLs, internal error details, echoed prompts, or other user-provided content. In a skill that handles credentials, text payloads, and remote services, this broad disclosure materially increases data exposure risk.

VirusTotal

64/64 vendors flagged this skill as clean.
