MiniMax TTS

Security checks across malware telemetry and agentic risk

Overview

This MiniMax text-to-speech skill is internally coherent and its behavior is disclosed, but users should be careful with API keys, sensitive text or voice data, and voice deletion commands.

Install only if you trust MiniMax with the text and voice-related data you submit. Keep MINIMAX_API_KEY private, avoid sending secrets or regulated content, clone voices only with the speaker's permission, and double-check any --delete voice_id because custom voice deletion may be irreversible.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (13)

Tainted flow: 'download_url' from requests.get (line 96, network input) → requests.get (network output)

Medium

Category: Data Flow
Content:
    download_url = result["file"]["download_url"]
    print(f"Downloading audio...")
    audio_resp = requests.get(download_url, timeout=60)
    with open(save_path, 'wb') as f:
        f.write(audio_resp.content)
Confidence: 90%
Finding: audio_resp = requests.get(download_url, timeout=60)
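One way to narrow this flow is to validate the server-supplied `download_url` before fetching it. This is a minimal sketch, assuming the download host is known in advance: the function name `is_allowed_download_url` is hypothetical, and the allowlist entry is only a placeholder, since the finding does not say which host MiniMax actually serves downloads from.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: confirm the real download host(s) against
# MiniMax's documentation or observed responses before relying on this.
ALLOWED_DOWNLOAD_HOSTS = {"api.minimaxi.com"}


def is_allowed_download_url(url: str, allowed_hosts=ALLOWED_DOWNLOAD_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host equals an allowed host
    or is a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

A caller would raise an error instead of calling `requests.get(download_url, timeout=60)` whenever this returns False. The check fails closed, so an incomplete allowlist blocks downloads rather than permitting unexpected hosts.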

Lp3

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill uses environment variables and makes outbound network requests to a third-party API, but no corresponding permissions are declared. This weakens platform trust boundaries because users and reviewers are not clearly informed that local secrets and external transmission are involved.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 91%
Finding: The skill is presented primarily as a text-to-speech tool, but the documented behavior also includes voice-resource management such as listing and deleting voices. This mismatch can mislead users about the scope of actions the skill may take, increasing the risk of unexpected destructive operations on user assets.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The file documents an irreversible `del_voice` operation even though the skill is described as a TTS capability. Expanding the documented capability surface to include destructive voice-management actions increases the chance an agent or integrator invokes deletion unexpectedly, especially if the skill is auto-selected for voice-related tasks.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The file is presented as a voice-list/query utility, but it also contains a destructive delete capability that can remove user voice assets. Hidden or under-disclosed destructive behavior increases the chance of accidental or unauthorized data loss, especially when embedded in a broader TTS skill context where users would not expect deletion actions.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: Deleting cloned/generated voices is a capability beyond the core stated purpose of text-to-speech synthesis and voice listing. In an agent skill context, extra destructive permissions are risky because they expand blast radius if the tool is mis-invoked, chained incorrectly, or exposed to untrusted prompts.
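To limit the blast radius in an agent setting, destructive actions can stay disabled unless an integrator opts in explicitly. This is a minimal sketch. The action names, the `dispatch` helper, and the `allow_destructive` switch are hypothetical and are not part of the skill.

```python
from typing import Callable, Dict

# Hypothetical action registry: synthesis and listing are enabled by default,
# while deletion must be enabled explicitly by the integrator.
SAFE_ACTIONS = {"tts", "list_voices"}
DESTRUCTIVE_ACTIONS = {"delete_voice"}


def dispatch(action: str, handlers: Dict[str, Callable[..., object]],
             allow_destructive: bool = False, **kwargs) -> object:
    """Run the handler for `action` only if it is known and permitted."""
    if action not in SAFE_ACTIONS | DESTRUCTIVE_ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if action in DESTRUCTIVE_ACTIONS and not allow_destructive:
        raise PermissionError(f"{action!r} is disabled; enable destructive actions explicitly")
    return handlers[action](**kwargs)
```

Defaulting `allow_destructive` to False means a mis-routed or prompt-injected request for deletion fails loudly instead of executing.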

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill documents voice cloning and local audio persistence without warning users about biometric privacy, consent, retention, or the sensitivity of cloned voice artifacts. Voice data is highly sensitive, and unclear handling can lead to unauthorized cloning, misuse, or inadvertent retention of personal audio on disk.
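The local-persistence part of this finding can be reduced by writing audio files that only the owning user can read. This is a minimal sketch for POSIX systems, and `save_audio_private` is a hypothetical helper. On Windows, `os.chmod` only toggles the read-only flag, so it does not restrict other accounts there.

```python
import os


def save_audio_private(path: str, data: bytes) -> None:
    """Write audio bytes to `path`, readable and writable by the owner only."""
    # The 0o600 mode is applied only if os.open creates the file...
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        # ...so tighten an existing file's permissions before writing voice data.
        os.chmod(path, 0o600)
        f.write(data)
```

This addresses who can read the file, not how long it is kept. Retention still has to be handled by deleting the files when they are no longer needed.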

Missing User Warnings

Medium

Confidence: 93%
Finding: The reference and example code send user-provided text to a third-party TTS service using a bearer token, but do not include any warning about privacy, consent, data retention, or safe credential handling. In a voice-cloning/TTS skill, the transmitted text may contain sensitive or regulated content, so omitting disclosure and handling guidance creates a real privacy and operational security risk.
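For the credential-handling part, this minimal sketch reads `MINIMAX_API_KEY` (the variable named in the overview) and fails fast when it is missing. It also includes a helper that masks the key before it appears in logs. Both function names are hypothetical.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return MINIMAX_API_KEY, or raise a clear error without echoing any secret."""
    key = env.get("MINIMAX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("MINIMAX_API_KEY is not set; export it instead of hardcoding it")
    return key


def redact(secret: str, keep: int = 4) -> str:
    """Mask all but the last `keep` characters, for use in log lines."""
    if len(secret) <= keep:
        return "*" * len(secret)
    return "*" * (len(secret) - keep) + secret[-keep:]
```

Passing the environment as a parameter keeps the loader testable. Logging only `redact(key)` avoids leaking the bearer token in debug output.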

Missing User Warnings

Medium

Confidence: 93%
Finding: This documentation describes a voice-cloning workflow that transmits user audio and creates reusable cloned voices, but it provides no warning about consent, biometric privacy, impersonation, or sensitive data handling. In the context of a voice-cloning skill, that omission is more dangerous than ordinary API docs because users may upload another person’s voice or their own biometric data without understanding legal, ethical, or security implications.
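A lightweight mitigation is to require an explicit consent acknowledgement before any audio is uploaded for cloning. This is a minimal sketch. `require_clone_consent` and the acknowledgement phrase are hypothetical, and a prompt like this only records the operator's statement. It cannot verify that the speaker actually agreed.

```python
from typing import Callable

CONSENT_PROMPT = (
    "Voice cloning uploads this recording to a third-party service and creates a "
    "reusable voice. Type 'I have consent' to confirm the speaker agreed: "
)


def require_clone_consent(audio_path: str, ask: Callable[[str], str] = input) -> None:
    """Raise unless the operator types the exact acknowledgement phrase."""
    answer = ask(f"[{audio_path}] " + CONSENT_PROMPT)
    if answer.strip().lower() != "i have consent":
        raise PermissionError("voice cloning aborted: consent not confirmed")
```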

Missing User Warnings

Medium

Confidence: 96%
Finding: The example shows how to delete a voice with a hardcoded `voice_id` and immediately prints success, but it does not warn that the operation is irreversible or recommend confirmation before execution. In agent settings, examples often become implementation templates, so this omission can normalize unsafe destructive behavior.
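A safer template than the hardcoded example asks the operator to retype the `voice_id` and reports success only after checking the response. This sketch rests on two assumptions. First, `send_delete` is a hypothetical stand-in for whatever function issues the API call. Second, the `base_resp.status_code == 0` success convention is assumed here rather than taken from the finding, so confirm it against MiniMax's documentation.

```python
from typing import Callable, Dict


def delete_voice_safely(voice_id: str,
                        send_delete: Callable[[str], Dict],
                        ask: Callable[[str], str] = input) -> bool:
    """Delete a voice only after the operator retypes its ID.

    Returns True on confirmed success and False if the operator declined.
    """
    typed = ask(f"Deleting voice {voice_id!r} cannot be undone. Retype the voice_id to confirm: ")
    if typed.strip() != voice_id:
        print("Aborted: voice_id did not match, nothing was deleted.")
        return False
    result = send_delete(voice_id)
    # Assumed response shape; adjust to the real API response.
    status = result.get("base_resp", {}).get("status_code")
    if status != 0:
        raise RuntimeError(f"delete failed for {voice_id!r}: {result}")
    print(f"Deleted voice {voice_id!r}.")
    return True
```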

Missing User Warnings

Medium

Confidence: 88%
Finding: The script sends arbitrary user-provided text to a third-party TTS provider without any explicit disclosure, confirmation, or privacy warning at the CLI boundary. This can cause sensitive prompts, personal data, or confidential content to be transmitted off-system unexpectedly, which is especially relevant for an agent skill that may process user data on the user's behalf.
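A disclosure at the CLI boundary can be as small as a notice on stderr, followed by a prompt that scripted runs can skip. This is a minimal sketch. The `confirm_transmission` helper and the idea of feeding `assume_yes` from a `--yes` flag are hypothetical and do not exist in the skill.

```python
import sys
from typing import Callable


def confirm_transmission(text: str, assume_yes: bool,
                         ask: Callable[[str], str] = input) -> bool:
    """Disclose what will leave the machine, then ask unless assume_yes is set."""
    print(
        f"Notice: {len(text)} characters of text will be sent to the MiniMax API "
        "(https://api.minimaxi.com/). Do not include secrets or regulated data.",
        file=sys.stderr,
    )
    if assume_yes:  # e.g. set from a hypothetical --yes flag in non-interactive runs
        return True
    return ask("Continue? [y/N] ").strip().lower() in {"y", "yes"}
```

The prompt defaults to "no", so an empty answer sends nothing.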

Missing User Warnings

Medium

Confidence: 94%
Finding: The delete operation executes immediately when --delete is provided, with no warning, preview, or confirmation step. This makes accidental deletion easy and increases the risk of irreversible loss of user-created voice assets from operator error or unintended agent behavior.
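One way to add a preview step is to make `--delete` a dry run unless a second flag is also passed. This is a minimal argparse sketch. The `--confirm-delete` flag and the `run` function are hypothetical, and `_placeholder_delete` stands in for the skill's actual deletion call.

```python
import argparse
from typing import Callable, List, Optional


def _placeholder_delete(voice_id: str) -> None:
    # Hypothetical stand-in for the skill's real deletion request.
    raise NotImplementedError("wire this to the actual voice-deletion request")


def run(argv: Optional[List[str]] = None,
        delete_voice: Callable[[str], None] = _placeholder_delete) -> str:
    """Parse CLI flags; --delete only previews unless --confirm-delete is also given."""
    parser = argparse.ArgumentParser(description="Voice management (sketch)")
    parser.add_argument("--delete", metavar="VOICE_ID",
                        help="voice to delete (dry run unless --confirm-delete is set)")
    parser.add_argument("--confirm-delete", action="store_true",
                        help="perform the irreversible deletion")
    args = parser.parse_args(argv)
    if args.delete is None:
        return "nothing to do"
    if not args.confirm_delete:
        # Preview only: no request is sent.
        return (f"DRY RUN: would permanently delete voice {args.delete!r}; "
                "rerun with --confirm-delete to proceed")
    delete_voice(args.delete)
    return f"deleted voice {args.delete!r}"
```

A script entry point would print `run()`. Requiring two flags makes a single mistyped or agent-generated `--delete` harmless.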

External Transmission

Medium

Category: Data Exfiltration
Content:
    if status == "Success":
        file_id = status_resp.json()["file_id"]
        file_resp = requests.get(
            f"https://api.minimaxi.com/v1/files/retrieve?file_id={file_id}",
            headers={"Authorization": f"Bearer {API_KEY}"}
        )
        download_url = file_resp.json()["file"]["download_url"]
Confidence: 84%
Finding: https://api.minimaxi.com/
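This transmission is expected for a MiniMax TTS skill, but the retrieval call can be tightened. Passing `file_id` via `params` lets requests URL-encode it. Setting a timeout fixes a gap in the quoted retrieval call, which has none. Checking the HTTP status before parsing the JSON surfaces errors early. This sketch makes some assumptions: the endpoint and bearer header come from the quoted snippet, while the helper names and the 30-second timeout are illustrative choices.

```python
from typing import Any, Dict

RETRIEVE_URL = "https://api.minimaxi.com/v1/files/retrieve"


def build_retrieve_request(file_id: str, api_key: str) -> Dict[str, Any]:
    """Keyword arguments for requests.get, built from the snippet's endpoint and header."""
    return {
        "url": RETRIEVE_URL,
        "params": {"file_id": file_id},  # URL-encoded by requests
        "headers": {"Authorization": f"Bearer {api_key}"},
        "timeout": 30,  # illustrative value; the quoted call sets no timeout
    }


def retrieve_download_url(file_id: str, api_key: str) -> str:
    import requests  # third-party dependency the skill already uses

    resp = requests.get(**build_retrieve_request(file_id, api_key))
    resp.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return resp.json()["file"]["download_url"]
```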

VirusTotal

All 67 vendors reported this skill as clean.
