MiniMax TTS

Security checks across malware telemetry and agentic risk

Overview

This MiniMax text-to-speech skill is coherent and disclosed, but users should be careful with API keys, sensitive text or voice data, and voice deletion commands.

Install only if you trust MiniMax with the text and voice-related data you submit. Keep MINIMAX_API_KEY private, avoid sending secrets or regulated content, clone voices only with the speaker's permission, and double-check any --delete voice_id because custom voice deletion may be irreversible.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (13)

Tainted flow: 'download_url' from requests.get (line 96, network input) → requests.get (network output)

Medium
Category
Data Flow
Content
download_url = result["file"]["download_url"]

    print(f"Downloading audio...")
    audio_resp = requests.get(download_url, timeout=60)
    with open(save_path, 'wb') as f:
        f.write(audio_resp.content)
Confidence
90% confidence
Finding
audio_resp = requests.get(download_url, timeout=60)
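One way to narrow this flow is to validate the returned URL before fetching it. The sketch below is illustrative only: `is_allowed_download_url` and the allowlisted host suffix are assumptions, not part of the scanned skill, and the hosts that actually serve downloads would need to be confirmed before use.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real download host(s) must be confirmed.
ALLOWED_HOST_SUFFIXES = ("minimaxi.com",)


def is_allowed_download_url(url, allowed_suffixes=ALLOWED_HOST_SUFFIXES):
    """Return True only for https URLs whose host matches the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    # Match the exact domain or a true subdomain, never a lookalike suffix.
    return any(host == s or host.endswith("." + s) for s in allowed_suffixes)
```

A caller would check `is_allowed_download_url(download_url)` and refuse to download when it returns False, so a tampered API response cannot redirect the fetch to an arbitrary host or a non-HTTPS scheme.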

Lp3

Medium
Category
MCP Least Privilege
Confidence
94% confidence
Finding
The skill uses environment variables and makes outbound network requests to a third-party API, but no corresponding permissions are declared. This weakens platform trust boundaries because users and reviewers are not clearly informed that local secrets and external transmission are involved.

Tp4

High
Category
MCP Tool Poisoning
Confidence
91% confidence
Finding
The skill is presented primarily as a text-to-speech tool, but the documented behavior also includes voice-resource management such as listing and deleting voices. This mismatch can mislead users about the scope of actions the skill may take, increasing the risk of unexpected destructive operations on user assets.

Description-Behavior Mismatch

Medium
Confidence
94% confidence
Finding
The file documents an irreversible `del_voice` operation even though the skill is described as a TTS capability. Expanding the documented capability surface to include destructive voice-management actions increases the chance an agent or integrator invokes deletion unexpectedly, especially if the skill is auto-selected for voice-related tasks.

Description-Behavior Mismatch

Medium
Confidence
88% confidence
Finding
The file is presented as a voice-list/query utility, but it also contains a destructive delete capability that can remove user voice assets. Hidden or under-disclosed destructive behavior increases the chance of accidental or unauthorized data loss, especially when embedded in a broader TTS skill context where users would not expect deletion actions.

Context-Inappropriate Capability

Medium
Confidence
90% confidence
Finding
Deleting cloned/generated voices is a capability beyond the core stated purpose of text-to-speech synthesis and voice listing. In an agent skill context, extra destructive permissions are risky because they expand blast radius if the tool is mis-invoked, chained incorrectly, or exposed to untrusted prompts.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The skill documents voice cloning and local audio persistence without warning users about biometric privacy, consent, retention, or the sensitivity of cloned voice artifacts. Voice data is highly sensitive, and unclear handling can lead to unauthorized cloning, misuse, or inadvertent retention of personal audio on disk.
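For the local persistence part of this finding, one possible mitigation is to create audio files readable by the owner only. This is a minimal sketch: `save_audio_private` is a hypothetical helper, not code from the skill, and the permission bits only take effect on POSIX systems and only when the file is newly created.

```python
import os


def save_audio_private(path, data):
    """Write audio bytes to `path`, creating the file as owner-only (0o600).

    The mode applies only when the file is created; an existing file
    keeps whatever permissions it already had.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

Retention is a separate concern: this limits who can read the file but does not delete it, so a cleanup policy for cloned-voice artifacts would still be needed.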

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The reference and example code send user-provided text to a third-party TTS service using a bearer token, but do not include any warning about privacy, consent, data retention, or safe credential handling. In a voice-cloning/TTS skill, the transmitted text may contain sensitive or regulated content, so omitting disclosure and handling guidance creates a real privacy and operational security risk.
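Safe credential handling could look roughly like the following. The environment variable name `MINIMAX_API_KEY` comes from the report; the helper names `load_api_key` and `mask_secret` are assumptions introduced here for illustration.

```python
import os


def load_api_key(env=None, name="MINIMAX_API_KEY"):
    """Read the API key from the environment and fail fast if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        # The message names the variable but never echoes a key value.
        raise RuntimeError(f"{name} is not set")
    return key


def mask_secret(value, visible=4):
    """Return a log-safe form of a secret showing only the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

Logging `mask_secret(key)` instead of the key itself keeps the bearer token out of terminal output and log files.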

Missing User Warnings

Medium
Confidence
93% confidence
Finding
This documentation describes a voice-cloning workflow that transmits user audio and creates reusable cloned voices, but it provides no warning about consent, biometric privacy, impersonation, or sensitive data handling. In the context of a voice-cloning skill, that omission is more dangerous than ordinary API docs because users may upload another person’s voice or their own biometric data without understanding legal, ethical, or security implications.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The example shows how to delete a voice with a hardcoded `voice_id` and immediately prints success, but it does not warn that the operation is irreversible or recommend confirmation before execution. In agent settings, examples often become implementation templates, so this omission can normalize unsafe destructive behavior.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The script sends arbitrary user-provided text to a third-party TTS provider without any explicit disclosure, confirmation, or privacy warning at the CLI boundary. This can cause sensitive prompts, personal data, or confidential content to be transmitted off-system unexpectedly, which is especially relevant for an agent skill that may process user data on the user's behalf.
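A disclosure step at the CLI boundary might be sketched as below. `confirm_transmission` and its `assume_yes` parameter are hypothetical and do not exist in the scanned script; the `ask` argument defaults to `input` and is injectable so the prompt can be exercised without a terminal.

```python
def confirm_transmission(text, provider="MiniMax", assume_yes=False, ask=input):
    """Disclose the outbound transfer and return True only on explicit consent."""
    if assume_yes:
        return True
    notice = (
        f"About to send {len(text)} characters to {provider} "
        "(a third-party service). Do not include secrets or regulated data. "
        "Continue? [y/N] "
    )
    # Anything other than an explicit yes is treated as a refusal.
    return ask(notice).strip().lower() in ("y", "yes")
```

Defaulting to "no" means an empty or unexpected reply never results in text leaving the machine.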

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The delete operation executes immediately when --delete is provided, with no warning, preview, or confirmation step. This makes accidental deletion easy and increases the risk of irreversible loss of user-created voice assets from operator error or unintended agent behavior.
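A confirmation gate for the delete path could be sketched as follows. The `--delete` flag is taken from the report; the `--yes` flag, `build_parser`, and `confirm_delete` are assumptions added for illustration, and the actual deletion call is deliberately left out.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Voice management (sketch)")
    parser.add_argument("--delete", metavar="VOICE_ID", help="voice to delete")
    # Hypothetical flag: skip the interactive prompt in trusted automation.
    parser.add_argument("--yes", action="store_true", help="skip confirmation")
    return parser


def confirm_delete(voice_id, assume_yes=False, ask=input):
    """Require the operator to retype the voice_id before deleting."""
    if assume_yes:
        return True
    typed = ask(
        f"Deleting voice '{voice_id}' may be irreversible. "
        "Retype the voice_id to confirm: "
    )
    return typed.strip() == voice_id
```

Requiring the exact `voice_id` rather than a plain "y" makes it harder for an operator or an agent to confirm a deletion by reflex.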

External Transmission

Medium
Category
Data Exfiltration
Content
if status == "Success":
    file_id = status_resp.json()["file_id"]
    file_resp = requests.get(
        f"https://api.minimaxi.com/v1/files/retrieve?file_id={file_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    )
    download_url = file_resp.json()["file"]["download_url"]
Confidence
84% confidence
Finding
https://api.minimaxi.com/
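The flagged snippet interpolates `file_id` into the URL and indexes the response without checks. A more defensive shape, assuming the same endpoint and response layout shown in the snippet, might look like this; `retrieve_request` and `extract_download_url` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
def retrieve_request(file_id, api_key):
    """Build keyword arguments for requests.get without formatting file_id into the URL."""
    return {
        "url": "https://api.minimaxi.com/v1/files/retrieve",
        "params": {"file_id": file_id},  # requests URL-encodes this value
        "headers": {"Authorization": f"Bearer {api_key}"},
        "timeout": 30,
    }


def extract_download_url(payload):
    """Return file.download_url from a decoded response body, or raise ValueError."""
    try:
        url = payload["file"]["download_url"]
    except (KeyError, TypeError):
        raise ValueError("response has no file.download_url") from None
    if not isinstance(url, str) or not url:
        raise ValueError("file.download_url is empty or not a string")
    return url
```

A caller could then use `requests.get(**retrieve_request(file_id, API_KEY))`, call `raise_for_status()` on the response, and pass `response.json()` to `extract_download_url`.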

VirusTotal

67/67 vendors flagged this skill as clean.