minimax-tts

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real MiniMax text-to-speech helper, but it also includes an under-disclosed image-generation command that expands what the skill can do.

Review before installing. Use this only if you are comfortable sending input text to MiniMax with your API key, and treat the package as having an extra image-generation capability until that code is removed or clearly documented. Do not set MINIMAX_BASE_URL to an endpoint you do not trust.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (10)

Tainted flow: 'url' from os.environ.get (line 61, credential/environment) → requests.post (network output)

Critical
Category
Data Flow
Content
'model': 'image-01',
        'prompt': prompt
    }
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    if resp.status_code != 200:
        return {"error": f"API error {resp.status_code}: {resp.text[:200]}"}
Confidence
93% confidence
Finding
resp = requests.post(url, headers=headers, json=payload, timeout=60)
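One way to narrow this flow is to check the environment-supplied base URL against an allowlist before any request is made. The following is a minimal sketch, not the skill's actual code: the helper names `is_trusted_base_url` and `resolve_base_url`, the allowlisted host, and the default URL are all assumptions and should be replaced with the host MiniMax documents for your account.

```python
from urllib.parse import urlparse

# Assumption: placeholder allowlist. Replace with the API host(s) you actually trust.
TRUSTED_HOSTS = {"api.minimax.io"}

# Assumption: placeholder default, used only when MINIMAX_BASE_URL is unset.
DEFAULT_BASE_URL = "https://api.minimax.io"


def is_trusted_base_url(url, trusted_hosts=TRUSTED_HOSTS):
    """Return True only for https URLs whose hostname is on the allowlist."""
    parsed = urlparse(url)
    # parsed.hostname ignores any userinfo, so "https://good@evil.example" is rejected.
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts


def resolve_base_url(env, default=DEFAULT_BASE_URL):
    """Read MINIMAX_BASE_URL from a mapping such as os.environ and validate it."""
    url = env.get("MINIMAX_BASE_URL", default)
    if not is_trusted_base_url(url):
        raise ValueError(f"Refusing untrusted MINIMAX_BASE_URL: {url!r}")
    return url
```

Called as `resolve_base_url(os.environ)` before building the request, this fails closed: an override pointing at an unknown host raises instead of receiving the API key in the request headers.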

Tainted flow: 'image_urls' from requests.post (line 37, network input) → requests.get (network output)

Medium
Category
Data Flow
Content
return {"error": "No image URL returned"}
    
    # Download the image
    img_resp = requests.get(image_urls[0], timeout=60)
    if img_resp.status_code != 200:
        return {"error": f"Failed to download image: {img_resp.status_code}"}
Confidence
90% confidence
Finding
img_resp = requests.get(image_urls[0], timeout=60)
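A common mitigation for this kind of flow is to validate the returned URL before fetching it. The sketch below is illustrative only and assumes a hypothetical helper, `is_safe_download_url`, that is not part of the skill. It requires https and rejects hosts that resolve to non-public addresses (loopback, private ranges, link-local metadata endpoints).

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_download_url(url, resolve=socket.getaddrinfo):
    """Return True if url is https and every resolved address is publicly routable.

    `resolve` is injectable so the check can be exercised without real DNS.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        # parsed.port raises ValueError on a malformed port.
        infos = resolve(parsed.hostname, parsed.port or 443)
    except (OSError, ValueError):
        return False
    for info in infos:
        # info[4] is the socket address tuple; drop any IPv6 scope id ("%eth0").
        host = info[4][0].split("%")[0]
        if not ipaddress.ip_address(host).is_global:
            return False
    # An empty resolution result is treated as unsafe.
    return bool(infos)
```

A caller would check `is_safe_download_url(image_urls[0])` before `requests.get`. This does not close DNS-rebinding gaps, because the address may change between the check and the download; pinning the connection to the validated address would be needed for that.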

Tainted flow: 'url' from os.environ.get (line 61, credential/environment) → requests.post (network output)

Critical
Category
Data Flow
Content
},
        'language_boost': 'auto'
    }
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    if resp.status_code != 200:
        return {"error": f"HTTP error {resp.status_code}: {resp.text[:200]}"}
Confidence
93% confidence
Finding
resp = requests.post(url, headers=headers, json=payload, timeout=60)

Lp3

Medium
Category
MCP Least Privilege
Confidence
96% confidence
Finding
The skill declares no permissions even though its documented operation clearly requires reading an environment variable and making outbound network requests to the MiniMax API. This weakens user awareness and consent around secret access and data egress, making it easier for a seemingly simple TTS skill to transmit user content externally without explicit permission signaling.

Tp4

High
Category
MCP Tool Poisoning
Confidence
90% confidence
Finding
The skill is described as text-to-speech, but its underlying code also performs image generation, remote downloads, and local file writes, so its behavior is materially broader than users would expect. This mismatch creates a trust and review gap: a user may authorize or invoke a TTS tool while unknowingly enabling unrelated content generation and file-writing actions that increase attack surface and potential misuse.

Description-Behavior Mismatch

High
Confidence
97% confidence
Finding
The file header and usage text reveal image-generation capability even though the skill metadata describes only text-to-speech. This scope mismatch is security-relevant because it expands data handling and outbound network behavior beyond what an installer or reviewer would reasonably expect. Undisclosed capabilities are especially risky in agent skills because they can be invoked under broader trust assumptions.

Description-Behavior Mismatch

High
Confidence
98% confidence
Finding
This code implements full image generation and then downloads the returned image, which materially exceeds the stated TTS-only purpose. That means the skill can transmit arbitrary prompts and retrieve external binary content without that functionality being disclosed, broadening the attack surface and creating an unexpected SSRF/download path. In the context of an agent skill, hidden extra capabilities are more dangerous because users may grant trust based on the narrower description.

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
The CLI exposes an image command that is not reflected in the skill description, confirming undeclared capability at the user interface level. This is not just documentation drift; it enables invocation of functionality outside the promised scope, undermining trust and potentially bypassing review expectations. In an agent ecosystem, such hidden commands are a meaningful security concern.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The README instructs users to send text to MiniMax's remote TTS API and to provide an API key, but it does not clearly disclose that all synthesized text will leave the local environment and be transmitted to a third-party service. This can cause accidental exposure of sensitive or regulated text if users assume the skill operates locally, especially in agent workflows where prompts may contain secrets, personal data, or proprietary content.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The documentation names an API key and external endpoint but does not clearly warn that user-provided text will be sent to a third-party service for processing. In a TTS context, users may paste sensitive or regulated text, so missing disclosure increases the risk of unintended data exposure to an external provider.
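Beyond documentation, a wrapper could make the data egress explicit at call time. The following is a rough sketch of such a consent gate. The function name `confirm_external_send` and its parameters are hypothetical, and nothing here reflects how the skill currently behaves.

```python
def confirm_external_send(text, ask=input, provider="MiniMax"):
    """Ask the user to approve sending text to a third-party service.

    `ask` is injectable (defaults to input) so the prompt can be scripted.
    Returns True only on an explicit "y" or "yes"; anything else declines.
    """
    preview = text[:60] + ("..." if len(text) > 60 else "")
    answer = ask(
        f"About to send {len(text)} characters to {provider}, "
        f"a third-party service. Preview: {preview!r}. Continue? [y/N] "
    )
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" means an empty or unexpected answer keeps the text local, which fits agent workflows where prompts may carry secrets or personal data.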

VirusTotal

66/66 vendors flagged this skill as clean.
