minimax-tts

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real MiniMax text-to-speech helper, but it also includes an under-disclosed image-generation command that expands what the skill can do.

Review before installing. Use this only if you are comfortable sending input text to MiniMax with your API key, and treat the package as having an extra image-generation capability until that code is removed or clearly documented. Do not set MINIMAX_BASE_URL to an endpoint you do not trust.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (10)

Tainted flow: 'url' from os.environ.get (line 61, credential/environment) → requests.post (network output)

Critical

Category: Data Flow
Content: 'model': 'image-01', 'prompt': prompt } resp = requests.post(url, headers=headers, json=payload, timeout=60) if resp.status_code != 200: return {"error": f"API error {resp.status_code}: {resp.text[:200]}"}
Confidence: 93%
Finding: resp = requests.post(url, headers=headers, json=payload, timeout=60)
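One way to narrow this flow is to vet the environment-supplied endpoint before any request is made. The sketch below is illustrative only and is not taken from the skill's code: it assumes the override arrives through MINIMAX_BASE_URL, and the function name `resolve_base_url`, the `TRUSTED_HOSTS` set, and the `api.minimax.example` host are hypothetical placeholders to be replaced with the host named in MiniMax's own documentation.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the API host(s) from MiniMax's documentation.
TRUSTED_HOSTS = {"api.minimax.example"}


def resolve_base_url(env, default="https://api.minimax.example"):
    """Return a vetted base URL, or raise ValueError if the override is not trusted.

    `env` is any mapping, for example os.environ.
    """
    raw = env.get("MINIMAX_BASE_URL", default)
    parts = urlsplit(raw)
    # Require TLS so the API key in the headers is never sent in cleartext.
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS base URL: {raw!r}")
    # Compare the parsed hostname, not the raw string, so that
    # "https://api.minimax.example@evil.test" is rejected.
    if parts.hostname not in TRUSTED_HOSTS:
        raise ValueError(f"refusing untrusted host: {parts.hostname!r}")
    return raw.rstrip("/")
```

Calling `resolve_base_url(os.environ)` once at startup and building every request URL from its return value would keep the API key from being posted to an arbitrary host, at the cost of requiring a code change to add a new endpoint.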

Tainted flow: 'image_urls' from requests.post (line 37, network input) → requests.get (network output)

Medium

Category: Data Flow
Content: return {"error": "No image URL returned"} # Download the image img_resp = requests.get(image_urls[0], timeout=60) if img_resp.status_code != 200: return {"error": f"Failed to download image: {img_resp.status_code}"}
Confidence: 90%
Finding: img_resp = requests.get(image_urls[0], timeout=60)
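Because the download URL comes from a network response, it can be checked before it is fetched. The following is a minimal sketch under stated assumptions, not the skill's actual code: `is_safe_download_url` and `ALLOWED_HOST_SUFFIXES` are hypothetical names, and the `.minimax.example` suffix is a placeholder because the report does not state which hosts serve the generated images.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical suffix allowlist; the real image hosts are not named in the report.
ALLOWED_HOST_SUFFIXES = (".minimax.example",)


def is_safe_download_url(url):
    """Return True only for HTTPS URLs on an allowlisted, non-IP hostname."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "https" or not host:
        return False
    try:
        # Reject IP literals outright; this covers loopback, private ranges,
        # and link-local metadata addresses such as 169.254.169.254.
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass  # Not an IP literal, so fall through to the hostname check.
    return host.endswith(ALLOWED_HOST_SUFFIXES)
```

A caller could return an error instead of calling `requests.get` whenever this check fails. The check does not resolve DNS, so it does not by itself defend against a trusted-looking name that resolves to an internal address.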

Tainted flow: 'url' from os.environ.get (line 61, credential/environment) → requests.post (network output)

Critical

Category: Data Flow
Content: }, 'language_boost': 'auto' } resp = requests.post(url, headers=headers, json=payload, timeout=60) if resp.status_code != 200: return {"error": f"HTTP error {resp.status_code}: {resp.text[:200]}"}
Confidence: 93%
Finding: resp = requests.post(url, headers=headers, json=payload, timeout=60)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 96%
Finding: The skill declares no permissions even though its documented operation clearly requires reading an environment variable and making outbound network requests to the MiniMax API. This weakens user awareness and consent around secret access and data egress, making it easier for a seemingly simple TTS skill to transmit user content externally without explicit permission signaling.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 90%
Finding: The skill is described as text-to-speech, but its underlying code also performs image generation, remote downloads, and local file writes, so its behavior is materially broader than users would expect. This mismatch creates a trust and review gap: a user may authorize or invoke a TTS tool while unknowingly enabling unrelated content generation and file-writing actions that increase attack surface and potential misuse.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The file header and usage text reveal image-generation capability even though the skill metadata describes only text-to-speech. This scope mismatch is security-relevant because it expands data handling and outbound network behavior beyond what an installer or reviewer would reasonably expect. Undisclosed capabilities are especially risky in agent skills because they can be invoked under broader trust assumptions.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: This code implements full image generation and then downloads the returned image, which materially exceeds the stated TTS-only purpose. That means the skill can transmit arbitrary prompts and retrieve external binary content without that functionality being disclosed, broadening the attack surface and creating an unexpected SSRF/download path. In the context of an agent skill, hidden extra capabilities are more dangerous because users may grant trust based on the narrower description.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The CLI exposes an image command that is not reflected in the skill description, confirming undeclared capability at the user interface level. This is not just documentation drift; it enables invocation of functionality outside the promised scope, undermining trust and potentially bypassing review expectations. In an agent ecosystem, such hidden commands are a meaningful security concern.

Missing User Warnings

Medium

Confidence: 95%
Finding: The README instructs users to send text to MiniMax's remote TTS API and to provide an API key, but it does not clearly disclose that all synthesized text will leave the local environment and be transmitted to a third-party service. This can cause accidental exposure of sensitive or regulated text if users assume the skill operates locally, especially in agent workflows where prompts may contain secrets, personal data, or proprietary content.

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation names an API key and external endpoint but does not clearly warn that user-provided text will be sent to a third-party service for processing. In a TTS context, users may paste sensitive or regulated text, so missing disclosure increases the risk of unintended data exposure to an external provider.

VirusTotal

66/66 vendors flagged this skill as clean.
