MiniMax Multimodal Toolkit

Security checks across malware telemetry and agentic risk

Overview

This MiniMax media skill mostly matches its purpose, but it handles voice, face, prompts, files, and API credentials in ways that need careful review before installation.

Install only if you are comfortable sending prompts, lyrics, images, videos, and voice samples to MiniMax. Do not clone voices or use face references without consent, and keep confidential content out of prompts. Store the API key in a secret store or a temporary environment variable rather than a shell profile. Keep MINIMAX_API_HOST set only to the documented MiniMax endpoint, and choose output paths carefully because existing files may be overwritten.
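The MINIMAX_API_HOST recommendation above can be enforced with a small pre-flight check. The following is a minimal sketch, not part of the skill: check_api_host is a hypothetical helper, EXPECTED_HOST is a placeholder that must be replaced with the endpoint from MiniMax's documentation, and treating an unset variable as acceptable assumes the skill falls back to its own default host.

```shell
# EXPECTED_HOST is a placeholder, not the real MiniMax endpoint; replace it
# with the value given in MiniMax's own documentation.
EXPECTED_HOST="https://api.example.invalid"

# Hypothetical helper: succeed only when MINIMAX_API_HOST is unset or
# exactly equal to the expected endpoint.
check_api_host() {
  if [ -n "${MINIMAX_API_HOST:-}" ] && [ "$MINIMAX_API_HOST" != "$EXPECTED_HOST" ]; then
    printf 'Unexpected MINIMAX_API_HOST: %s\n' "$MINIMAX_API_HOST" >&2
    return 1
  fi
  return 0
}
```

Calling check_api_host before any request lets a wrapper script stop early if the host variable has been redirected to an unexpected server.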

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (9)

Vague Triggers

Medium

Confidence: 89%
Finding: The invocation description is broad enough to trigger on generic speech/music/video/image requests, not just clearly MiniMax-specific tasks. That can cause unintended tool routing, leading the agent to invoke external-generation workflows and request/configure API access when a narrower, non-MiniMax response would have been more appropriate.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill promotes voice cloning from user-provided audio without any warning about consent, impersonation, or handling of biometric voice data. In this context, that omission increases the risk of unauthorized cloning, deceptive content creation, and privacy misuse involving sensitive personal data.

Missing User Warnings

Medium

Confidence: 94%
Finding: The instructions tell users to place API keys in shell profiles for persistence without warning that this increases credential exposure through dotfiles, backups, shared accounts, and accidental disclosure. In a skill intended for routine use, normalizing persistent storage of secrets materially raises the chance of credential leakage.
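One alternative to a shell-profile entry is to prompt for the key at the start of a session so it never lands in a dotfile. The sketch below is illustrative only: load_minimax_key is a hypothetical helper, and it assumes the skill reads the key from the MINIMAX_API_KEY environment variable, as the quoted curl call suggests.

```shell
# Hypothetical helper: keep the key in the current shell session only,
# prompting for it instead of reading it from a shell profile.
load_minimax_key() {
  if [ -z "${MINIMAX_API_KEY:-}" ]; then
    printf 'MiniMax API key: ' >&2
    # Hide typed characters when stdin is a terminal.
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r MINIMAX_API_KEY
    if [ -t 0 ]; then stty echo; printf '\n' >&2; fi
  fi
  # Export so child processes started from this session can see the key.
  export MINIMAX_API_KEY
}
```

The key disappears when the shell exits. A dedicated secret manager is a stronger option where one is available.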

Missing User Warnings

Medium

Confidence: 92%
Finding: The documentation explicitly encourages sending a front-facing portrait image to a third-party API but does not warn that this is sensitive biometric/identity-related data or explain retention, consent, and handling risks. In a skill context, users may upload another person's face or their own without understanding that the image leaves the local environment and may be processed or stored externally.

Missing User Warnings

Medium

Confidence: 96%
Finding: This script invokes ffmpeg with the -y flag in multiple subcommands, which forces overwriting existing output files without prompting the user. In an agent or automation context, a caller can unintentionally destroy existing media or other files at attacker-chosen output paths, causing silent data loss and making mistakes harder to detect or recover from.
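A guard in front of the ffmpeg call is one way to reduce the overwrite risk described in this finding. The sketch below is an assumption about how a caller might wrap the script, not the skill's own code: check_output_path and FORCE_OVERWRITE are hypothetical names. ffmpeg's -n flag, which exits rather than overwriting an existing output file, is the documented counterpart of -y.

```shell
# Hypothetical helper: refuse to reuse an output path that already exists
# unless the caller explicitly opts in with FORCE_OVERWRITE=1.
check_output_path() {
  if [ -e "$1" ] && [ "${FORCE_OVERWRITE:-0}" != "1" ]; then
    printf 'Refusing to overwrite existing file: %s\n' "$1" >&2
    return 1
  fi
  return 0
}

# Usage sketch ($input and $output are illustrative variables):
#   check_output_path "$output" && ffmpeg -n -i "$input" "$output"
```

Passing -n in addition to the path check means a file created between the check and the ffmpeg run is still left untouched.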

Missing User Warnings

Medium

Confidence: 95%
Finding: The script transmits user-supplied lyrics, prompts, and API credentials to a remote third-party service, but provides no explicit privacy or data-handling warning before doing so. In an agent-skill context, this matters because users may assume local processing and unknowingly send sensitive or copyrighted content off-host.

Missing User Warnings

Medium

Confidence: 95%
Finding: The clone workflow uploads a user-supplied audio sample to a remote MiniMax API, but the script provides no explicit warning, confirmation, or consent checkpoint before transmitting potentially sensitive biometric voice data off-host. In a voice-cloning context, this is privacy-relevant and can expose personal or regulated data if users assume processing is local.
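The consent checkpoint this finding calls for could take the form of a confirmation prompt that runs before any upload. The following is a minimal sketch under that assumption: confirm_upload is a hypothetical helper, and neither the prompt wording nor the "yes" keyword comes from the skill.

```shell
# Hypothetical helper: ask for explicit confirmation before a voice sample
# leaves the local machine. Anything other than "yes" declines.
confirm_upload() {
  printf 'About to upload "%s" to the MiniMax API.\n' "$1" >&2
  printf 'Confirm you have consent from the speaker (type "yes"): ' >&2
  # Treat end-of-input (for example, a non-interactive run) as a refusal.
  IFS= read -r answer || return 1
  [ "$answer" = "yes" ]
}

# Usage sketch ($audio_file is an illustrative variable):
#   confirm_upload "$audio_file" || exit 1
```

Defaulting to refusal means an unattended agent run with no input cannot upload a sample silently.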

Missing User Warnings

Medium

Confidence: 82%
Finding: When --generate-bgm is used, the script sends the user-provided music prompt to a third-party API without any explicit privacy notice or consent checkpoint. If users include sensitive project details, names, or confidential creative directions in prompts, that data is transmitted off-host and may be logged or retained by the provider.

External Transmission

Medium

Category: Data Exfiltration
Content:
    echo "Generating music with model: $model"
    echo "Output format: $output_format"
    # Send request via curl
    local raw_output http_code response
    raw_output="$(curl -s -w "\n%{http_code}" \
      -X POST "$api_url" \
Confidence: 89%
Finding: curl local raw_output http_code response raw_output="$(curl -s -w "\n%{http_code}" \ -X POST "$api_url" \ -H "Authorization: Bearer ${MINIMAX_API_KEY}" \ -H "Content-Type: application/

VirusTotal

57/57 vendors flagged this skill as clean.
