MiniMax Multimodal Toolkit

Security checks across malware telemetry and agentic risk

Overview

This MiniMax media skill mostly matches its purpose, but it handles voice, face, prompts, files, and API credentials in ways that need careful review before installation.

Install only if you are comfortable sending prompts, lyrics, images, videos, and voice samples to MiniMax. Do not clone voices or use face references without consent, and avoid confidential content in prompts. Store the API key in a secret store or a temporary environment variable rather than a shell profile. Keep MINIMAX_API_HOST set only to the documented MiniMax endpoint. Choose output paths carefully, because existing files may be overwritten.
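The advice about MINIMAX_API_HOST can be enforced with a small pre-flight check. The sketch below is an assumption, not part of the skill: check_api_host is a hypothetical helper, and the expected endpoint is passed in as an argument because this report does not state it. Take the value from MiniMax's own documentation.

```shell
# Hypothetical helper (not part of the skill): compare MINIMAX_API_HOST with
# an endpoint the user has verified in MiniMax's documentation.
check_api_host() {
  expected_host="$1"
  if [ -z "${MINIMAX_API_HOST:-}" ]; then
    echo "MINIMAX_API_HOST is not set" >&2
    return 1
  fi
  if [ "$MINIMAX_API_HOST" != "$expected_host" ]; then
    echo "Refusing to continue: MINIMAX_API_HOST is '$MINIMAX_API_HOST', expected '$expected_host'" >&2
    return 1
  fi
}
```

Calling it before any script that sends credentials means a mistyped or tampered host stops the run instead of receiving the API key.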

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (9)

Vague Triggers

Medium
Confidence
89% confidence
Finding
The invocation description is broad enough to trigger on generic speech/music/video/image requests, not just clearly MiniMax-specific tasks. That can cause unintended tool routing, leading the agent to invoke external-generation workflows and request/configure API access when a narrower, non-MiniMax response would have been more appropriate.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The skill promotes voice cloning from user-provided audio without any warning about consent, impersonation, or handling of biometric voice data. In this context, that omission increases the risk of unauthorized cloning, deceptive content creation, and privacy misuse involving sensitive personal data.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The instructions tell users to place API keys in shell profiles for persistence without warning that this increases credential exposure through dotfiles, backups, shared accounts, and accidental disclosure. In a skill intended for routine use, normalizing persistent storage of secrets materially raises the chance of credential leakage.
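One alternative to a shell profile is to load the key only for the current session. The following is a minimal sketch under that assumption. read_api_key is a hypothetical helper, not something the skill provides. It reads the key from standard input, so the key can be typed at a prompt or redirected from a secret manager's output.

```shell
# Hypothetical helper (not part of the skill): set MINIMAX_API_KEY for the
# current shell session only, without writing it to a dotfile.
read_api_key() {
  if [ -t 0 ]; then
    printf 'MiniMax API key: ' >&2
    stty -echo              # hide typed characters on a terminal
    IFS= read -r key
    stty echo
    printf '\n' >&2
  else
    IFS= read -r key        # non-interactive input, e.g. redirected from a file
  fi
  if [ -z "${key:-}" ]; then
    echo "No key provided" >&2
    return 1
  fi
  MINIMAX_API_KEY="$key"
  export MINIMAX_API_KEY
  unset key
}
```

The key then lives only in the environment of that shell and its children, and disappears when the session ends.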

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The documentation explicitly encourages sending a front-facing portrait image to a third-party API but does not warn that this is sensitive biometric/identity-related data or explain retention, consent, and handling risks. In a skill context, users may upload another person's face or their own without understanding that the image leaves the local environment and may be processed or stored externally.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
This script invokes ffmpeg with the -y flag in multiple subcommands, which forces overwriting existing output files without prompting the user. In an agent or automation context, a caller can unintentionally destroy existing media or other files at attacker-chosen output paths, causing silent data loss and making mistakes harder to detect or recover from.
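A guard of the following shape would make overwriting an explicit choice. This is a sketch under assumptions: ensure_output_free and its "--force" argument are hypothetical names, not part of the skill. The commented call site uses ffmpeg's -n option, which makes ffmpeg exit rather than overwrite an existing output file.

```shell
# Hypothetical guard (not part of the skill): fail when the output path already
# exists, unless the caller passes "--force" as the second argument.
ensure_output_free() {
  out_path="$1"
  force="${2:-}"
  if [ -e "$out_path" ] && [ "$force" != "--force" ]; then
    echo "Refusing to overwrite existing file: $out_path" >&2
    return 1
  fi
}

# Example call site, commented out so the sketch has no side effects:
# ensure_output_free "$output" && ffmpeg -n -i "$input" "$output"
```

Checking in the wrapper as well as passing -n gives a clear error message before ffmpeg starts, while -n still covers a file that appears between the check and the write.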

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The script transmits user-supplied lyrics, prompts, and API credentials to a remote third-party service, but provides no explicit privacy or data-handling warning before doing so. In an agent-skill context, this matters because users may assume local processing and unknowingly send sensitive or copyrighted content off-host.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The clone workflow uploads a user-supplied audio sample to a remote MiniMax API, but the script provides no explicit warning, confirmation, or consent checkpoint before transmitting potentially sensitive biometric voice data off-host. In a voice-cloning context, this is privacy-relevant and can expose personal or regulated data if users assume processing is local.
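A consent checkpoint of the kind this finding describes could look like the sketch below. confirm_upload is a hypothetical function and the prompt wording is an assumption. The skill's clone script would need to call something like it before the upload step.

```shell
# Hypothetical consent checkpoint (not part of the skill): ask before a voice
# sample leaves the machine. Anything other than an explicit "y" or "yes"
# cancels the upload.
confirm_upload() {
  sample_path="$1"
  printf 'About to upload "%s" to the remote MiniMax API.\n' "$sample_path" >&2
  printf 'Do you have the speaker'"'"'s consent to clone this voice? [y/N] ' >&2
  IFS= read -r answer || answer=""
  case "$answer" in
    y|Y|yes|YES) return 0 ;;
    *) echo "Upload cancelled" >&2; return 1 ;;
  esac
}
```

Defaulting to "no" on empty input or end of file means an unattended agent run cannot pass the checkpoint by accident.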

Missing User Warnings

Medium
Confidence
82% confidence
Finding
When --generate-bgm is used, the script sends the user-provided music prompt to a third-party API without any explicit privacy notice or consent checkpoint. If users include sensitive project details, names, or confidential creative directions in prompts, that data is transmitted off-host and may be logged or retained by the provider.

External Transmission

Medium
Category
Data Exfiltration
Content
  echo "Generating music with model: $model"
  echo "Output format: $output_format"

  # Send request via curl
  local raw_output http_code response
  raw_output="$(curl -s -w "\n%{http_code}" \
    -X POST "$api_url" \
Confidence
89% confidence
Finding
curl local raw_output http_code response raw_output="$(curl -s -w "\n%{http_code}" \ -X POST "$api_url" \ -H "Authorization: Bearer ${MINIMAX_API_KEY}" \ -H "Content-Type: application/

VirusTotal

57/57 vendors flagged this skill as clean.
