mmVoiceMaker

Security checks across malware telemetry and agentic risk

Overview

This voice tool appears purpose-built for MiniMax text-to-speech work, but it is flagged for review because it can upload sensitive voice recordings and delete cloud voice assets without strong user-facing safeguards.

Install only if you are comfortable sending text, prompts, and voice recordings to MiniMax under your own API key. Use voice cloning only with recordings you own or have explicit permission to process, avoid putting secrets in prompt or preview text, do not echo or screenshot API keys, and do not run cleanup_unused_voices(dry_run=False) unless you intend to delete all custom cloned/designed voices in the account.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
Findings (29)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
def run_ffmpeg_command(args: List[str], timeout: int = 300) -> bool:
    """Execute ffmpeg command with given arguments"""
    try:
        result = subprocess.run(
            ["ffmpeg"] + args,
            capture_output=True,
            text=True,
Confidence
82% confidence
Finding
result = subprocess.run( ["ffmpeg"] + args, capture_output=True, text=True, timeout=timeout )
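The call above already passes a list rather than a shell string, so the remaining exposure is whatever ends up in args. A minimal sketch of one way to narrow that, assuming a hypothetical helper build_convert_argv that is not part of the skill and that only covers a single local input-to-output conversion:

```python
from pathlib import Path
from typing import List


def build_convert_argv(input_path: str, output_path: str) -> List[str]:
    """Build an ffmpeg argv list for one local input -> output conversion.

    Hypothetical helper for illustration; not part of the scanned skill.
    """
    for raw in (input_path, output_path):
        # Refuse option-like values and URL-style inputs so a caller cannot
        # smuggle extra ffmpeg flags or remote sources into the command.
        if raw.startswith("-") or "://" in raw:
            raise ValueError(f"refusing suspicious path: {raw!r}")
    src = Path(input_path).resolve()
    dst = Path(output_path).resolve()
    # A list argv (no shell=True) means the paths are never shell-parsed.
    # Note that -y overwrites an existing output file without asking.
    return ["ffmpeg", "-nostdin", "-y", "-i", str(src), str(dst)]
```

The returned list could then be handed to subprocess.run with the same capture_output, text, and timeout arguments the excerpt shows.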

subprocess module call

Medium
Category
Dangerous Code Execution
Content
try:
        # Use ffprobe to get format info
        result = subprocess.run(
            [
                "ffprobe", "-v", "quiet",
                "-print_format", "json",
Confidence
80% confidence
Finding
result = subprocess.run( [ "ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", "-show_streams",

Tainted flow: 'url' from os.getenv (line 161, credential/environment) → requests.get (network output)

Critical
Category
Data Flow
Content
Returns:
        Saved file path
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    
    with open(output_path, "wb") as f:
Confidence
96% confidence
Finding
response = requests.get(url, timeout=timeout)
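The flagged line fetches whatever URL it is given. A minimal sketch of a pre-flight check that could run before requests.get, assuming a hypothetical helper is_allowed_download_url and a placeholder host allowlist (the real download hosts are not stated in this report):

```python
from typing import FrozenSet
from urllib.parse import urlparse

# Placeholder allowlist; the hosts the skill actually downloads from are unknown here.
EXAMPLE_ALLOWED_HOSTS: FrozenSet[str] = frozenset({"cdn.example.com"})


def is_allowed_download_url(url: str, allowed_hosts: FrozenSet[str]) -> bool:
    """Return True only for https URLs whose host is explicitly allowlisted."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return host in allowed_hosts
```

A caller would raise or skip the download when this returns False, which keeps file://, plain http, and unexpected hosts out of the network path.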

Intent-Code Divergence

Medium
Confidence
93% confidence
Finding
The function documentation frames this as TTS segment generation, but the implementation executes an arbitrary caller-supplied function from `segment["tts_function"]`. In a skill/plugin context, that turns structured data into executable behavior and can allow untrusted code paths or unexpected side effects if segment data is influenced by users or other agents.
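One common way to avoid executing a callable carried inside segment data is to have segments name a backend and resolve that name against a fixed registry. The sketch below illustrates the idea; the names TTS_BACKENDS, register_backend, synthesize_segment, and the "tts_backend" key are all hypothetical and do not come from the skill:

```python
from typing import Callable, Dict

# Registry of backends the code itself has chosen to expose.
TTS_BACKENDS: Dict[str, Callable[[str], bytes]] = {}


def register_backend(name: str):
    """Decorator that adds a function to the registry under a fixed name."""
    def decorator(func: Callable[[str], bytes]) -> Callable[[str], bytes]:
        TTS_BACKENDS[name] = func
        return func
    return decorator


@register_backend("silent")
def silent_backend(text: str) -> bytes:
    # Stand-in backend for the example; a real one would call the TTS API.
    return b""


def synthesize_segment(segment: dict) -> bytes:
    """Look up the backend by name instead of calling a function stored in the data."""
    name = segment.get("tts_backend")
    if not isinstance(name, str) or name not in TTS_BACKENDS:
        raise KeyError(f"unknown TTS backend: {name!r}")
    return TTS_BACKENDS[name](segment["text"])
```

With this shape, segment data that arrives from a user or another agent can only select among functions the module registered itself.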

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The module exposes a bulk-deletion helper that iterates over all cloned and designed voices and deletes them with a single call when dry_run=False. Even if intended as maintenance functionality, this is a destructive capability that exceeds the narrowly described synthesis/post-processing purpose and creates unnecessary risk of accidental or unauthorized mass deletion if invoked by an agent or user without strong safeguards.

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
cleanup_unused_voices is effectively a mass-deletion convenience function, but it does not actually determine whether voices are unused; it deletes every custom voice returned by the API. In an agent context, convenience wrappers for destructive operations are dangerous because they lower the barrier to irreversible data loss from prompt mistakes, misuse, or overly broad automation.
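The finding points at two gaps: no check for whether a voice is actually unused, and no friction before deletion. A minimal sketch of how both could be addressed, assuming hypothetical helpers select_unused and cleanup_voices, a caller-supplied list of in-use voice IDs, and a caller-supplied delete_fn (none of these are the skill's real API):

```python
from typing import Callable, Iterable, List, Optional


def select_unused(all_ids: Iterable[str], in_use_ids: Iterable[str]) -> List[str]:
    """Return only the voice IDs that are not referenced anywhere."""
    return sorted(set(all_ids) - set(in_use_ids))


def cleanup_voices(
    all_ids: Iterable[str],
    in_use_ids: Iterable[str],
    delete_fn: Callable[[str], None],
    dry_run: bool = True,
    confirm: Optional[str] = None,
) -> List[str]:
    """Delete unused voices, but only after an explicit, count-specific confirmation."""
    targets = select_unused(all_ids, in_use_ids)
    if dry_run:
        # Report what would be deleted without touching anything.
        return targets
    expected = f"delete {len(targets)} voices"
    if confirm != expected:
        raise PermissionError(f"pass confirm={expected!r} to proceed")
    for voice_id in targets:
        delete_fn(voice_id)
    return targets
```

Requiring the confirmation string to include the count means a stale or copy-pasted call fails if the set of targets has changed since the dry run.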

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The voice cloning workflow accepts a user-supplied audio file and sends it to external processing via `quick_clone_voice`, but the CLI provides no explicit notice that potentially sensitive biometric voice data will be uploaded to a third-party API. This can lead to uninformed disclosure of personal or regulated data, especially because voiceprints are uniquely identifying and harder to revoke than passwords.
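A CLI could surface the missing notice with a short consent gate before the upload. The sketch below is only an illustration: confirm_upload is a hypothetical helper, and the wording of the prompt is an assumption, not text from the skill.

```python
from typing import Callable


def confirm_upload(path: str, ask: Callable[[str], str] = input) -> bool:
    """Ask for explicit consent before a voice recording leaves the machine."""
    prompt = (
        f"'{path}' will be uploaded to the MiniMax API for voice cloning.\n"
        "Voice recordings are biometric data. Do you own this recording or have "
        "the speaker's permission? Type 'yes' to continue: "
    )
    # Only an exact 'yes' counts; anything else (including Enter) declines.
    return ask(prompt).strip().lower() == "yes"
```

The ask parameter defaults to input for interactive use and can be replaced in tests or in non-interactive agent runs, where the safe default would be to decline.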

Missing User Warnings

Low
Confidence
82% confidence
Finding
The voice design command sends arbitrary user prompt text to an external service through `design_voice` without clearly disclosing that the description leaves the local environment. While prompts are less sensitive than raw biometric audio, they may still contain confidential character, branding, or personal information that users do not expect to be transmitted externally.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The documentation exposes voice cloning and source-audio upload workflows without any notice about consent, privacy, retention, or the fact that biometric voice data is sent to a third-party API. In a voice-maker skill, this omission is security-relevant because users may upload sensitive recordings or clone voices without understanding legal and privacy risks, increasing the chance of unauthorized biometric data processing.

Missing User Warnings

Low
Confidence
82% confidence
Finding
The deletion and cleanup commands describe removing custom voices but do not clearly warn that these actions may be irreversible. In a tool that manages user-created voice assets, unclear destructive-operation guidance can lead to accidental permanent loss of cloned or designed voices.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The guide documents voice cloning workflows but provides no warning that users may be uploading biometric voice data or that they must have the speaker’s consent. In an agent-facing skill, this omission can normalize cloning third-party voices without authorization, creating privacy, impersonation, and compliance risk even if the underlying feature is legitimate.

Missing User Warnings

Low
Confidence
80% confidence
Finding
The environment check mentions API connectivity testing and use of MINIMAX_VOICE_API_KEY without clearly warning that running the command may contact an external service using configured credentials. This is a transparency and operational-safety issue: users or agents may trigger outbound requests unexpectedly in restricted or privacy-sensitive environments.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The guide explicitly instructs users to echo the MiniMax API key to the terminal, which encourages disclosure of a sensitive secret. Terminal output may be visible to others, captured in screen recordings, shell logs, or CI logs, making credential exposure more likely in a skill centered on authenticated API usage.
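A setup check can confirm that the key is configured without printing any part of it. A minimal sketch, assuming a hypothetical helper describe_api_key; only the variable name MINIMAX_VOICE_API_KEY comes from this report:

```python
import os
from typing import Mapping, Optional


def describe_api_key(
    env_var: str = "MINIMAX_VOICE_API_KEY",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Report whether the key is set, never its value or any prefix of it."""
    env = os.environ if environ is None else environ
    value = env.get(env_var, "")
    if not value.strip():
        return f"{env_var}: not set"
    return f"{env_var}: set"
```

Printing the returned string is safe to capture in terminal logs, CI output, or screenshots because it carries no key material.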

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The voice cloning examples demonstrate uploading and cloning a speaker's voice without any surrounding guidance on consent, authorization, biometric sensitivity, or retention of uploaded samples. In a skill specifically designed for voice cloning, omission of these safeguards can normalize misuse and lead users to clone voices they do not own or have permission to use, creating privacy, impersonation, and fraud risks.

Missing User Warnings

Medium
Confidence
85% confidence
Finding
The examples show direct deletion of cloned and designed voices, including a batch cleanup that deletes all custom voices, without any warning that the action may be irreversible. This can cause accidental destructive operations, especially because the file presents copy-paste runnable examples and the cleanup example encourages broad deletion with minimal friction.

Natural-Language Policy Violations

Medium
Confidence
93% confidence
Finding
The guidance explicitly instructs voice matching based on gender as a default character-trait factor, without requiring user preference, necessity, or sensitivity checks. In a voice-cloning and synthesis skill, this can lead the agent to infer or impose gendered attributes on speakers and generate potentially inappropriate, biased, or misrepresentative outputs, especially for ambiguous or non-binary roles.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The troubleshooting example prints the first part of the API key to stdout for debugging. Even partial credential disclosure can leak into terminal history, logs, CI output, screen recordings, or support screenshots, increasing the chance of secret exposure.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
This guide provides step-by-step voice cloning workflows without any warning about consent, authorization, or privacy obligations for uploaded recordings. In a voice-cloning skill, that omission materially increases the risk of impersonation, non-consensual biometric voice use, and misuse of third-party recordings, especially because the examples make cloning look routine and immediately usable.

Missing User Warnings

Medium
Confidence
82% confidence
Finding
The deletion and batch cleanup examples invoke destructive operations, including removing all custom voices, without clearly warning that deletion may be irreversible or difficult to recover from. This can lead to accidental loss of user-created assets and operational disruption, particularly because the examples normalize running the actual delete call immediately after the dry run.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The code logs raw segment text to stdout during processing, which can expose sensitive or private input content in terminal history, CI/CD logs, agent transcripts, or centralized log collection systems. In a voice synthesis workflow, segment text may contain unpublished scripts, personal data, secrets, or regulated content, so unconditional logging creates a real confidentiality risk.
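Progress output does not need the segment text itself. One option, sketched here with a hypothetical helper segment_log_label, is to log only the position and length of each segment:

```python
def segment_log_label(index: int, text: str) -> str:
    """Describe a segment for progress logs without including its content."""
    return f"segment {index}: {len(text)} chars"
```

For example, segment_log_label(0, "hello world") yields a line that identifies the segment being processed while keeping scripts, personal data, or secrets out of logs and agent transcripts.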

Missing User Warnings

Medium
Confidence
84% confidence
Finding
This helper silently fetches remote data and persists it to disk without any built-in warning, confirmation, provenance checks, or validation that the payload is actually safe audio. While not inherently malicious, this behavior increases the chance that untrusted remote content is stored and later consumed by other tools in the pipeline, especially in a voice-processing skill that may automatically handle external media.
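A lightweight check on the first bytes of a download can catch payloads that are clearly not audio before they are written to disk. The sketch below covers only the WAV, MP3, FLAC, and Ogg signatures; looks_like_audio is a hypothetical helper, and a matching header does not prove the file is safe, only that it is plausibly the expected type.

```python
def looks_like_audio(header: bytes) -> bool:
    """Check the leading bytes against a few well-known audio signatures."""
    # WAV: RIFF container whose form type is WAVE.
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return True
    # FLAC and Ogg streams start with fixed four-byte markers.
    if header[:4] in (b"fLaC", b"OggS"):
        return True
    # MP3: either an ID3v2 tag or a frame sync (eleven set bits).
    if header[:3] == b"ID3":
        return True
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return True
    return False
```

A download helper could read the first chunk of the response, call this function, and refuse to persist the file when it returns False.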

Missing User Warnings

Medium
Confidence
91% confidence
Finding
This function uploads local audio files to a third-party API using the configured bearer token, but there is no explicit consent prompt, privacy disclosure, or indication to callers that biometric voice data leaves the local environment. Because voice samples are sensitive personal data and may enable identification or cloning misuse, silent transmission creates a real privacy and compliance risk even if the feature is expected by the skill's purpose.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
This prompt-audio upload sends another local recording to the external service without any built-in warning or consent mechanism. In the context of a voice-cloning skill, prompt audio is especially sensitive because it is short, targeted voice-reference material that can directly improve impersonation quality, so undisclosed transmission increases privacy and misuse risk.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The cloning request transmits preview text, file identifiers, and cloning options to a remote API, again without any explicit disclosure in code or interface. While sending this data is functionally necessary for the skill, preview text may contain sensitive or proprietary content, and in a voice-cloning workflow the overall operation has elevated abuse potential because it helps create reusable synthetic identities.

Unpinned Dependencies

Low
Category
Supply Chain
Content
# MiniMax Voice Maker Skill Dependencies
requests>=2.28.0
websockets>=10.0
ffmpeg-python>=0.2.0
Confidence
95% confidence
Finding
requests>=2.28.0
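Range specifiers such as >= let an install resolve to any newer release, which is what this finding flags. A minimal sketch of a check that lists requirement lines lacking an exact == pin, assuming a hypothetical helper unpinned_requirements and simple one-requirement-per-line input:

```python
from typing import Iterable, List


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    flagged = []
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        spec = line.split("#", 1)[0].strip()
        if not spec:
            continue
        if "==" not in spec:
            flagged.append(spec)
    return flagged
```

Run against the three lines shown in this finding, it would flag all of them; which exact versions to pin is a decision for the skill's maintainers and is not implied here.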

VirusTotal

None of the 66 vendors flagged this skill; all 66 report it as clean.
