Minimax Tools

MCP Tools

Direct MiniMax API integration for speech synthesis (TTS), voice cloning, image generation, video generation, and music generation using local Python scripts instead of MCP. Use when you want reliable script-based MiniMax workflows inside OpenClaw for: (1) text-to-speech with built-in Chinese/English defaults or explicit voice IDs, (2) voice cloning with upload + preview flows, (3) text-to-image or reference-image generation, (4) text-to-video, image-to-video, or first/last-frame video generation with async polling/download, and (5) music generation from prompts and lyrics.

Install

openclaw skills install minimax-tools-skill

MiniMax Tools

Use this skill to call MiniMax multimodal APIs directly through local Python wrappers instead of relying on an external MCP server.

Overview

This skill currently supports:

Speech synthesis (TTS)
Voice cloning
Image generation
Video generation
Music generation

All wrappers are exposed through a single entrypoint script:

python3 scripts/minimax.py <subcommand> ...

Read references/api-notes.md only when you need endpoint details or parameter reminders.

Prerequisites

Expect these environment variables to be available before running the scripts:

MINIMAX_API_KEY

Optional:

MINIMAX_BASE_URL if you need to override the default API host

Python dependency:

requests

Routing guide

Use tts for speech synthesis
Use voice for uploading clone inputs, creating cloned voices, and optionally downloading preview audio
Use image for text-to-image or reference-image generation
Use video for text-to-video, image-to-video, or first/last-frame video workflows
Use music for song or instrumental generation

TTS defaults

Default model: speech-2.8-turbo
Default format: mp3
Default sample rate: 32000
Default bitrate: 128000
Default Chinese voice: Chinese (Mandarin)_Lyrical_Voice
Default English voice: English_Graceful_Lady
If --voice is omitted, the script uses --voice-lang zh|en and defaults to zh

Voice cloning notes

Clone source audio constraints:
- mp3, m4a, or wav
- 10 seconds to 5 minutes
- <= 20 MB
Optional prompt audio constraints:
- mp3, m4a, or wav
- under 8 seconds
- <= 20 MB
If cloning succeeds, the returned voice_id can be used immediately in TTS
MiniMax documentation notes cloned voices are temporary unless used in real TTS within 7 days

Video support

Supported modes:

text-to-video: video create
image-to-video: video i2v
first/last-frame video: video fl2v

Video creation is asynchronous. Use video query, video wait, and video download for task follow-up.

File handling rules

Prefer saving outputs locally and returning file paths
Local image inputs for image/video wrappers can be converted to Data URLs automatically
Prefer URL-based output when MiniMax returns temporary files, then download immediately
Avoid tight polling loops for async video jobs

Resources

scripts/minimax.py - unified CLI entrypoint
scripts/minimax_tts.py - TTS wrapper
scripts/minimax_voice.py - voice cloning wrapper
scripts/minimax_image.py - image generation wrapper
scripts/minimax_video.py - video generation wrapper
scripts/minimax_music.py - music generation wrapper
references/api-notes.md - focused API notes and constraints