MiniMax Tools

v0.1.0

Direct MiniMax API integration for speech synthesis (TTS), voice cloning, image generation, video generation, and music generation using local Python scripts...

License: MIT-0

by Yangtao Chen (@cytwyatt)

Security Scans

VirusTotal: Benign
ClawScan: Benign
Static analysis: Benign

Install

openclaw skills install minimax-tools-skill

MiniMax Tools

Use this skill to call MiniMax multimodal APIs directly through local Python wrappers instead of relying on an external MCP server.

Overview

This skill currently supports:

Speech synthesis (TTS)
Voice cloning
Image generation
Video generation
Music generation

All wrappers are exposed through a single entrypoint script:

python3 scripts/minimax.py <subcommand> ...

Read references/api-notes.md only when you need endpoint details or parameter reminders.

Prerequisites

Set this environment variable before running the scripts:

MINIMAX_API_KEY

Optional:

MINIMAX_BASE_URL if you need to override the default API host

Python dependency:

requests
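
A pre-flight check can catch a missing key or dependency before any subcommand runs. The sketch below is illustrative only: `check_prerequisites` is a hypothetical helper, not part of the skill's scripts, and it verifies only what this section lists (the `MINIMAX_API_KEY` variable and the `requests` package).

```python
import importlib.util
import os


def check_prerequisites(environ=None):
    """Return a list of problems; an empty list means the basics are in place.

    Hypothetical helper for illustration; the skill's scripts may perform
    their own checks differently.
    """
    env = os.environ if environ is None else environ
    problems = []
    # Required variable named in the Prerequisites section.
    if not env.get("MINIMAX_API_KEY"):
        problems.append("MINIMAX_API_KEY is not set")
    # find_spec returns None when a top-level package cannot be imported.
    if importlib.util.find_spec("requests") is None:
        problems.append("Python package 'requests' is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```

`MINIMAX_BASE_URL` is deliberately not checked, since it is optional.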

Routing guide

Use tts for speech synthesis
Use voice for uploading clone inputs, creating cloned voices, and optionally downloading preview audio
Use image for text-to-image or reference-image generation
Use video for text-to-video, image-to-video, or first/last-frame video workflows
Use music for song or instrumental generation
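
The routing above can be expressed as a small lookup table. This is a minimal sketch: the task labels on the left (`speech`, `voice_clone`, and so on) and the `build_command` helper are assumptions made for illustration; only the subcommand names and the `python3 scripts/minimax.py <subcommand>` form come from this document.

```python
# Hypothetical task labels mapped to the subcommands named in the routing guide.
ROUTES = {
    "speech": "tts",
    "voice_clone": "voice",
    "image": "image",
    "video": "video",
    "music": "music",
}


def build_command(task, extra_args=()):
    """Build an argv list for the unified entrypoint.

    extra_args is passed through untouched; consult the script's own help
    output for the flags each subcommand actually accepts.
    """
    if task not in ROUTES:
        raise ValueError(f"unknown task: {task!r}")
    return ["python3", "scripts/minimax.py", ROUTES[task], *extra_args]


if __name__ == "__main__":
    print(build_command("speech", ["--voice-lang", "en"]))
```

The resulting list can be handed to `subprocess.run` as-is, which avoids shell quoting issues with voice names that contain spaces or parentheses.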

TTS defaults

Default model: speech-2.8-turbo
Default format: mp3
Default sample rate: 32000
Default bitrate: 128000
Default Chinese voice: Chinese (Mandarin)_Lyrical_Voice
Default English voice: English_Graceful_Lady
If --voice is omitted, the script picks the default voice for --voice-lang (zh or en); --voice-lang itself defaults to zh
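
The voice fallback described above amounts to the following logic. This is a sketch of the documented behavior, not the script's actual code; `resolve_voice` is a hypothetical name.

```python
# Default voices as listed in the TTS defaults.
DEFAULT_VOICES = {
    "zh": "Chinese (Mandarin)_Lyrical_Voice",
    "en": "English_Graceful_Lady",
}


def resolve_voice(voice=None, voice_lang="zh"):
    """Return the explicit voice if given, else the default for voice_lang."""
    if voice:
        return voice
    if voice_lang not in DEFAULT_VOICES:
        raise ValueError(f"voice_lang must be 'zh' or 'en', got {voice_lang!r}")
    return DEFAULT_VOICES[voice_lang]


if __name__ == "__main__":
    print(resolve_voice())                 # no --voice, no --voice-lang
    print(resolve_voice(voice_lang="en"))  # no --voice, --voice-lang en
```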

Voice cloning notes

Clone source audio constraints:
- mp3, m4a, or wav
- 10 seconds to 5 minutes
- <= 20 MB
Optional prompt audio constraints:
- mp3, m4a, or wav
- under 8 seconds
- <= 20 MB
If cloning succeeds, the returned voice_id can be used immediately in TTS
According to the MiniMax documentation, a cloned voice is temporary unless it is used in a real TTS request within 7 days
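
Checking these constraints locally before uploading can save a failed request. The sketch below assumes the caller already knows the file's duration (the standard library cannot measure mp3 or m4a length) and reads "20 MB" as 20 × 1024 × 1024 bytes, which is an assumption. `validate_clone_audio` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB


def validate_clone_audio(filename, size_bytes, duration_s, is_prompt=False):
    """Return a list of constraint violations for a clone or prompt audio file."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("format must be mp3, m4a, or wav")
    if size_bytes > MAX_BYTES:
        problems.append("file must be at most 20 MB")
    if is_prompt:
        # Optional prompt audio: under 8 seconds.
        if not duration_s < 8:
            problems.append("prompt audio must be under 8 seconds")
    else:
        # Clone source audio: 10 seconds to 5 minutes.
        if not 10 <= duration_s <= 300:
            problems.append("clone audio must be 10 seconds to 5 minutes long")
    return problems


if __name__ == "__main__":
    print(validate_clone_audio("sample.wav", 3_000_000, 45.0))
    print(validate_clone_audio("prompt.ogg", 500_000, 9.5, is_prompt=True))
```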

Video support

Supported modes:

text-to-video: video create
image-to-video: video i2v
first/last-frame video: video fl2v

Video creation is asynchronous. Use video query, video wait, and video download for task follow-up.
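
For callers that script the follow-up themselves rather than using video wait, a polling loop with a growing delay is one way to follow an asynchronous job. This is a minimal sketch under stated assumptions: `query` is a caller-supplied function standing in for whatever video query returns, and the terminal status strings `"success"` and `"failed"` are placeholders, not confirmed MiniMax values.

```python
import time

# Placeholder terminal states; check the real API response for actual values.
TERMINAL_STATUSES = {"success", "failed"}


def wait_for_task(query, task_id, interval_s=10.0, max_interval_s=60.0,
                  timeout_s=1800.0, sleep=time.sleep, clock=time.monotonic):
    """Poll query(task_id) until a terminal status or until timeout_s elapses.

    The delay grows by 1.5x after each poll, capped at max_interval_s,
    so long-running jobs are not polled in a tight loop.
    """
    deadline = clock() + timeout_s
    delay = interval_s
    while True:
        status = query(task_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 1.5, max_interval_s)


if __name__ == "__main__":
    # Simulated job that finishes on the third poll.
    fake = iter(["processing", "processing", "success"])
    print(wait_for_task(lambda _id: next(fake), "demo-task", interval_s=0.1))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.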

File handling rules

Prefer saving outputs locally and returning file paths
Local image inputs for image/video wrappers can be converted to Data URLs automatically
When MiniMax returns temporary files, prefer URL-based output and download the file immediately
Avoid tight polling loops for async video jobs
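
The Data URL conversion mentioned above can be sketched with the standard library alone. The wrappers may implement it differently; `to_data_url` is a hypothetical name, and MIME detection here relies only on the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path):
    """Encode a local image file as a data: URL (base64 payload)."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Print only the prefix; the full payload can be very long.
        print(to_data_url(sys.argv[1])[:80] + "...")
```

Base64 inflates the payload by roughly a third, so large reference images may be better served from a URL.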

Resources

scripts/minimax.py - unified CLI entrypoint
scripts/minimax_tts.py - TTS wrapper
scripts/minimax_voice.py - voice cloning wrapper
scripts/minimax_image.py - image generation wrapper
scripts/minimax_video.py - video generation wrapper
scripts/minimax_music.py - music generation wrapper
references/api-notes.md - focused API notes and constraints

Version tags

latest: vk971w1yzhhewxgxb7ekkpf73kx836vx3

Runtime requirements

Bins: python3

Env: MINIMAX_API_KEY

Primary env: MINIMAX_API_KEY