MiniMax TTS


MiniMax Text-to-Speech synthesis using the HTTP REST API. Generates high-quality audio from text in 40+ languages with ultra-realistic voices. Use when the user wants to convert text to speech, create voiceovers, generate narrated audio content, or use MiniMax TTS voices. Supports streaming and non-streaming modes, multiple audio formats (mp3, wav, pcm), and voice effects. Triggered by: text to speech, TTS, text to audio, MiniMax TTS, generate voice, voiceover, read this aloud, text to voice

Install

openclaw skills install minimax-tts-pro

MiniMax TTS

MiniMax text-to-speech via the HTTP REST API. Supports streaming and non-streaming modes, 40+ languages, and 200+ voices.

API Details

  • Endpoint: POST https://api.minimax.io/v1/t2a_v2
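  • Alternative endpoint (lower latency): POST https://api-uw.minimax.io/v1/t2a_v2
  • Auth: Bearer token in the Authorization header (Authorization: Bearer <key>), with the key read from the MINIMAX_API_KEY environment variable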
  • Content-Type: application/json
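A minimal sketch of how a request to this endpoint could be assembled with only the Python standard library. It builds the request without sending it. The body field names (`voice_setting`, `audio_setting`, `stream`) are assumptions modeled on MiniMax's public T2A v2 documentation, not something this skill page confirms, and `build_tts_request` is a hypothetical helper, not part of `scripts/tts.py`.

```python
import json
import os
import urllib.request
from typing import Optional

# Endpoints as listed under "API Details".
DEFAULT_URL = "https://api.minimax.io/v1/t2a_v2"
LOW_LATENCY_URL = "https://api-uw.minimax.io/v1/t2a_v2"


def build_tts_request(text: str,
                      voice_id: str = "English_expressive_narrator",
                      model: str = "speech-2.8-hd",
                      api_key: Optional[str] = None,
                      url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming T2A v2 request.

    Hypothetical helper; the body field names are assumptions based on
    MiniMax's public docs and should be verified before use.
    """
    key = api_key or os.environ.get("MINIMAX_API_KEY")
    if not key:
        raise ValueError("set MINIMAX_API_KEY or pass api_key")
    body = {
        "model": model,
        "text": text,
        "stream": False,
        "voice_setting": {"voice_id": voice_id, "speed": 1.0, "vol": 1.0, "pitch": 0},
        "audio_setting": {"format": "mp3", "sample_rate": 32000, "bitrate": 128000},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)`; keeping construction separate from sending makes the payload easy to inspect or log before any network call.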

Quick Usage

uv run python scripts/tts.py --text "Hello world" --voice English_expressive_narrator --model speech-2.8-hd --output hello.mp3
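In the default hex output mode, the audio presumably arrives hex-encoded inside a JSON body, and a file like hello.mp3 is obtained by decoding it. The sketch below assumes a response shaped like `{"data": {"audio": "<hex>"}, "base_resp": {"status_code": 0, "status_msg": "..."}}`; that shape is an assumption taken from MiniMax's public docs, and `extract_audio` is a hypothetical helper.

```python
import json


def extract_audio(response_body: bytes) -> bytes:
    """Return raw audio bytes from a non-streaming, hex-mode JSON response.

    Assumed shape (verify against the API docs):
      {"data": {"audio": "<hex>"}, "base_resp": {"status_code": 0, ...}}
    """
    payload = json.loads(response_body)
    base = payload.get("base_resp", {})
    # A non-zero status_code is treated as an API error.
    if base.get("status_code", 0) != 0:
        raise RuntimeError(f"MiniMax error {base.get('status_code')}: {base.get('status_msg')}")
    # The audio field holds the encoded file (e.g. mp3) as a hex string.
    return bytes.fromhex(payload["data"]["audio"])
```

Usage: `open("hello.mp3", "wb").write(extract_audio(resp.read()))`, where `resp` is the HTTP response object.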

Scripts

  • scripts/tts.py — Core TTS script. Run with --help for full options.

Models

Model               Description
speech-2.8-hd       Ultra-realistic, supports sound tags
speech-2.8-turbo    Fast, natural flow
speech-2.6-hd       Low latency, enhanced naturalness
speech-2.6-turbo    Fast, affordable
speech-02-hd        Superior rhythm, high similarity
speech-02-turbo     Superior rhythm, multilingual

Output Formats

  • Formats: mp3 (default), wav, pcm
  • Sample rates (Hz): 32000 (default), 16000, 24000, 48000
  • Bitrates (bps): 128000 (default), 64000, 32000
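Because only a fixed set of formats, sample rates, and bitrates is listed, a client can reject unsupported values before calling the API. This sketch validates against exactly the values above; the returned dict's key names are an assumption about the request body, and `audio_setting` is a hypothetical helper.

```python
from typing import Dict, Union

# Allowed values copied from the "Output Formats" list.
FORMATS = {"mp3", "wav", "pcm"}
SAMPLE_RATES = {16000, 24000, 32000, 48000}
BITRATES = {32000, 64000, 128000}


def audio_setting(fmt: str = "mp3", sample_rate: int = 32000,
                  bitrate: int = 128000) -> Dict[str, Union[str, int]]:
    """Validate output options and return an audio_setting-style dict.

    Key names are assumed, not confirmed by this skill's documentation.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(FORMATS)}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate {sample_rate}")
    if bitrate not in BITRATES:
        raise ValueError(f"unsupported bitrate {bitrate}")
    return {"format": fmt, "sample_rate": sample_rate, "bitrate": bitrate}
```

Failing fast locally gives a clearer error message than a rejected API call.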

Languages

40+ languages including: English, Chinese (Mandarin/Cantonese), Japanese, Korean, Spanish, French, German, Portuguese, Arabic, Russian, Hindi, Thai, Vietnamese, Turkish, Dutch, Polish, Italian, Indonesian, Malay, Persian, Swedish, Norwegian, Danish, Finnish, Hebrew, Romanian, Greek, Czech, Hungarian, Tamil, Afrikaans, and more.

Voices

Key English voices:

  • English_expressive_narrator — Default expressive narrator
  • English_radiant_girl — Radiant female
  • English_magnetic_voiced_man — Magnetic male voice
  • English_Aussie_Bloke — Australian male
  • English_Whispering_girl — Whispering female
  • English_PlayfulGirl — Playful female
  • English_Comedian — Comedic voice
  • English_AnimeCharacter — Female anime narrator

For the full voice list (200+ voices across all languages), see references/voices.md.

Sound Tags (speech-2.8-hd only)

Insert inline tags into the text for breathing, pauses, and expression. Most are parenthesized; a few use square brackets or XML-style markup:

  • (sighs) — breathing sound
  • (laughs) — laughter
  • (coughs) — coughing
  • [laughs] — laughing
  • ... or (pause:500) — pause in ms
  • <emphasis>important</emphasis> — emphasis
  • <spell-out>A-P-I</spell-out> — spell out letters
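Since these tags only apply to speech-2.8-hd, a pre-flight check can catch tagged text sent to another model. The regex below covers only the tag forms listed above (it does not treat a plain `...` as a tag), the full set the API accepts is not confirmed here, and `check_sound_tags` is a hypothetical helper.

```python
import re
from typing import List

# Tag forms listed above; treat this as a rough check, not the API's grammar.
SOUND_TAG = re.compile(
    r"\((?:sighs|laughs|coughs|pause:\d+)\)"  # (sighs), (laughs), (coughs), (pause:500)
    r"|\[laughs\]"                             # [laughs]
    r"|<(emphasis|spell-out)>.*?</\1>"         # <emphasis>..</emphasis>, <spell-out>..</spell-out>
)


def check_sound_tags(text: str, model: str) -> List[str]:
    """Return the tags found in text; raise if they are used with the wrong model."""
    found = [m.group(0) for m in SOUND_TAG.finditer(text)]
    if found and model != "speech-2.8-hd":
        raise ValueError(f"sound tags {found} require speech-2.8-hd, got {model}")
    return found
```

The XML-style pattern does not match across newlines (`.` excludes `\n` by default), so tags that wrap a line break would go undetected.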

Script Usage

uv run python scripts/tts.py --text "Your text here" [options]

Options:
  --text TEXT             Text to synthesize (required)
  --model MODEL           Model: speech-2.8-hd (default), speech-2.8-turbo, speech-2.6-hd, etc.
  --voice VOICE_ID        Voice ID (default: English_expressive_narrator)
  --speed SPEED           Speed 0.5-2.0 (default: 1.0)
  --pitch PITCH           Pitch -3 to 3 (default: 0)
  --vol VOLUME            Volume 0-10 (default: 1)
  --language_boost LANG   Language boost: auto (default) or a specific language, e.g. en, zh
  --output_format FORMAT  hex (default) or raw (mp3/wav bytes returned directly)
  --format AUDIO_FORMAT   mp3 (default), wav, pcm
  --sample_rate RATE      32000 (default), 16000, 24000, 48000
  --bitrate BITRATE       128000 (default), 64000, 32000
  --stream                Enable streaming mode (returns audio chunks as they are generated)
  --output FILE           Output file path (default: minimax_tts_output.mp3)
  --api_url URL           Override API URL
  --api_key KEY           Override API key (reads MINIMAX_API_KEY env if not set)
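For readers wiring up a similar CLI, the option list above maps naturally onto `argparse`. This is a hypothetical reconstruction from the documented flags, defaults, and ranges only; it is not the actual source of `scripts/tts.py`.

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented options, not the real scripts/tts.py.
    p = argparse.ArgumentParser(description="MiniMax TTS (sketch)")
    p.add_argument("--text", required=True)
    p.add_argument("--model", default="speech-2.8-hd")
    p.add_argument("--voice", default="English_expressive_narrator")
    p.add_argument("--speed", type=float, default=1.0)
    p.add_argument("--pitch", type=int, default=0, choices=range(-3, 4))
    p.add_argument("--vol", type=float, default=1.0)
    p.add_argument("--language_boost", default="auto")
    p.add_argument("--output_format", default="hex", choices=["hex", "raw"])
    p.add_argument("--format", default="mp3", choices=["mp3", "wav", "pcm"])
    p.add_argument("--sample_rate", type=int, default=32000, choices=[16000, 24000, 32000, 48000])
    p.add_argument("--bitrate", type=int, default=128000, choices=[32000, 64000, 128000])
    p.add_argument("--stream", action="store_true")
    p.add_argument("--output", default="minimax_tts_output.mp3")
    p.add_argument("--api_url", default=None)
    p.add_argument("--api_key", default=os.environ.get("MINIMAX_API_KEY"))
    return p


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Range checks for the float options (parser.error exits with status 2).
    if not 0.5 <= args.speed <= 2.0:
        parser.error("--speed must be between 0.5 and 2.0")
    if not 0 <= args.vol <= 10:
        parser.error("--vol must be between 0 and 10")
    return args
```

Discrete options use `choices`, so argparse rejects them automatically; the continuous `--speed` and `--vol` ranges need the explicit checks.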

Streaming Mode

uv run python scripts/tts.py --text "Hello, streaming audio." --stream --output stream_output.mp3
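In streaming mode the audio arrives incrementally. The sketch below assumes server-sent-event lines of the form `data: {"data": {"audio": "<hex>"}, ...}` and, following MiniMax's published streaming example as we understand it, that the closing event carries `extra_info` and repeats the full audio, so it is skipped to avoid duplicating sound. Both points are assumptions to verify, and `iter_audio_chunks` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decoded audio bytes from SSE lines of a streaming response.

    Assumed event format: b'data: {"data": {"audio": "<hex>"}, ...}'.
    The final event (assumed to contain "extra_info") is skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith(b"data:"):
            continue  # blank separator lines, comments, keep-alives
        event = json.loads(line[len(b"data:"):])
        if "extra_info" in event:
            continue  # assumed summary event that repeats the full audio
        audio_hex = event.get("data", {}).get("audio")
        if audio_hex:
            yield bytes.fromhex(audio_hex)
```

An `http.client.HTTPResponse` iterates line by line, so with a request body that sets `"stream": true` (assumed field name) the chunks can be written as they arrive: `for chunk in iter_audio_chunks(resp): f.write(chunk)`.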

Examples

# Basic
uv run python scripts/tts.py --text "The quick brown fox jumps over the lazy dog."

# Different voice
uv run python scripts/tts.py --text "Bonjour le monde" --voice French_Standard_Female --model speech-2.6-turbo

# Streaming
uv run python scripts/tts.py --text "This is streaming audio" --stream --output streaming.mp3

# With sound tags (expressive)
uv run python scripts/tts.py --text "Hello(sighs)... what a beautiful day(laughs)!" --voice English_expressive_narrator --model speech-2.8-hd