Xiaomi MiMo-V2-TTS

v1.0.0

Converts text to speech using Xiaomi MiMo-V2-TTS with support for emotional styles, Chinese dialects, role voices, and singing synthesis.

by @ddpie

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for ddpie/mimo-v2-tts.

Install the skill "Xiaomi MiMo-V2-TTS" (ddpie/mimo-v2-tts) from ClawHub.
Skill page: https://clawhub.ai/ddpie/mimo-v2-tts
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install mimo-v2-tts

ClawHub CLI

npx clawhub@latest install mimo-v2-tts

Security Scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

The name, SKILL.md, and script all implement Xiaomi MiMo V2 TTS and only require an API key and an output path, which is coherent with the stated purpose. However, registry metadata lists no required environment variables/primary credential while both SKILL.md and the script require a MIMO_API_KEY (or --api-key). This metadata mismatch is unexpected and should be corrected/clarified.

✓

Instruction Scope

Runtime instructions and the script stay within TTS scope: they call api.xiaomimimo.com, accept style/speed/format, and write audio to a local output path. The agent is not instructed to read unrelated files, system state, or transmit data to other endpoints.

ℹ

Install Mechanism

There is no formal install spec (instruction-only), but the script will attempt to pip-install the 'requests' package at runtime if missing. That triggers network downloads and execution of pip; it's common but increases runtime behavior compared with a pure instruction-only skill and should be noted.
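The runtime-install behavior described above can be sketched as follows. This is a minimal illustration of the general pattern, not the skill's actual code: `ensure_package` and `pip_install` are hypothetical names, and the installer is injectable so the check can be exercised without touching the network.

```python
import importlib.util
import subprocess
import sys


def pip_install(package: str) -> None:
    """Install a package into the current interpreter's environment via pip."""
    # Running pip through sys.executable targets the same Python that runs the script.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_package(package: str, installer=pip_install) -> bool:
    """Return True if `package` was already importable, False if the installer ran."""
    if importlib.util.find_spec(package) is not None:
        return True
    # Hypothetical fallback: this is the step that triggers a network download.
    installer(package)
    return False
```

Calling `ensure_package("requests")` before the first API call would reproduce the behavior the scan flags. Installing `requests` ahead of time in a controlled environment avoids the runtime download.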

Credentials

The only secret the script uses is the MiMo API key (MIMO_API_KEY), which is appropriate for an API-backed TTS. The concern is the registry metadata claiming 'no required env vars' while the SKILL.md/script require an API key — this inconsistency could mislead users into installing without providing credentials and reduces transparency.

✓

Persistence & Privilege

The skill does not request persistent/always-enabled privileges and does not modify other skills or global agent configuration. It runs on invocation and writes only the requested audio output file.

What to consider before installing

This skill appears to implement the advertised Xiaomi MiMo TTS and only needs a MiMo API key and the ability to write the chosen output file. Before installing: (1) be aware the registry metadata omits the required MIMO_API_KEY — you will need to provide that credential (env var or CLI). (2) The script will auto-install 'requests' with pip if missing (network download at runtime). If you trust the Xiaomi MiMo service and the package source (PyPI), this is reasonable; otherwise verify the API domain (api.xiaomimimo.com), consider running the script in a sandboxed environment, and avoid exposing broader credentials. If you need higher assurance, ask the publisher to fix the registry metadata and provide a signed release or vetted install instructions.

Like a lobster shell, security has layers — review code before you run it.

MIT-0

Xiaomi MiMo-V2-TTS Skill

Text-to-speech using Xiaomi's MiMo-V2-TTS model. Supports emotional style control, Chinese dialects (Northeastern/Sichuan/Cantonese/Taiwanese), role-playing voices, and singing synthesis.

When to Use

User asks to convert text to speech / audio
User mentions "read aloud", "TTS", "voice synthesis", "narrate"
User wants specific voice styles, emotions, or dialects

API Details

Platform: https://platform.xiaomimimo.com
Base URL: https://api.xiaomimimo.com/v1
Endpoint: /chat/completions, relative to the base URL (full URL: https://api.xiaomimimo.com/v1/chat/completions); NOT /audio/speech
Model: mimo-v2-tts
Auth: Bearer Token via MIMO_API_KEY env var
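
As a rough sketch, the details above translate into a request URL and headers like this. The helper names are hypothetical, and the code assumes the `/v1` prefix appears only once in the final URL.

```python
BASE_URL = "https://api.xiaomimimo.com/v1"
MODEL = "mimo-v2-tts"


def endpoint_url(base_url: str = BASE_URL) -> str:
    """Join the base URL and the chat completions path (assumed to contain /v1 once)."""
    return base_url.rstrip("/") + "/chat/completions"


def auth_headers(api_key: str) -> dict:
    """Build Bearer-token headers; the key normally comes from MIMO_API_KEY."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```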

Important: API Format

MiMo TTS uses the Chat Completions endpoint with special requirements:

❌ No system role allowed (returns error)
✅ Must include assistant role message (the text to synthesize)
user message = style/voice instructions
assistant message = text to be spoken
Response: choices[0].message.audio.data contains base64-encoded audio

Usage

python3 <skill_dir>/scripts/mimo_tts.py \
  --text "Hello, world!" \
  --output /tmp/openclaw/tts_output.mp3 \
  [--style "cheerful tone"] \
  [--speed 1.0] \
  [--format mp3] \
  [--api-key YOUR_KEY]

Set the MIMO_API_KEY environment variable, or pass --api-key on the command line.

Parameters

Parameter	Required	Description
--text	✅	Text to synthesize (recommended < 5000 chars)
--output	✅	Output audio file path
--style	❌	Natural language style description
--speed	❌	Speech rate 0.5–2.0 (default 1.0)
--format	❌	mp3/wav/pcm/opus/flac (default mp3)
--api-key	❌	API Key (overrides env var)
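
For illustration, the parameter table maps onto an `argparse` definition along these lines. This mirrors the documented flags and defaults only. It is not taken from `mimo_tts.py`, and whether the real script rejects out-of-range speeds is an assumption.

```python
import argparse

FORMATS = ("mp3", "wav", "pcm", "opus", "flac")


def speed_value(raw: str) -> float:
    """Parse --speed and enforce the documented 0.5-2.0 range."""
    value = float(raw)
    if not 0.5 <= value <= 2.0:
        raise argparse.ArgumentTypeError("speed must be between 0.5 and 2.0")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="MiMo-V2-TTS command-line sketch")
    parser.add_argument("--text", required=True, help="text to synthesize")
    parser.add_argument("--output", required=True, help="output audio file path")
    parser.add_argument("--style", default=None, help="natural language style description")
    parser.add_argument("--speed", type=speed_value, default=1.0, help="speech rate, 0.5-2.0")
    parser.add_argument("--format", choices=FORMATS, default="mp3", help="audio format")
    parser.add_argument("--api-key", default=None, help="overrides MIMO_API_KEY")
    return parser
```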

Style Control Examples

Dialects: --style "speak in Cantonese" / "Sichuan dialect" / "Taiwanese accent"
Emotions: --style "happy and excited" / "sad and gentle" / "start happy then turn melancholic"
Characters: --style "news anchor" / "gentle older sister"
Singing: --style "sing it"
Combined: --style "Northeastern dialect, enthusiastic and bold"
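
A combined style is just one natural-language string, so a small helper can join individual descriptions before passing them to --style. `combine_styles` is hypothetical. The comma-separated form follows the "Combined" example above, and whether other phrasings work equally well is not stated here.

```python
def combine_styles(*parts: str) -> str:
    """Join non-empty style fragments into one comma-separated description."""
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return ", ".join(cleaned)
```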

Notes

Pricing: Free during the launch period (March 2026); charges may apply later
Supports Chinese and English text
Best results with Chinese text and style descriptions in Chinese
Get API key at https://platform.xiaomimimo.com
