MiMo TTS — Speech Synthesis
First-Time Setup
If MIMO_API_KEY is not configured, the skill cannot function. Guide the user:
- Obtain an API key from Xiaomi MiMo (https://api.xiaomimimo.com).
- Set it via OpenClaw config:
openclaw config set skills.entries.mimo-tts.apiKey "your-api-key-here"
- Alternatively, set the environment variable MIMO_API_KEY directly.
After configuring, the user should restart or start a new session.
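
The key lookup implied by this setup (an explicit override first, then the MIMO_API_KEY environment variable) can be sketched as below. This is a minimal illustration, not code taken from scripts/tts.py: the function name `resolve_api_key`, the `env` parameter, and the error text are all hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(override: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, preferring an explicit override over the environment.

    Hypothetical helper for illustration; not part of scripts/tts.py.
    """
    # Fall back to the real process environment unless a mapping is injected.
    env = os.environ if env is None else env
    key = (override or env.get("MIMO_API_KEY", "")).strip()
    if not key:
        # Mirror the setup guidance so the user knows how to fix the problem.
        raise RuntimeError(
            "MIMO_API_KEY is not configured; set it with "
            "`openclaw config set skills.entries.mimo-tts.apiKey` "
            "or export MIMO_API_KEY."
        )
    return key
```

Accepting the environment as a parameter keeps the check easy to exercise without touching the real process environment.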
Generate Speech
Use scripts/tts.py to synthesize text to audio:
python3 "{baseDir}/scripts/tts.py" "text to synthesize" -o output.wav
Options
| Flag | Default | Description |
|---|---|---|
| -o | output.wav | Output file path |
| -v | mimo_default | Voice: mimo_default, default_zh, default_en |
| -s | (none) | Style tag, e.g. 开心 (happy), 东北话 (Northeastern Mandarin dialect), 悄悄话 (whisper), 孙悟空 (Sun Wukong) |
| -f | wav | Audio format |
| --user-msg | (none) | Optional user-role context to adjust tone |
| --api-key | env MIMO_API_KEY | API key override |
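
To make the defaults in the table concrete, here is a minimal `argparse` sketch of a command line with the same flags. The flag names and defaults come from the table. The `dest` attribute names, the help strings, and the `choices` restriction on `-v` are assumptions, and the real scripts/tts.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Synthesize text to audio.")
    parser.add_argument("text", help="Text to synthesize")
    parser.add_argument("-o", dest="output", default="output.wav",
                        help="Output file path")
    # Restricting to the three documented voices is an assumption.
    parser.add_argument("-v", dest="voice", default="mimo_default",
                        choices=["mimo_default", "default_zh", "default_en"],
                        help="Voice name")
    parser.add_argument("-s", dest="style", default=None,
                        help="Style tag, e.g. 开心")
    parser.add_argument("-f", dest="format", default="wav",
                        help="Audio format")
    parser.add_argument("--user-msg", dest="user_msg", default=None,
                        help="Optional user-role context to adjust tone")
    # None here means: fall back to the MIMO_API_KEY environment variable.
    parser.add_argument("--api-key", dest="api_key", default=None,
                        help="API key override")
    return parser
```

For example, `build_parser().parse_args(["Hello", "-v", "default_en"])` leaves `output` at `output.wav` and `style` at `None`.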
Examples
# Basic
python3 "{baseDir}/scripts/tts.py" "你好,今天天气真好" -o hello.wav
# Dialect style
python3 "{baseDir}/scripts/tts.py" "哎呀妈呀,这天儿也忒冷了吧" -s "东北话" -o dongbei.wav
# English voice
python3 "{baseDir}/scripts/tts.py" "Hello, how are you today?" -v default_en -o hello_en.wav
# Emotion + speed
python3 "{baseDir}/scripts/tts.py" "明天就是周五了,真开心!" -s "开心 变快" -o happy.wav
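
When these commands are issued from another program rather than typed by hand, building the argument list explicitly avoids shell-quoting problems with non-ASCII text. The sketch below assembles the same invocation as the examples above. `build_tts_command` and its parameters are hypothetical names, and `base_dir` stands in for the `{baseDir}` placeholder.

```python
from pathlib import Path
from typing import List, Optional


def build_tts_command(base_dir: str, text: str, output: str = "output.wav",
                      voice: Optional[str] = None,
                      style: Optional[str] = None) -> List[str]:
    """Return the argv list for one scripts/tts.py call (illustrative only)."""
    script = Path(base_dir) / "scripts" / "tts.py"
    cmd = ["python3", str(script), text, "-o", output]
    # Only pass optional flags that were actually requested.
    if voice:
        cmd += ["-v", voice]
    if style:
        cmd += ["-s", style]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which runs the script without going through a shell.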
Style & Audio Tags
- Place <style>style</style> at the beginning of the text to set the overall style.
- Use inline audio tags for fine control: (紧张) nervous, (小声) quietly, (语速加快) faster pace, (深呼吸) deep breath, (苦笑) wry smile, (沉默片刻) brief silence.
- Multiple styles, separated by spaces: <style>开心 变快</style>text (happy, faster).
- Singing: <style>唱歌</style>lyrics.
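
The prefix format described above can be produced with a small helper. This is a sketch under stated assumptions: `apply_style` is a hypothetical name, and whether scripts/tts.py builds the prefix this way when `-s` is given is not stated in this document.

```python
from typing import Iterable, Optional


def apply_style(text: str, styles: Optional[Iterable[str]] = None) -> str:
    """Prefix text with a <style> tag listing the given styles.

    Hypothetical helper; inline audio tags such as (小声) are left untouched.
    """
    # Drop empty entries so a blank style never yields an empty tag.
    tags = [s.strip() for s in (styles or []) if s and s.strip()]
    if not tags:
        return text
    # Multiple styles are joined with a single space inside one tag.
    return f"<style>{' '.join(tags)}</style>{text}"
```

Calling `apply_style("text", ["开心", "变快"])` yields `<style>开心 变快</style>text`, matching the multiple-styles form above.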
Voices
| Name | voice param |
|---|---|
| MiMo Default (MiMo-默认) | mimo_default |
| MiMo Chinese female voice (MiMo-中文女声) | default_zh |
| MiMo English female voice (MiMo-英文女声) | default_en |