Install
openclaw skills install elevenlabs-tts

ElevenLabs TTS for OpenClaw: text-to-speech with emotional audio tags, voice messages for WhatsApp, and multilingual support, generated through the ElevenLabs API.

Generate expressive voice messages using ElevenLabs v3 with audio tags.
API key (ELEVENLABS_API_KEY): Required. Get one at elevenlabs.io → Profile → API Keys. Configure in openclaw.json under messages.tts.elevenlabs.apiKey.

Storytelling (emotional journey):
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
Horror/Suspense (building dread):
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
Conversation with reactions:
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
Hebrew (romantic moment):
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
Spanish (celebration to reflection):
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
In openclaw.json, configure TTS under messages.tts:
{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
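As a quick sanity check on the configuration above, the following Python sketch validates the fields this document relies on. The function name `check_tts_config` and the fallback defaults are assumptions made for illustration; OpenClaw itself may validate differently. The speed range comes from this document, and the 0 to 1 stability range is the one ElevenLabs uses for that setting.

```python
def check_tts_config(config: dict) -> list:
    """Return a list of problems found under messages.tts (empty if none)."""
    problems = []
    tts = config.get("messages", {}).get("tts", {})
    eleven = tts.get("elevenlabs", {})

    if tts.get("provider") != "elevenlabs":
        problems.append("messages.tts.provider should be 'elevenlabs'")
    if not eleven.get("apiKey"):
        problems.append("messages.tts.elevenlabs.apiKey is missing")
    if eleven.get("modelId") != "eleven_v3":
        problems.append("audio tags need modelId 'eleven_v3'")

    settings = eleven.get("voiceSettings", {})
    # Defaults below are illustrative assumptions, not OpenClaw's own defaults.
    if not 0.0 <= settings.get("stability", 0.5) <= 1.0:
        problems.append("stability should be between 0 and 1")
    if not 0.7 <= settings.get("speed", 1.0) <= 1.2:
        problems.append("speed should be between 0.7 and 1.2")
    return problems


example = {
    "messages": {
        "tts": {
            "provider": "elevenlabs",
            "elevenlabs": {
                "apiKey": "sk_your_api_key_here",
                "voiceId": "pNInz6obpgDQGcFmaJgB",
                "modelId": "eleven_v3",
                "voiceSettings": {"stability": 0.5, "speed": 1},
            },
        }
    }
}
print(check_tts_config(example))
```

In practice the dictionary would come from `json.load` on openclaw.json rather than an inline literal.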
Getting your API Key:
These premade voices are optimized for v3 and work well with audio tags:
| Voice | ID | Gender | Accent | Best For |
|---|---|---|---|---|
| Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational |
| Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts |
| Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games |
| George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling |
Finding more voices:
GET https://api.elevenlabs.io/v1/voices

Voice selection tips:
eleven_v3 (alpha) - ONLY model supporting audio tags

| Mode | Stability | Description |
|---|---|---|
| Creative | 0.3-0.5 | More emotional/expressive, may hallucinate |
| Natural | 0.5-0.7 | Balanced, closest to original voice |
| Robust | 0.7-1.0 | Highly stable, less responsive to tags |
For audio tags, use Creative (0.5) or Natural. Higher stability reduces tag responsiveness.
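The stability bands in the table can be expressed as a small lookup. This is a sketch only: the table's bands share their boundary values, so the choice to count 0.5 as Creative (matching the "Creative (0.5)" advice) and 0.7 as Natural is an assumption, as is treating values below 0.3 as Creative. The function name `stability_mode` is hypothetical.

```python
def stability_mode(stability: float) -> str:
    """Map a stability value to the mode names used in the table above."""
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0 and 1")
    if stability <= 0.5:   # assumption: boundary 0.5 counts as Creative
        return "Creative"
    if stability <= 0.7:   # assumption: boundary 0.7 counts as Natural
        return "Natural"
    return "Robust"


def responsive_to_tags(stability: float) -> bool:
    """Creative and Natural are the modes recommended for audio tags."""
    return stability_mode(stability) != "Robust"
```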
Speed range: 0.7 (slow) to 1.2 (fast); default 1.0.
Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
How many tags to use:
Where to place tags:
Context matters:
[nervous] I... I'm not sure about this. What if it doesn't work? works better than [nervous] Hello.

Combine tags for nuance:
[nervously][whispers] = nervous whispering
[excited][laughs] = excited laughter

Regenerate for best results:
Match tag to voice:
[shouts] on a whispering voice
[whispers] on a loud/energetic voice

v3 does NOT support SSML break tags. Use audio tags and punctuation instead.
Punctuation enhances audio tags:
[nervous] I... I don't know...
[excited] That's AMAZING!
[explaining] So what you do is— [interrupting] Wait!
[nervous] Are you sure about this?
[happy] We did it!

Combine tags + punctuation for maximum effect:
[tired] It was a long day... [sighs] Nobody listens anymore.
tts tool (returns Opus in /tmp/openclaw/tts-*/)
message tool

1. Generate TTS (add [pause] at end to prevent cutoff):
tts text="[excited] This is amazing! [pause]" channel=whatsapp
2. Find the LATEST file (⚠️ CRITICAL - always use the newest file!):
find /tmp/openclaw/tts-* /tmp/tts-* -name "*.opus" -o -name "*.mp3" -o -name "*.ogg" 2>/dev/null | xargs ls -t | head -1
The tts tool now outputs to /tmp/openclaw/tts-*/ (NOT /tmp/tts-*/).
Old files may exist in /tmp/tts-*/ from previous sessions - never use those!
3. If file is MP3, convert to Opus:
ffmpeg -i /path/to/voice.mp3 -c:a libopus -b:a 64k -vbr on -application voip /path/to/voice.ogg
If already .opus, skip this step.
4. Copy to workspace and send:
cp /tmp/openclaw/tts-xxx/voice.opus ~/.openclaw/workspace/voice-temp.ogg
message action=send channel=whatsapp target="+972..." filePath="/root/.openclaw/workspace/voice-temp.ogg" asVoice=true message=" "
5. Cleanup:
rm /root/.openclaw/workspace/voice-temp.ogg
WhatsApp requires a non-empty message body to send voice notes. Use a single space as the message.
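Steps 2 and 3 of the workflow above (pick the newest file, convert only if needed) could also be scripted. The Python sketch below is one possible version under stated assumptions: the helper names are hypothetical, it looks only one directory level deep (the `find` command recurses), it searches only /tmp/openclaw/tts-* so stale files in /tmp/tts-* are never picked up, and it assumes any .ogg file already contains Opus audio.

```python
import glob
import os
import subprocess
from pathlib import Path


def newest_tts_file(dir_pattern: str = "/tmp/openclaw/tts-*"):
    """Return the most recently modified audio file, or None if there is none."""
    candidates = []
    for ext in ("opus", "ogg", "mp3"):
        candidates.extend(glob.glob(os.path.join(dir_pattern, f"*.{ext}")))
    return max(candidates, key=os.path.getmtime) if candidates else None


def opus_command(src: str) -> list:
    """Build the ffmpeg call from step 3, writing next to the source as .ogg."""
    dst = str(Path(src).with_suffix(".ogg"))
    return ["ffmpeg", "-i", src, "-c:a", "libopus", "-b:a", "64k",
            "-vbr", "on", "-application", "voip", dst]


def ensure_opus(src: str) -> str:
    """Convert MP3 input to Opus; pass .opus/.ogg files through untouched."""
    if Path(src).suffix.lower() in (".opus", ".ogg"):
        return src  # assumption: .ogg files here already contain Opus audio
    cmd = opus_command(src)
    subprocess.run(cmd, check=True)  # requires ffmpeg on PATH
    return cmd[-1]
```

A caller would run `ensure_opus(newest_tts_file())` after checking that a file was found, then copy the result to the workspace and send it with the message tool as in steps 4 and 5.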
| Format | iOS | Android | Transcribe |
|---|---|---|---|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |
Always convert to Opus - it's the only format that works on both iOS and Android and supports transcription.
ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:
[excited] This is amazing! [pause]
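The cutoff workaround is easy to apply automatically before text reaches the tts tool. A minimal sketch, with `with_trailing_pause` as a hypothetical helper name:

```python
def with_trailing_pause(text: str) -> str:
    """Append [pause] unless the text already ends with [pause] or '...'."""
    stripped = text.rstrip()
    if not stripped:
        return stripped  # nothing to speak, nothing to protect
    if stripped.endswith("[pause]") or stripped.endswith("..."):
        return stripped
    return stripped + " [pause]"


print(with_trailing_pause("[excited] This is amazing!"))
```

The helper is idempotent, so running it twice on the same text does not stack pauses.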
For content >800 chars:
Generate each part with the tts tool, then concatenate the parts:
cat > list.txt << EOF
file '/path/file1.mp3'
file '/path/file2.mp3'
EOF
ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3
Important: Don't mention "part 2" or "chapter" - keep it seamless.
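Splitting long content can be done at sentence boundaries so each part stays under the 800-character mark. The following Python sketch shows one way to do it; `split_for_tts` and `concat_list` are hypothetical helpers, and a single sentence longer than the limit is left as one oversized part rather than cut mid-sentence.

```python
import re


def split_for_tts(text: str, limit: int = 800) -> list:
    """Greedily pack whole sentences into parts of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit or not current:
            current = candidate       # still fits (or is a lone long sentence)
        else:
            parts.append(current)     # close the current part
            current = sentence        # start the next one
    if current:
        parts.append(current)
    return parts


def concat_list(paths) -> str:
    """Render the list.txt content expected by ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in paths)
```

Because tags such as [nervous] follow the sentence-ending punctuation, they stay attached to the sentence they introduce when the text is split.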
v3 can handle multiple characters in one generation:
Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!
Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]
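When the dialogue comes from structured data, the "Speaker: [tag] line" layout shown above can be assembled programmatically. A small sketch, where `dialogue_script` is a hypothetical helper and the tuple layout is an assumption:

```python
def dialogue_script(turns) -> str:
    """Join (speaker, tag, line) tuples into one multi-speaker prompt."""
    return "\n".join(f"{speaker}: [{tag}] {line}" for speaker, tag, line in turns)


script = dialogue_script([
    ("Jessica", "whispers", "Did you hear that?"),
    ("Chris", "interrupting", "I heard it too!"),
    ("Jessica", "panicking", "We need to hide!"),
])
print(script)
```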
| Category | Tags | When to Use |
|---|---|---|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |
Most effective tags (reliable results):
[excited], [nervous], [sad], [happy]
[laughs], [sighs], [whispers]
[pause]

Less reliable (test and regenerate):
[explosion], [gunshot]

Full tag list: See references/audio-tags.md
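To catch typos in tags before spending a generation on them, a script can compare the tags in a prompt against the quick-reference table. The sketch below is illustrative: `KNOWN_TAGS` contains only the tags from the table above, so a tag it reports as unknown may still appear in the full list in references/audio-tags.md.

```python
import re

# Tags copied from the quick-reference table; not the full list.
KNOWN_TAGS = {
    "excited", "happy", "sad", "angry", "nervous", "curious",
    "whispers", "shouts", "soft", "rushed", "drawn out",
    "laughs", "sighs", "gasps", "clears throat", "gulps",
    "pause", "hesitates", "stammers", "breathes",
    "french accent", "british accent", "robotic tone",
    "interrupting", "overlapping", "cuts in",
}


def extract_tags(text: str) -> list:
    """Return the contents of every [bracketed] tag, in order of appearance."""
    return re.findall(r"\[([^\[\]]+)\]", text)


def unknown_tags(text: str) -> list:
    """Return tags that are not in the quick-reference table, sorted."""
    return sorted({t for t in extract_tags(text) if t.lower() not in KNOWN_TAGS})


print(unknown_tags("[excited] We won! [laughs] [sparkles] Amazing. [pause]"))
```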
Tags read aloud?
eleven_v3 model

Voice inconsistent?
WhatsApp won't play?
No emotion despite tags?