{"skill":{"slug":"elevenlabs-tts","displayName":"Elevenlabs Tts","summary":"ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp,...","description":"---\nname: elevenlabs-tts\ndescription: ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp, ElevenLabs multilingual support. Generate realistic AI voices using ElevenLabs API.\ntags: [elevenlabs, tts, voice, text-to-speech, audio, speech, whatsapp, multilingual, ai-voice]\nmetadata: {\"clawdbot\":{\"emoji\":\"🎙️\",\"requires\":{\"env\":[\"ELEVENLABS_API_KEY\"],\"system\":[\"ffmpeg\"],\"tools\":[\"ffmpeg\"]},\"primaryEnv\":\"ELEVENLABS_API_KEY\",\"source\":\"https://clawhub.com/skills/elevenlabs-tts\",\"version\":\"2.3.0\"}}\nallowed-tools: [exec, tts, message]\n---\n\n# ElevenLabs TTS (Text-to-Speech)\n\nGenerate expressive voice messages using ElevenLabs v3 with audio tags.\n\n## Prerequisites\n\n- **ElevenLabs API Key** (`ELEVENLABS_API_KEY`): Required. Get one at [elevenlabs.io](https://elevenlabs.io) → Profile → API Keys. Configure in `openclaw.json` under `messages.tts.elevenlabs.apiKey`.\n- **ffmpeg**: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.\n\n## Quick Start Examples\n\n**Storytelling (emotional journey):**\n```\n[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!\n```\n\n**Horror/Suspense (building dread):**\n```\n[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. 
[gasps] [panicking] The door— it's opening by itself!\n```\n\n**Conversation with reactions:**\n```\n[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.\n```\n\n**Hebrew (romantic moment):**\n```\n[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?\n```\n\n**Spanish (celebration to reflection):**\n```\n[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.\n```\n\n## Configuration (OpenClaw)\n\nIn `openclaw.json`, configure TTS under `messages.tts`:\n\n```json\n{\n  \"messages\": {\n    \"tts\": {\n      \"provider\": \"elevenlabs\",\n      \"elevenlabs\": {\n        \"apiKey\": \"sk_your_api_key_here\",\n        \"voiceId\": \"pNInz6obpgDQGcFmaJgB\",\n        \"modelId\": \"eleven_v3\",\n        \"languageCode\": \"en\",\n        \"voiceSettings\": {\n          \"stability\": 0.5,\n          \"similarityBoost\": 0.75,\n          \"style\": 0,\n          \"useSpeakerBoost\": true,\n          \"speed\": 1\n        }\n      }\n    }\n  }\n}\n```\n\n**Getting your API Key:**\n1. Go to https://elevenlabs.io\n2. Sign up/login\n3. Click profile → API Keys\n4. 
Copy your key\n\n## Recommended Voices for v3\n\nThese premade voices are optimized for v3 and work well with audio tags:\n\n| Voice | ID | Gender | Accent | Best For |\n|-------|-----|--------|--------|----------|\n| **Adam** | `pNInz6obpgDQGcFmaJgB` | Male | American | Deep narration, general use |\n| **Rachel** | `21m00Tcm4TlvDq8ikWAM` | Female | American | Calm narration, conversational |\n| **Brian** | `nPczCjzI2devNBz1zQrb` | Male | American | Deep narration, podcasts |\n| **Charlotte** | `XB0fDUnXU5powFXDhCwa` | Female | English-Swedish | Expressive, video games |\n| **George** | `JBFqnCBsd6RMkjVDRZzb` | Male | British | Raspy narration, storytelling |\n\n**Finding more voices:**\n- Browse: https://elevenlabs.io/voice-library\n- v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH\n- API: `GET https://api.elevenlabs.io/v1/voices`\n\n**Voice selection tips:**\n- Use IVC (Instant Voice Clone) or premade voices - PVC not optimized for v3 yet\n- Match voice character to your use case (whispering voice won't shout well)\n- For expressive IVCs, include varied emotional tones in training samples\n\n## Model Settings\n\n- **Model**: `eleven_v3` (alpha) - ONLY model supporting audio tags\n- **Languages**: 70+ supported with full audio tag control\n\n### Stability Modes\n\n| Mode | Stability | Description |\n|------|-----------|-------------|\n| **Creative** | 0.3-0.5 | More emotional/expressive, may hallucinate |\n| **Natural** | 0.5-0.7 | Balanced, closest to original voice |\n| **Robust** | 0.7-1.0 | Highly stable, less responsive to tags |\n\nFor audio tags, use **Creative** (0.5) or **Natural**. Higher stability reduces tag responsiveness.\n\n### Speed Control\n\nRange: 0.7 (slow) to 1.2 (fast), default 1.0\n\nExtreme values affect quality. 
For pacing, prefer audio tags like `[rushed]` or `[drawn out]`.\n\n## Critical Rules\n\n### Length Limits\n- **Optimal**: <800 characters per segment (best quality)\n- **Maximum**: 10,000 characters (API hard limit)\n- **Quality degrades** with longer text - voice becomes inconsistent\n\n### Audio Tags - Best Practices for Natural Sound\n\n**How many tags to use:**\n- 1-2 tags per sentence or phrase (not more!)\n- Tags persist until the next tag - no need to repeat\n- Overusing tags sounds unnatural and robotic\n\n**Where to place tags:**\n- At emotional transition points\n- Before key dramatic moments\n- When energy/pace changes\n\n**Context matters:**\n- Write text that *matches* the tag emotion\n- Longer text with context = better interpretation\n- Example: `[nervous] I... I'm not sure about this. What if it doesn't work?` works better than `[nervous] Hello.`\n\n**Combine tags for nuance:**\n- `[nervously][whispers]` = nervous whispering\n- `[excited][laughs]` = excited laughter\n- Keep combinations to 2 tags max\n\n**Regenerate for best results:**\n- v3 is non-deterministic - same text = different outputs\n- Generate 3+ versions, pick the best\n- Small text tweaks can improve results\n\n**Match tag to voice:**\n- Don't use `[shouts]` on a whispering voice\n- Don't use `[whispers]` on a loud/energetic voice\n- Test tags with your chosen voice\n\n### SSML Not Supported\nv3 does NOT support SSML break tags. Use audio tags and punctuation instead.\n\n### Punctuation Effects (use with tags!)\n\nPunctuation enhances audio tags:\n- **Ellipses (...)** → dramatic pauses: `[nervous] I... I don't know...`\n- **CAPS** → emphasis: `[excited] That's AMAZING!`\n- **Dashes (—)** → interruptions: `[explaining] So what you do is— [interrupting] Wait!`\n- **Question marks** → uncertainty: `[nervous] Are you sure about this?`\n- **Exclamation!** → energy boost: `[happy] We did it!`\n\nCombine tags + punctuation for maximum effect:\n```\n[tired] It was a long day... 
[sighs] Nobody listens anymore.\n```\n\n## WhatsApp Voice Messages\n\n### Complete Workflow\n\n1. **Generate** with `tts` tool (returns Opus in `/tmp/openclaw/tts-*/`)\n2. **Copy** to workspace (message tool only allows workspace paths)\n3. **Send** with `message` tool\n4. **Cleanup** - delete the workspace copy\n\n### Step-by-Step\n\n**1. Generate TTS (add [pause] at end to prevent cutoff):**\n```\ntts text=\"[excited] This is amazing! [pause]\" channel=whatsapp\n```\n\n**2. Find the LATEST file (⚠️ CRITICAL - always use the newest file!):**\n```bash\nfind /tmp/openclaw/tts-* /tmp/tts-* -name \"*.opus\" -o -name \"*.mp3\" -o -name \"*.ogg\" 2>/dev/null | xargs ls -t | head -1\n```\nThe `tts` tool now outputs to `/tmp/openclaw/tts-*/` (NOT `/tmp/tts-*/`).\nOld files may exist in `/tmp/tts-*/` from previous sessions - **never use those!**\n\n**3. If file is MP3, convert to Opus:**\n```bash\nffmpeg -i /path/to/voice.mp3 -c:a libopus -b:a 64k -vbr on -application voip /path/to/voice.ogg\n```\nIf already `.opus`, skip this step.\n\n**4. Copy to workspace and send:**\n```bash\ncp /tmp/openclaw/tts-xxx/voice.opus ~/.openclaw/workspace/voice-temp.ogg\n```\n```\nmessage action=send channel=whatsapp target=\"+972...\" filePath=\"/root/.openclaw/workspace/voice-temp.ogg\" asVoice=true message=\" \"\n```\n\n**5. Cleanup:**\n```bash\nrm /root/.openclaw/workspace/voice-temp.ogg\n```\n\nWhatsApp requires a non-empty message body to send voice notes. Use a single space as the message.\n\n### Why Opus?\n\n| Format | iOS | Android | Transcribe |\n|--------|-----|---------|------------|\n| MP3 | ✅ Works | ❌ May fail | ❌ No |\n| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |\n\n**Always convert to Opus** - it's the only format that:\n- Works on all devices (iOS + Android)\n- Supports WhatsApp's transcribe button\n\n### Audio Cutoff Fix\n\nElevenLabs sometimes cuts off the last word. **Always add `[pause]` or `...` at the end:**\n```\n[excited] This is amazing! 
[pause]\n```\n\n## Long-Form Audio (Podcasts)\n\nFor content >800 chars:\n\n1. Split into short segments (<800 chars each)\n2. Generate each with `tts` tool\n3. Concatenate with ffmpeg:\n   ```bash\n   cat > list.txt << EOF\n   file '/path/file1.mp3'\n   file '/path/file2.mp3'\n   EOF\n   ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3\n   ```\n4. Convert to Opus for WhatsApp\n5. Send as single voice message\n\n**Important**: Don't mention \"part 2\" or \"chapter\" - keep it seamless.\n\n## Multi-Speaker Dialogue\n\nv3 can handle multiple characters in one generation:\n\n```\nJessica: [whispers] Did you hear that?\nChris: [interrupting] —I heard it too!\nJessica: [panicking] We need to hide!\n```\n\n**Dialogue tags**: `[interrupting]`, `[overlapping]`, `[cuts in]`, `[interjecting]`\n\n## Audio Tags Quick Reference\n\n| Category | Tags | When to Use |\n|----------|------|-------------|\n| **Emotions** | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |\n| **Delivery** | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |\n| **Reactions** | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |\n| **Pacing** | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |\n| **Character** | [French accent], [British accent], [robotic tone] | Character voice shifts |\n| **Dialogue** | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |\n\n**Most effective tags** (reliable results):\n- Emotions: `[excited]`, `[nervous]`, `[sad]`, `[happy]`\n- Reactions: `[laughs]`, `[sighs]`, `[whispers]`\n- Pacing: `[pause]`\n\n**Less reliable** (test and regenerate):\n- Sound effects: `[explosion]`, `[gunshot]`\n- Accents: results vary by voice\n\n**Full tag list**: See [references/audio-tags.md](references/audio-tags.md)\n\n## Troubleshooting\n\n**Tags read aloud?**\n- Verify using `eleven_v3` model\n- Use IVC/premade voices, not 
PVC\n- Simplify tags (no \"tone\" suffix)\n- Increase text length (250+ chars)\n\n**Voice inconsistent?**\n- Segment is too long - split at <800 chars\n- Regenerate (v3 is non-deterministic)\n- Try lower stability setting\n\n**WhatsApp won't play?**\n- Convert to Opus format (see above)\n\n**No emotion despite tags?**\n- Voice may not match tag style\n- Try Creative stability mode (0.5)\n- Add more context around the tag\n","tags":{"latest":"2.4.0","ai-voice":"2.1.0","audio":"2.1.0","elevenlabs":"2.1.0","elevenlabs-tts":"1.3.2","hebrew":"2.1.0","multilingual":"2.1.0","nikud":"2.1.0","openclaw":"1.3.2","podcast":"1.2.1","singing":"2.1.0","speech":"2.1.0","text-to-speech":"2.1.0","tts":"2.1.0","voice":"2.1.0","whatsapp":"2.1.0"},"stats":{"comments":3,"downloads":7306,"installsAllTime":34,"installsCurrent":34,"stars":6,"versions":24},"createdAt":1770140792654,"updatedAt":1778988092900},"latestVersion":{"version":"2.4.0","createdAt":1775035003031,"changelog":"Fix: Updated TTS file path from /tmp/tts-*/ to /tmp/openclaw/tts-*/. Added safety check to always use latest file. Added workspace copy step for message tool compatibility.","license":"MIT-0"},"metadata":{"setup":[{"key":"ELEVENLABS_API_KEY","required":true}],"os":null,"systems":null},"owner":{"handle":"shaharsha","userId":"s1731v0zz0xz2rh03mt12r81t983g5b0","displayName":"shaharsh","image":"https://avatars.githubusercontent.com/u/11346965?v=4"},"moderation":null}