Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Elevenlabs Tts

v2.4.0

ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp,...

6· 6k·32 current·32 all-time
by shaharsh (@shaharsha)

Install


Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for shaharsha/elevenlabs-tts.

Install the skill "Elevenlabs Tts" (shaharsha/elevenlabs-tts) from ClawHub.
Skill page: https://clawhub.ai/shaharsha/elevenlabs-tts
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: ELEVENLABS_API_KEY
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install shaharsha/elevenlabs-tts

ClawHub CLI


npx clawhub@latest install elevenlabs-tts
Security Scan
VirusTotal: Suspicious
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name/description (ElevenLabs TTS) matches the instructions: calling ElevenLabs v3, using audio tags, selecting voices, and converting audio for WhatsApp. The declared primary credential (ELEVENLABS_API_KEY) is appropriate for this purpose.
Instruction Scope
SKILL.md is instruction-only and stays within TTS scope: it instructs use of ElevenLabs API, audio-tag formatting, and use of ffmpeg for format conversion. It references storing the API key in openclaw.json (configuration file) and requires ffmpeg on PATH. It does not instruct reading unrelated system files or exfiltrating data. However, the skill allows the 'exec' tool (so the agent can run ffmpeg or other commands) — expected for audio conversion but worth noting.
Install Mechanism
No install spec and no code files — lowest risk category. Nothing is downloaded or written to disk by an installer. The skill is instruction-only and relies on existing system tools (ffmpeg).
Credentials
Only ELEVENLABS_API_KEY is required as the primary credential, which is proportionate. Minor inconsistency: top-level registry metadata listed 'Required binaries: none' but SKILL.md metadata and prerequisites require ffmpeg; ensure ffmpeg availability before use. No other unrelated secrets are requested.
Persistence & Privilege
The always flag is false, and the skill is user-invocable with normal autonomous invocation allowed. It does not request permanent presence or system-wide config changes. This is the standard, expected privilege model for a TTS skill.
Assessment
This skill appears to do what it says: generate ElevenLabs v3 TTS with audio tags and convert audio for WhatsApp. Before installing:

  1. Verify provenance. The skill has no homepage and the source is 'unknown' (registry owner ID is opaque); prefer skills with a known publisher.
  2. Confirm ffmpeg is installed on the agent host (SKILL.md requires ffmpeg, but the top-level registry entry omitted that).
  3. Supply an ElevenLabs API key dedicated to this use (don't reuse high-privilege keys) and store it using your platform's secret store rather than in plain files if possible.
  4. Be aware that the agent is allowed to run exec commands. This is necessary for audio conversion but means it can run binaries on your host; restrict the environment where you run the skill if you are cautious.
  5. If you need stronger assurance, ask the publisher for a canonical source/homepage or a signed package; absence of a verifiable source lowers confidence.


Runtime requirements

🎙️ Clawdis
Env: ELEVENLABS_API_KEY
Primary env: ELEVENLABS_API_KEY
Tags: ai-voice, audio, elevenlabs, elevenlabs-tts, hebrew, latest, multilingual, nikud, openclaw, podcast, singing, speech, text-to-speech, tts, voice, whatsapp
6k downloads
6 stars
24 versions
Updated 8h ago
MIT-0

ElevenLabs TTS (Text-to-Speech)

Generate expressive voice messages using ElevenLabs v3 with audio tags.

Prerequisites

  • ElevenLabs API Key (ELEVENLABS_API_KEY): Required. Get one at elevenlabs.io → Profile → API Keys. Configure in openclaw.json under messages.tts.elevenlabs.apiKey.
  • ffmpeg: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.
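Both prerequisites can be checked up front. The POSIX shell sketch below is illustrative only: check_tts_prereqs is a hypothetical name, and it only confirms that the environment variable is non-empty and that ffmpeg resolves on PATH. It does not validate the key against ElevenLabs and does not read openclaw.json.

```shell
# Hypothetical helper: verify the two prerequisites listed above.
# Returns 0 when both are present, 1 otherwise (reasons go to stderr).
check_tts_prereqs() {
  local status=0
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "missing: ELEVENLABS_API_KEY is not set" >&2
    status=1
  fi
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "missing: ffmpeg is not on PATH" >&2
    status=1
  fi
  return "$status"
}
```

For example, check_tts_prereqs && echo ready prints ready only when both checks pass.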

Quick Start Examples

Storytelling (emotional journey):

[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!

Horror/Suspense (building dread):

[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!

Conversation with reactions:

[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.

Hebrew (romantic moment):

[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?

Spanish (celebration to reflection):

[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.

Configuration (OpenClaw)

In openclaw.json, configure TTS under messages.tts:

{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}

Getting your API Key:

  1. Go to https://elevenlabs.io
  2. Sign up or log in
  3. Click profile → API Keys
  4. Copy your key

Recommended Voices for v3

These premade voices are optimized for v3 and work well with audio tags:

Voice | ID | Gender | Accent | Best For
--- | --- | --- | --- | ---
Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use
Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational
Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts
Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games
George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling

Finding more voices:
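One way to look beyond the table above is to list the voices available to your account through the ElevenLabs REST API. This sketch assumes the GET /v1/voices endpoint and the xi-api-key header; list_voices is a hypothetical name, and it prints the raw JSON response.

```shell
# Hypothetical helper: print the account's voice list as raw JSON.
list_voices() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY not set" >&2
    return 1
  fi
  curl -sS --fail "https://api.elevenlabs.io/v1/voices" \
    -H "xi-api-key: $ELEVENLABS_API_KEY"
}
```

If jq is installed, list_voices | jq -r '.voices[] | "\(.name)\t\(.voice_id)\t\(.category)"' gives one line per voice; the field names there are assumed from the response shape, so check them against your actual output.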

Voice selection tips:

  • Use IVC (Instant Voice Clone) or premade voices - PVC (Professional Voice Clone) voices are not yet optimized for v3
  • Match voice character to your use case (whispering voice won't shout well)
  • For expressive IVCs, include varied emotional tones in training samples

Model Settings

  • Model: eleven_v3 (alpha) - ONLY model supporting audio tags
  • Languages: 70+ supported with full audio tag control

Stability Modes

Mode | Stability | Description
--- | --- | ---
Creative | 0.3-0.5 | More emotional/expressive, may hallucinate
Natural | 0.5-0.7 | Balanced, closest to original voice
Robust | 0.7-1.0 | Highly stable, less responsive to tags

For audio tags, use Creative (0.5) or Natural. Higher stability reduces tag responsiveness.
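Because the table's ranges share their endpoints, a helper that names the mode for a given value has to pick a side. This sketch treats 0.5 as Creative (matching the note above), 0.7 as Natural, groups values below 0.3 with Creative, and reports anything outside 0-1 as invalid; those boundary choices and the name stability_mode are assumptions.

```shell
# Hypothetical helper: map a stability value to the mode names in the table above.
# Boundary handling (0.5 -> Creative, 0.7 -> Natural) is an assumption.
stability_mode() {
  awk -v s="$1" 'BEGIN {
    if (s < 0 || s > 1.0) print "invalid"
    else if (s <= 0.5)    print "Creative"
    else if (s <= 0.7)    print "Natural"
    else                  print "Robust"
  }'
}
```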

Speed Control

Range: 0.7 (slow) to 1.2 (fast), default 1.0

Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
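If a speed value comes from user input, clamping it into the documented range keeps it away from the extremes. A minimal sketch; clamp_speed is a hypothetical name.

```shell
# Hypothetical helper: clamp a requested speed into the 0.7-1.2 range above.
clamp_speed() {
  awk -v v="$1" 'BEGIN {
    if (v < 0.7) v = 0.7
    if (v > 1.2) v = 1.2
    print v
  }'
}
```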

Critical Rules

Length Limits

  • Optimal: <800 characters per segment (best quality)
  • Maximum: 10,000 characters (API hard limit)
  • Quality degrades with longer text - voice becomes inconsistent
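A quick length check against these limits can run before each request. This is a sketch with a hypothetical name; note that ${#var} counts characters only in multibyte-aware shells with a UTF-8 locale, and otherwise counts bytes, so Hebrew or other non-ASCII text can look longer than it is.

```shell
# Hypothetical helper: classify a segment against the limits above.
# Prints optimal (<800), long (800-10,000) or over-limit (>10,000).
segment_status() {
  local n=${#1}
  if [ "$n" -gt 10000 ]; then
    echo "over-limit"
  elif [ "$n" -ge 800 ]; then
    echo "long"
  else
    echo "optimal"
  fi
}
```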

Audio Tags - Best Practices for Natural Sound

How many tags to use:

  • 1-2 tags per sentence or phrase (not more!)
  • Tags persist until the next tag - no need to repeat
  • Overusing tags sounds unnatural and robotic
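To check the 1-2 tags guideline before sending, a rough linter can report the largest number of tags in any single sentence. Both parsing rules here are simplifying assumptions: a sentence ends at ., ! or ? followed by a space, and a tag is a bracketed run of ASCII letters, spaces and hyphens. The name max_tags_per_sentence is hypothetical.

```shell
# Hypothetical linter: print the highest tag count found in any one sentence.
max_tags_per_sentence() {
  printf '%s\n' "$1" | awk '
    { text = text " " $0 }
    END {
      gsub(/[.!?] +/, "&\n", text)      # newline after each sentence end
      n = split(text, parts, "\n")
      max = 0
      for (i = 1; i <= n; i++) {
        c = gsub(/\[[A-Za-z][A-Za-z -]*]/, "", parts[i])   # count (and strip) tags
        if (c > max) max = c
      }
      print max
    }'
}
```

A result above 2 suggests trimming tags in at least one sentence.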

Where to place tags:

  • At emotional transition points
  • Before key dramatic moments
  • When energy/pace changes

Context matters:

  • Write text that matches the tag emotion
  • Longer text with context = better interpretation
  • Example: "[nervous] I... I'm not sure about this. What if it doesn't work?" works better than "[nervous] Hello."

Combine tags for nuance:

  • [nervously][whispers] = nervous whispering
  • [excited][laughs] = excited laughter
  • Keep combinations to 2 tags max

Regenerate for best results:

  • v3 is non-deterministic - same text = different outputs
  • Generate 3+ versions, pick the best
  • Small text tweaks can improve results
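Generating several takes can be scripted so they can be compared side by side. This sketch assumes the POST /v1/text-to-speech/{voice_id} endpoint and xi-api-key header, sends only text and model_id (default voice settings), and writes take-1.mp3, take-2.mp3, and so on into the current directory; generate_takes is a hypothetical name and the default voice is Adam from the table above.

```shell
# Hypothetical helper: generate_takes TEXT [COUNT] [VOICE_ID]
# Requests COUNT (default 3) takes of the same text for comparison.
generate_takes() {
  local text count voice_id esc i
  text=$1; count=${2:-3}; voice_id=${3:-pNInz6obpgDQGcFmaJgB}
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY not set" >&2
    return 1
  fi
  # Minimal JSON escaping: backslashes and double quotes; newlines/tabs become spaces.
  esc=$(printf '%s' "$text" | tr '\n\t' '  ' | sed 's/\\/\\\\/g; s/"/\\"/g')
  i=1
  while [ "$i" -le "$count" ]; do
    curl -sS --fail -X POST "https://api.elevenlabs.io/v1/text-to-speech/$voice_id" \
      -H "xi-api-key: $ELEVENLABS_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"text\":\"$esc\",\"model_id\":\"eleven_v3\"}" \
      -o "take-$i.mp3" || return 1
    i=$((i + 1))
  done
}
```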

Match tag to voice:

  • Don't use [shouts] on a whispering voice
  • Don't use [whispers] on a loud/energetic voice
  • Test tags with your chosen voice

SSML Not Supported

v3 does NOT support SSML break tags. Use audio tags and punctuation instead.
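Text written for older models may still contain SSML breaks. A small sed pass can swap each self-closing break tag for [pause]; this is an approximate conversion, since the requested duration is discarded. The function name is hypothetical.

```shell
# Hypothetical helper: replace SSML <break .../> tags with the [pause] audio tag.
ssml_breaks_to_pause() {
  printf '%s\n' "$1" | sed 's#<break[^>]*/>#[pause]#g'
}
```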

Punctuation Effects (use with tags!)

Punctuation enhances audio tags:

  • Ellipses (...) → dramatic pauses: [nervous] I... I don't know...
  • CAPS → emphasis: [excited] That's AMAZING!
  • Dashes (—) → interruptions: [explaining] So what you do is— [interrupting] Wait!
  • Question marks → uncertainty: [nervous] Are you sure about this?
  • Exclamation! → energy boost: [happy] We did it!

Combine tags + punctuation for maximum effect:

[tired] It was a long day... [sighs] Nobody listens anymore.

WhatsApp Voice Messages

Complete Workflow

  1. Generate with tts tool (returns Opus in /tmp/openclaw/tts-*/)
  2. Copy to workspace (message tool only allows workspace paths)
  3. Send with message tool
  4. Cleanup - delete the workspace copy

Step-by-Step

1. Generate TTS (add [pause] at end to prevent cutoff):

tts text="[excited] This is amazing! [pause]" channel=whatsapp

2. Find the LATEST file (⚠️ CRITICAL - always use the newest file!):

ls -t /tmp/openclaw/tts-*/*.opus /tmp/openclaw/tts-*/*.mp3 /tmp/openclaw/tts-*/*.ogg 2>/dev/null | head -n 1

The tts tool now outputs to /tmp/openclaw/tts-*/ (NOT /tmp/tts-*/). Old files may exist in /tmp/tts-*/ from previous sessions - never use those!

3. If file is MP3, convert to Opus:

ffmpeg -i /path/to/voice.mp3 -c:a libopus -b:a 64k -vbr on -application voip /path/to/voice.ogg

If already .opus, skip this step.

4. Copy to workspace and send:

cp /tmp/openclaw/tts-xxx/voice.opus ~/.openclaw/workspace/voice-temp.ogg
message action=send channel=whatsapp target="+972..." filePath="/root/.openclaw/workspace/voice-temp.ogg" asVoice=true message=" "

5. Cleanup:

rm /root/.openclaw/workspace/voice-temp.ogg

WhatsApp requires a non-empty message body to send voice notes. Use a single space as the message.
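The shell parts of steps 2 to 4 can be wrapped in two small functions; sending still goes through the OpenClaw message tool. This is a sketch: latest_tts_file and prepare_voice_note are hypothetical names, the directories are parameters whose defaults mirror the paths above, and only the /tmp/openclaw/tts-* directories are searched so stale /tmp/tts-* files are never picked.

```shell
# Hypothetical helper (step 2): print the newest audio file under BASE/tts-*/.
latest_tts_file() {
  local base=${1:-/tmp/openclaw}
  # ls -t sorts all matches by modification time, newest first.
  ls -t "$base"/tts-*/*.opus "$base"/tts-*/*.ogg "$base"/tts-*/*.mp3 2>/dev/null | head -n 1
}

# Hypothetical helper (steps 3-4): put a WhatsApp-ready copy at DEST.
prepare_voice_note() {
  local src=$1
  local dest=${2:-/root/.openclaw/workspace/voice-temp.ogg}
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  case "$src" in
    *.mp3)  # MP3: re-encode to Opus in an Ogg container.
      ffmpeg -y -loglevel error -i "$src" -c:a libopus -b:a 64k -vbr on -application voip "$dest" ;;
    *)      # Already Opus/Ogg: copy as-is.
      cp "$src" "$dest" ;;
  esac
}
```

After the message tool has sent the file, step 5 still applies: remove the workspace copy with rm -f.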

Why Opus?

Format | iOS | Android | Transcribe
--- | --- | --- | ---
MP3 | ✅ Works | ❌ May fail | ❌ No
Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes

Always convert to Opus - it's the only format that:

  • Works on all devices (iOS + Android)
  • Supports WhatsApp's transcribe button

Audio Cutoff Fix

ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:

[excited] This is amazing! [pause]
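The workaround can be applied automatically before each request. A minimal sketch with a hypothetical name; it only checks the exact ending, so trailing whitespace after [pause] is not recognized.

```shell
# Hypothetical helper: append " [pause]" unless the text already ends
# with [pause] or an ellipsis.
ensure_trailing_pause() {
  case "$1" in
    *'[pause]'|*'...') printf '%s\n' "$1" ;;
    *)                 printf '%s [pause]\n' "$1" ;;
  esac
}
```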

Long-Form Audio (Podcasts)

For content >800 chars:

  1. Split into short segments (<800 chars each)
  2. Generate each with tts tool
  3. Concatenate with ffmpeg:
    cat > list.txt << EOF
    file '/path/file1.mp3'
    file '/path/file2.mp3'
    EOF
    ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3
    
  4. Convert to Opus for WhatsApp
  5. Send as single voice message

Important: Don't mention "part 2" or "chapter" - keep it seamless.
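Steps 1 and 3 can be sketched as two helpers: one packs sentences into segments under the character limit, the other writes the concat list with single quotes escaped the way the ffmpeg concat demuxer expects ('\''). Both names are hypothetical; sentence splitting is naive (after ., ! or ? followed by a space), awk's length() may count bytes rather than characters depending on the awk and locale, and a single sentence longer than the limit is emitted on its own.

```shell
# Hypothetical helper: split_segments TEXT [MAX] prints one segment per line.
split_segments() {
  printf '%s\n' "$1" | awk -v max="${2:-800}" '
    { text = (NR > 1) ? text " " $0 : $0 }
    END {
      gsub(/[.!?] +/, "&\n", text)      # newline after each sentence end
      n = split(text, parts, "\n")
      seg = ""
      for (i = 1; i <= n; i++) {
        s = parts[i]
        sub(/ +$/, "", s)               # drop the space kept by gsub
        if (s == "") continue
        if (seg == "") seg = s
        else if (length(seg) + 1 + length(s) <= max) seg = seg " " s
        else { print seg; seg = s }
      }
      if (seg != "") print seg
    }'
}

# Hypothetical helper: write_concat_list OUTFILE FILE... for ffmpeg -f concat.
write_concat_list() {
  local out=$1 f
  shift
  : > "$out"
  for f in "$@"; do
    printf "file '%s'\n" "$(printf '%s' "$f" | sed "s/'/'\\\\''/g")" >> "$out"
  done
}
```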

Multi-Speaker Dialogue

v3 can handle multiple characters in one generation:

Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!

Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]

Audio Tags Quick Reference

Category | Tags | When to Use
--- | --- | ---
Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section
Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes
Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly
Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing
Character | [French accent], [British accent], [robotic tone] | Character voice shifts
Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations

Most effective tags (reliable results):

  • Emotions: [excited], [nervous], [sad], [happy]
  • Reactions: [laughs], [sighs], [whispers]
  • Pacing: [pause]

Less reliable (test and regenerate):

  • Sound effects: [explosion], [gunshot]
  • Accents: results vary by voice

Full tag list: See references/audio-tags.md

Troubleshooting

Tags read aloud?

  • Verify using eleven_v3 model
  • Use IVC/premade voices, not PVC
  • Simplify tags (no "tone" suffix)
  • Increase text length (250+ chars)
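Two of these causes can be checked mechanically before regenerating. This sketch flags tags ending in " tone" and text shorter than 250 characters; lint_tags_read_aloud is a hypothetical name, and the same byte-versus-character caveat for ${#var} applies to non-ASCII text.

```shell
# Hypothetical helper: warn about the "tone" suffix and short text.
# Prints nothing when both checks pass.
lint_tags_read_aloud() {
  local text=$1
  if printf '%s\n' "$text" | grep -q '\[[^]]* tone]'; then
    echo "warn: tag with 'tone' suffix; simplify it"
  fi
  if [ "${#text}" -lt 250 ]; then
    echo "warn: text under 250 chars; add context"
  fi
}
```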

Voice inconsistent?

  • Segment is too long - split at <800 chars
  • Regenerate (v3 is non-deterministic)
  • Try lower stability setting

WhatsApp won't play?

  • Convert to Opus format (see above)

No emotion despite tags?

  • Voice may not match tag style
  • Try Creative stability mode (0.5)
  • Add more context around the tag
