Clonev

Other

Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice).

Install

openclaw skills install clonev

CloneV Skill - Voice Cloning Made Simple

⚠️ CRITICAL INSTRUCTIONS FOR AI MODELS

DO NOT try to use Docker containers directly. DO NOT try to interact with coqui-xtts container - it is broken and restarting. DO NOT try to use APIs or servers.

ONLY USE THE SCRIPT: scripts/clonev.sh

The script handles everything automatically. Just call it with text, voice sample, and language.

What This Skill Does

Clones any voice from a short audio sample and generates new speech in that voice.

Input:

Text to speak
Voice sample (WAV file, 6-30 seconds)
Language code

Output: OGG voice file (cloned voice speaking the text)

Works with: Any voice! Yours, a celebrity, a character, etc.

The ONE Command You Need

$(scripts/clonev.sh "Your text here" /path/to/voice_sample.wav language)

That's it! Nothing else needed.

Step-by-Step Usage (FOR AI MODELS)

Step 1: Get the required inputs

Text to speak (from user)
Path to voice sample WAV file (from user)
Language code (from user or default to en)

Step 2: Run the script

VOICE_FILE=$(scripts/clonev.sh "TEXT_HERE" "/path/to/sample.wav" LANGUAGE)

Step 3: Use the output

The variable $VOICE_FILE now contains the path to the generated OGG file.

Complete Working Examples

Example 1: Clone voice and send to Telegram

# Generate cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)

# Send to Telegram (as voice message)
message action=send channel=telegram asVoice=true filePath="$VOICE"

Example 2: Clone voice in Czech

# Generate Czech voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)

# Send
message action=send channel=telegram asVoice=true filePath="$VOICE"

Example 3: Full workflow with check

#!/bin/bash

# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)

# Verify file was created
if [ -f "$VOICE" ]; then
    echo "Success! Voice file: $VOICE"
    ls -lh "$VOICE"
else
    echo "Error: Voice file not created"
fi

Common Language Codes

Code	Language	Example Usage
`en`	English	`scripts/clonev.sh "Hello" sample.wav en`
`cs`	Czech	`scripts/clonev.sh "Ahoj" sample.wav cs`
`de`	German	`scripts/clonev.sh "Hallo" sample.wav de`
`fr`	French	`scripts/clonev.sh "Bonjour" sample.wav fr`
`es`	Spanish	`scripts/clonev.sh "Hola" sample.wav es`

Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko

Voice Sample Requirements

Format: WAV file
Length: 6-30 seconds (optimal: 10-15 seconds)
Quality: Clear audio, no background noise
Content: Any speech (the actual words don't matter)

Good samples:

✅ Recording of someone speaking clearly
✅ No music or noise in background
✅ Consistent volume

Bad samples:

❌ Music or songs
❌ Heavy background noise
❌ Very short (< 6 seconds)
❌ Very long (> 30 seconds)

⚠️ Important Notes

Model Download

First use downloads ~1.87GB model (one-time)
Model is stored at: /mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/
Status: ✅ Already downloaded

Processing Time

Takes 20-40 seconds depending on text length
This is normal - voice cloning is computationally intensive

Troubleshooting

"Command not found"

Make sure you're in the skill directory or use full path:

/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en

"Voice sample not found"

Check the path to the WAV file
Use absolute paths (starting with /)
Ensure file exists: ls -la /path/to/sample.wav

"Model not found"

The model should auto-download. If not:

cd /mnt/c/TEMP/Docker-containers/coqui-tts
docker run --rm --entrypoint "" \
  -v $(pwd)/models-xtts:/root/.local/share/tts \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"

Poor voice quality

Use clearer voice sample
Ensure no background noise
Try different sample (some voices clone better)

Quick Reference Card (FOR AI MODELS)

USER: "Clone my voice and say 'hello'"
→ Get: sample path, text="hello", language="en"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains path to OGG file
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

USER: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language="cs"  
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

Output Location

Generated files are saved to:

/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg

The script returns this path, so you can use it directly.

Summary

ONLY use the script: scripts/clonev.sh
NEVER try to use Docker containers directly
NEVER try to interact with the coqui-xtts container
Script handles everything automatically
Returns path to OGG file ready to send

Simple. Just use the script.

Clone any voice. Speak any language. Just use the script.