Install
openclaw skills install kittentts-whatsapp
Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg: transcribe incoming audio with whisper, then reply with a TTS voice note.
Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.
⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.
| Dependency | Install command | Size | Notes |
|---|---|---|---|
| ffmpeg | apt-get install -y ffmpeg | ~30MB | Available in most distro repos |
| kittentts | pip3 install kittentts --break-system-packages | pulls ~25-80MB from Hugging Face on first run | Python package |
| libopus | bundled with ffmpeg | - | OGG encoding support |
| soundfile | pulled by kittentts | - | Python package |
Models are downloaded from huggingface.co/KittenML; the download size depends on the model chosen.
Set the HF_TOKEN env var to avoid unauthenticated rate limits on model download.
| Model | Parameters | Size | Hugging Face ID |
|---|---|---|---|
| nano (int8) | 15M | 25MB | KittenML/kitten-tts-nano-0.8-int8 |
| nano | 15M | 56MB | KittenML/kitten-tts-nano-0.8-fp32 |
| micro | 40M | 41MB | KittenML/kitten-tts-micro-0.8 |
| mini | 80M | 80MB | KittenML/kitten-tts-mini-0.8 |
Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
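To make the model choice concrete, here is a minimal sketch of a lookup from a short model name to the Hugging Face IDs in the table above. The helper name `kitten_model_id` and the short names are hypothetical; how scripts/tts_walkie.sh actually stores the model is not shown here, so treat this as an illustration only.

```shell
# Hypothetical helper: map a short model name to its Hugging Face ID.
# IDs are copied from the model table; the default mirrors the documented default (mini).
kitten_model_id() {
  case "${1:-mini}" in
    nano-int8) echo "KittenML/kitten-tts-nano-0.8-int8" ;;
    nano)      echo "KittenML/kitten-tts-nano-0.8-fp32" ;;
    micro)     echo "KittenML/kitten-tts-micro-0.8" ;;
    mini)      echo "KittenML/kitten-tts-mini-0.8" ;;
    *)         echo "unknown model: $1" >&2; return 1 ;;
  esac
}
```

For example, `kitten_model_id nano` prints the fp32 nano ID, which you could then paste into scripts/tts_walkie.sh.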
Run these manually before the skill is used:
# 1. System package (requires root/privileged)
apt-get install -y ffmpeg
# 2. Python package
pip3 install kittentts --break-system-packages
# 3. Optional: set Hugging Face token to avoid rate limits
# echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.bashrc
Restart OpenClaw after installing dependencies so the new packages are in PATH.
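After the restart, a quick check that the dependencies are actually visible can save a silent failure later. The sketch below is not part of the skill; `missing_cmds` and `kittentts_importable` are hypothetical helper names, and it assumes `python3` is the interpreter that pip3 installed into.

```shell
# Hypothetical helper: print each given command that is NOT found in PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Hypothetical helper: succeed only if the kittentts Python package imports.
kittentts_importable() {
  python3 -c 'import kittentts' >/dev/null 2>&1
}
```

Running `missing_cmds ffmpeg ffprobe python3` prints nothing when all three are available, and `kittentts_importable || echo "kittentts not installed"` reports a missing Python package.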
bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)
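The conversion step (24kHz WAV in, 16kHz OGG Opus out) could look roughly like the sketch below. It is an illustration of the format change, not the contents of scripts/tts_walkie.sh: the function name `wav_to_voice_note`, the mono downmix, and the 32k bitrate are assumptions.

```shell
# Hypothetical helper: convert a WAV file to a 16 kHz OGG Opus voice note.
#   $1 = input WAV (e.g. the 24 kHz file written by KittenTTS)
#   $2 = output .ogg path
# Assumptions: mono output and a 32k bitrate; adjust to taste.
wav_to_voice_note() {
  ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 -c:a libopus -b:a 32k "$2"
}
```

Usage would be along the lines of `wav_to_voice_note /tmp/in.wav /tmp/walkie_reply.ogg`, where the input path is whatever WAV file the TTS step produced.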
# Install whisper (one-time, ~140MB-1.4GB depending on model)
pip3 install openai-whisper --break-system-packages
bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Model: tiny | base | small | medium | large (default: base)
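For reference, a transcription wrapper in the spirit of scripts/transcribe.sh might look like the sketch below. It assumes the `whisper` command-line tool from the openai-whisper package is installed; the function names and the default output directory are hypothetical, and the real script may differ.

```shell
# Hypothetical helper: accept only the model names listed above.
valid_whisper_model() {
  case "$1" in
    tiny|base|small|medium|large) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: transcribe an audio file to a .txt file in the output directory.
#   $1 = audio file, $2 = model (default: base), $3 = output dir (assumed default)
# The body runs in a subshell so its variables do not leak into the caller.
transcribe_audio() (
  audio=$1
  model=${2:-base}
  outdir=${3:-/tmp/kittentts-walkie}
  valid_whisper_model "$model" || { echo "unknown model: $model" >&2; exit 1; }
  whisper "$audio" --model "$model" --output_format txt --output_dir "$outdir"
)
```

Smaller models transcribe faster at the cost of accuracy, so `transcribe_audio /path/to/audio.ogg tiny` is a reasonable first try on slow hardware.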
Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
Default: Bella
Temporary files are written to a /tmp/kittentts-walkie/ directory (mode 700), so only the running user can read them.
Set the VOICE_SPEED env var to adjust speech rate (default: 1.0).
kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh    # TTS + ffmpeg conversion (speed is applied)
    └── transcribe.sh    # whisper transcription (optional)
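The private working directory (mode 700) and the VOICE_SPEED default of 1.0 can be sketched as follows. Both function names are hypothetical, and this is only one way a script could implement the behaviour described above.

```shell
# Hypothetical helper: create a working directory readable only by the
# current user (mode 700) and print its path.
make_walkie_dir() (
  dir=${1:-/tmp/kittentts-walkie}
  mkdir -p "$dir" && chmod 700 "$dir" && printf '%s\n' "$dir"
)

# Hypothetical helper: print the speech rate, falling back to 1.0
# when VOICE_SPEED is unset or empty.
voice_speed() {
  printf '%s\n' "${VOICE_SPEED:-1.0}"
}
```

With this shape, `VOICE_SPEED=1.2 bash scripts/tts_walkie.sh "Your message here" Bella` would be the natural way to request slightly faster speech for a single run.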
The dependency install commands use apt-get install -y and pip3 install --break-system-packages. apt-get requires root privileges, and --break-system-packages makes pip install into the system-managed Python environment. Review before running if you are on a managed system.
Audio sends but is silent or rejected by WhatsApp:
→ Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg
→ The audio stream must show codec_name: opus and sample_rate: 48000 (or 16000). If it does not, the ffmpeg conversion step failed.
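The same check can be scripted. The sketch below asks ffprobe for just the two fields of interest as key=value lines and then tests them; `probe_audio` and `is_voice_note_ready` are hypothetical helper names, not part of the skill.

```shell
# Hypothetical helper: print codec_name and sample_rate of the first
# audio stream as key=value lines (e.g. codec_name=opus).
probe_audio() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate \
    -of default=noprint_wrappers=1 "$1"
}

# Hypothetical helper: read key=value lines on stdin and succeed only
# for an Opus stream reported at 48000 or 16000 Hz.
is_voice_note_ready() (
  info=$(cat)
  printf '%s\n' "$info" | grep -qx 'codec_name=opus' &&
    printf '%s\n' "$info" | grep -Eqx 'sample_rate=(48000|16000)'
)
```

For example, `probe_audio /tmp/walkie_reply.ogg | is_voice_note_ready || echo "not WhatsApp-ready"` flags a file that should be regenerated.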
TTS generation is slow:
→ Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.
Hugging Face download rate limit:
→ Set HF_TOKEN in your environment. Free accounts get lower rate limits.