KittenTTS WhatsApp

Voice-to-voice mode for WhatsApp using KittenTTS + ffmpeg. Transcribe incoming audio with whisper, reply with a TTS voice note converted to WhatsApp-compatib...

Install

openclaw skills install kittentts-whatsapp

KittenTTS WhatsApp Voice

Generates WhatsApp-compatible voice notes from text using KittenTTS + ffmpeg. Specifically solves the format mismatch that causes silent failures: KittenTTS outputs 24kHz WAV → converted to 16kHz OGG Opus via ffmpeg → sent as WhatsApp voice note.
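
The conversion step described above can be sketched roughly as follows. This is a minimal sketch, not the contents of scripts/tts_walkie.sh: the function name `wav_to_voice_note`, the mono downmix, and the 32k bitrate are assumptions made for illustration. The ffmpeg flags themselves (`-ar`, `-ac`, `-c:a libopus`, `-b:a`) are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a KittenTTS WAV into a WhatsApp-style voice note.
# Assumptions (not taken from the skill's script): mono output, 32 kbit/s bitrate.
wav_to_voice_note() {
  local in_wav="$1" out_ogg="$2"
  if [ ! -f "$in_wav" ]; then
    echo "input not found: $in_wav" >&2
    return 1
  fi
  # -ar 16000 resamples 24 kHz -> 16 kHz, -ac 1 forces mono,
  # -c:a libopus encodes Opus; the .ogg extension selects the OGG container.
  ffmpeg -y -loglevel error -i "$in_wav" \
    -ar 16000 -ac 1 -c:a libopus -b:a 32k "$out_ogg"
}
```

Example call: `wav_to_voice_note /tmp/reply.wav /tmp/reply.ogg`. The function returns non-zero if the input is missing or ffmpeg fails, so a caller can stop before sending a broken file.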

⚠️ Read before installing. This skill installs system packages and downloads large ML models. See Setup below.

System Dependencies

Dependency	Install command	Size	Notes
`ffmpeg`	`apt-get install -y ffmpeg`	~30MB	Available in most distro repos
`kittentts`	`pip3 install kittentts --break-system-packages`	pulls ~25-80MB from Hugging Face on first run	Python package
`libopus`	bundled with ffmpeg	—	OGG encoding support
`soundfile`	pulled by kittentts	—	Python package

Network Calls

First run: downloads TTS model (~25-80MB) from huggingface.co/KittenML based on model size chosen
No API keys required — fully offline capable after model download
Set HF_TOKEN env var to avoid unauthenticated rate limits on model download

Model Options

Model	Parameters	Size	Hugging Face ID
nano (int8)	15M	25MB	`KittenML/kitten-tts-nano-0.8-int8`
nano	15M	56MB	`KittenML/kitten-tts-nano-0.8-fp32`
micro	40M	41MB	`KittenML/kitten-tts-micro-0.8`
mini	80M	80MB	`KittenML/kitten-tts-mini-0.8`

Default: kitten-tts-mini-0.8 (best quality). Change in scripts/tts_walkie.sh.
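
One way to keep the model choice readable is to map a short name to the Hugging Face ID from the table above. The sketch below is illustrative only: the function name `kitten_model_id` and the short names are assumptions, and how scripts/tts_walkie.sh actually selects its model is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a short model name to its Hugging Face ID.
# IDs are copied from the Model Options table; the short names are illustrative.
kitten_model_id() {
  case "${1:-mini}" in
    nano-int8) echo "KittenML/kitten-tts-nano-0.8-int8" ;;
    nano)      echo "KittenML/kitten-tts-nano-0.8-fp32" ;;
    micro)     echo "KittenML/kitten-tts-micro-0.8" ;;
    mini)      echo "KittenML/kitten-tts-mini-0.8" ;;   # documented default
    *)
      echo "unknown model: $1" >&2
      return 1
      ;;
  esac
}
```

Calling `kitten_model_id nano-int8` prints the int8 nano ID, and calling it with no argument falls back to the documented default.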

Setup

Run these manually before the skill is used:

# 1. System package (requires root/privileged)
apt-get install -y ffmpeg

# 2. Python package
pip3 install kittentts --break-system-packages

# 3. Optional: set Hugging Face token to avoid rate limits
# echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.bashrc

Restart OpenClaw after installing dependencies so the new packages are in PATH.
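
A quick preflight check can confirm the setup steps took effect before the skill is first used. This is a sketch under assumptions: the function names are hypothetical, and it assumes the Python package is importable as `kittentts`.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check; function names are illustrative.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Assumes the package's import name matches the pip name passed in.
have_py_module() { python3 -c "import $1" >/dev/null 2>&1; }

preflight() {
  local missing=0
  have_cmd ffmpeg  || { echo "missing: ffmpeg" >&2; missing=1; }
  have_cmd python3 || { echo "missing: python3" >&2; missing=1; }
  have_py_module kittentts || { echo "missing: kittentts (Python package)" >&2; missing=1; }
  return "$missing"
}
```

Running `preflight && echo "dependencies look fine"` prints the confirmation only when all three checks pass.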

Usage

TTS only (no transcription)

bash scripts/tts_walkie.sh "Your message here" Bella
# Output: /tmp/walkie_reply.ogg (16kHz OGG Opus, WhatsApp-ready)

Transcription only (optional — requires whisper)

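# Install OpenAI Whisper (one-time, ~140MB-1.4GB depending on model)
# Note: the PyPI package named "whisper" is an unrelated project; the speech model is published as openai-whisper
pip3 install openai-whisper --break-system-packages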

bash scripts/transcribe.sh /path/to/audio.ogg [model]
# Model: tiny | base | small | medium | large (default: base)
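
The transcription call can be sketched as a thin wrapper around the `whisper` command-line tool that ships with openai-whisper. This is not the content of scripts/transcribe.sh: the function name `transcribe_audio`, the third output-directory argument, and the `/tmp` default are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the openai-whisper CLI.
# Arguments: audio file, model name (default: base), output directory (assumed default: /tmp).
transcribe_audio() {
  local audio="$1" model="${2:-base}" out_dir="${3:-/tmp}"
  case "$model" in
    tiny|base|small|medium|large) ;;
    *)
      echo "unsupported model: $model" >&2
      return 2
      ;;
  esac
  # Writes a .txt transcript named after the audio file into out_dir.
  whisper "$audio" --model "$model" --output_format txt --output_dir "$out_dir"
}
```

Validating the model name first avoids starting a large download for a mistyped model.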

Voices

Available: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Default: Bella

Security Notes

Audio files are written to a private /tmp/kittentts-walkie/ directory (mode 700) — only the running user can read them.
WAV intermediates are cleaned up immediately after conversion; only the OGG is kept for sending.
Set VOICE_SPEED env var to adjust speech rate (default: 1.0).
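
The private-directory and cleanup behaviour described above could be implemented along these lines. This is a sketch, not the skill's actual code: `make_private_dir` and `cleanup_wav` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the mode-700 directory and WAV cleanup.
make_private_dir() {
  local dir="$1"
  # umask 077 in a subshell so the directory is never created with wider permissions.
  ( umask 077 && mkdir -p "$dir" ) || return 1
  # Also tightens the mode if the directory already existed.
  chmod 700 "$dir"
}

# Remove the WAV intermediate only once a non-empty OGG exists.
cleanup_wav() {
  local wav="$1" ogg="$2"
  if [ -s "$ogg" ]; then
    rm -f -- "$wav"
  fi
}
```

Keeping the WAV until the OGG is confirmed non-empty means a failed conversion leaves the intermediate available for debugging.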

Files

kittentts-whatsapp/
├── SKILL.md
└── scripts/
    ├── tts_walkie.sh      # TTS + ffmpeg conversion (honors VOICE_SPEED)
    └── transcribe.sh       # whisper transcription (optional)

⚠️ Privileged Install Warning

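The dependency install commands use apt-get install -y and pip3 install --break-system-packages. The apt-get command requires root privileges and modifies system packages. The --break-system-packages flag overrides pip's externally-managed-environment protection (PEP 668) and installs into the system Python environment. Review before running if you are on a managed system.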

Troubleshooting

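Audio sends but is silent or rejected by WhatsApp: → Run ffprobe -v quiet -print_format json -show_streams /tmp/walkie_reply.ogg → The output must show codec_name: opus and sample_rate: 48000 or 16000 (ffprobe usually reports 48000 for Opus streams regardless of the rate passed to the encoder). If it shows anything else, the ffmpeg conversion failed.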
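
The same check can be scripted so it runs before every send. This is a minimal sketch: the function name `check_voice_note` is hypothetical, and it uses ffprobe's key=value output instead of JSON so that no JSON parser is needed.

```shell
#!/usr/bin/env bash
# Hypothetical check: verify that a file is Opus at an expected sample rate.
check_voice_note() {
  local file="$1" info codec rate
  info=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate \
    -of default=noprint_wrappers=1 "$file") || return 1
  # ffprobe prints lines such as "codec_name=opus" and "sample_rate=48000".
  codec=$(printf '%s\n' "$info" | sed -n 's/^codec_name=//p')
  rate=$(printf '%s\n' "$info" | sed -n 's/^sample_rate=//p')
  if [ "$codec" = "opus" ] && { [ "$rate" = "48000" ] || [ "$rate" = "16000" ]; }; then
    echo "ok: $codec @ $rate Hz"
  else
    echo "unexpected format: codec=$codec sample_rate=$rate" >&2
    return 1
  fi
}
```

Example call: `check_voice_note /tmp/walkie_reply.ogg || echo "conversion failed"`.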

TTS generation is slow: → Switch to a smaller model (nano instead of mini) in scripts/tts_walkie.sh.

Hugging Face download rate limit: → Set HF_TOKEN in your environment. Free accounts get lower rate limits.