Install

openclaw skills install her-voice

Give your agent a voice. Use when the user wants the agent to speak, read aloud, or have voice responses. Audio responses are powered by Kokoro TTS, a compact, naturally expressive model running entirely on-device. Response time is highly optimized thanks to on-the-fly audio streaming. 100% free, no API keys required. Inspired by Samantha and Sky.
python3 SKILL_DIR/scripts/setup.py
Note: SKILL_DIR is the root directory of this skill; the agent resolves it automatically when running commands.
The setup wizard will:

- install espeak-ng (via Homebrew on macOS, apt on Linux)
- create ~/.her-voice/config.json

Check status anytime:
python3 SKILL_DIR/scripts/setup.py status
After setup, configure the agent and user names:
python3 SKILL_DIR/scripts/config.py set agent_name "Jackie"
python3 SKILL_DIR/scripts/config.py set user_name "Matúš"
python3 SKILL_DIR/scripts/config.py set user_name_tts "Mah-toosh"
TTS pronunciation tip: If the user's name is non-English, figure out a phonetic English spelling that Kokoro will pronounce correctly. Store it in user_name_tts and use that spelling whenever speaking the name aloud. The real name stays in user_name for display purposes.
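The display/speech split above can be sketched as a small helper. This is an illustration of the convention, not part of the skill's scripts:

```python
def display_name(config: dict) -> str:
    """Name to show in text output: always the real spelling."""
    return config.get("user_name", "")

def spoken_name(config: dict) -> str:
    """Name to pass to TTS: prefer the phonetic spelling when set."""
    return config.get("user_name_tts") or config.get("user_name", "")

config = {"user_name": "Matúš", "user_name_tts": "Mah-toosh"}
print(display_name(config))  # Matúš
print(spoken_name(config))   # Mah-toosh
```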
# Basic usage
python3 SKILL_DIR/scripts/speak.py "Hello, world!"
# Skip visualizer for this call
python3 SKILL_DIR/scripts/speak.py --no-viz "Quick note"
# Save to file instead of playing
python3 SKILL_DIR/scripts/speak.py --save /tmp/output.wav "Save this"
# Override voice or speed
python3 SKILL_DIR/scripts/speak.py --voice af_bella --speed 1.2 "Faster!"
# Pipe text from stdin
echo "Piped text" | python3 SKILL_DIR/scripts/speak.py
| Flag | Description |
|---|---|
| --no-viz | Skip the visualizer for this call |
| --persist | Keep visualizer open after playback ends |
| --save PATH | Save audio to a WAV file instead of playing |
| --voice NAME | Override the configured voice |
| --speed N | Override the configured speed multiplier |
| --mode MODE | Override visualizer mode (v2 or classic) |
When the user wants voice responses:
afplay /System/Library/Sounds/Blow.aiff &
python3 SKILL_DIR/scripts/speak.py "Response text here"
The notification sound plays instantly (~0.1s) while TTS generates (~0.3-3s). This gives the user immediate feedback that the agent is responding.
Configure in ~/.her-voice/config.json:
{
"notification_sound": {
"enabled": true,
"sound": "Blow"
}
}
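If you prefer updating the config programmatically rather than editing the file by hand, a minimal sketch is below. It assumes only the JSON layout shown above; config.py's set command remains the supported route:

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".her-voice" / "config.json"

def set_notification_sound(sound: str, enabled: bool = True,
                           path: Path = CONFIG_PATH) -> dict:
    """Replace the notification_sound block, preserving all other keys."""
    config = json.loads(path.read_text()) if path.exists() else {}
    config["notification_sound"] = {"enabled": enabled, "sound": sound}
    path.write_text(json.dumps(config, indent=2))
    return config
```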
Available macOS sounds: Blow, Bottle, Frog, Funk, Glass, Hero, Morse, Ping, Pop, Purr, Sosumi, Submarine, Tink. Located in /System/Library/Sounds/.
The daemon keeps the Kokoro model warm in RAM, eliminating ~1.1s of startup overhead per call.
The daemon auto-resolves the mlx-audio venv — no need to find the venv Python manually.
# Start (persists in background)
nohup python3 SKILL_DIR/scripts/daemon.py start > /tmp/her-voice-daemon.log 2>&1 & disown
# Status
python3 SKILL_DIR/scripts/daemon.py status
# Stop
python3 SKILL_DIR/scripts/daemon.py stop
# Restart
python3 SKILL_DIR/scripts/daemon.py restart
speak.py auto-detects the daemon: uses it if available, falls back to direct model loading.
The daemon is optional. Without it, speech still works — just ~1s slower per call as the model loads each time. Skip the daemon to save ~2.3GB RAM.
Note: The daemon writes its PID file and socket after the model is fully loaded and ready to accept connections. They live in ~/.her-voice/ with restricted permissions (owner-only access). The daemon won't survive a reboot — start it again after restart if needed.
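A quick way to confirm the owner-only permissions mentioned above (a diagnostic sketch; it simply checks that no group/other permission bits are set on anything in ~/.her-voice):

```python
import os
import stat
from pathlib import Path

def owner_only(path: Path) -> bool:
    """True if group and other have no permission bits on the file."""
    mode = path.stat().st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

state_dir = Path.home() / ".her-voice"
if state_dir.exists():
    for entry in state_dir.iterdir():
        print(entry.name, "owner-only" if owner_only(entry) else "group/other readable")
```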
A floating overlay with three animated LED bars that react to speech in real-time. 60fps, native macOS (Cocoa + AVFoundation). macOS only — on other platforms, audio plays without the visualizer.
| Key | Action |
|---|---|
| ESC | Quit |
| Space | Pause/Resume (file mode) |
| ← → | Seek ±5s (file mode) |
| ⌘V | Paste text to speak (persist mode) |
Keep the visualizer on screen between playbacks. Use as a standalone voice station:
# Launch in persist mode (stays open, idle breathing animation)
~/.her-voice/bin/her-voice-viz --persist
# Stream mode + persist (stays open after speech ends)
python3 SKILL_DIR/scripts/speak.py --persist "Hello!"
In persist mode, the window stays open with an idle breathing animation between playbacks, and ⌘V pastes text to speak.
# Play a file with visualizer
~/.her-voice/bin/her-voice-viz --audio /path/to/file.wav
# Demo mode (simulated audio)
~/.her-voice/bin/her-voice-viz --demo
# Stream raw PCM
cat audio.raw | ~/.her-voice/bin/her-voice-viz --stream --sample-rate 24000
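To exercise the --stream input without running TTS, you can synthesize raw PCM yourself. The sketch below assumes the stream format is mono, signed 16-bit little-endian at the given sample rate (an assumption; check the visualizer's actual expectations):

```python
import math
import struct

SAMPLE_RATE = 24000  # matches the --sample-rate 24000 example above

def sine_pcm(freq_hz: float, seconds: float, rate: int = SAMPLE_RATE) -> bytes:
    """Mono signed 16-bit little-endian PCM sine tone at half amplitude."""
    n = int(seconds * rate)
    samples = (int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate))
               for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)

# Write one second of A440 to a file, then:
#   cat /tmp/tone.raw | ~/.her-voice/bin/her-voice-viz --stream --sample-rate 24000
with open("/tmp/tone.raw", "wb") as f:
    f.write(sine_pcm(440.0, 1.0))
```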
Disable the visualizer entirely:

python3 SKILL_DIR/scripts/config.py set visualizer.enabled false
Config file: ~/.her-voice/config.json
# View all settings
python3 SKILL_DIR/scripts/config.py status
# Get a value
python3 SKILL_DIR/scripts/config.py get voice
# Set a value (dot notation for nested keys)
python3 SKILL_DIR/scripts/config.py set speed 1.1
python3 SKILL_DIR/scripts/config.py set visualizer.mode classic
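Dot notation walks nested JSON keys. A sketch of how such a setter typically works (illustrative only, not config.py's actual code):

```python
def set_by_dots(config: dict, dotted_key: str, value) -> dict:
    """Set a nested key like 'visualizer.mode', creating dicts as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config

cfg = {"visualizer": {"enabled": True}}
set_by_dots(cfg, "visualizer.mode", "classic")
set_by_dots(cfg, "speed", 1.1)
```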
| Key | Default | Description |
|---|---|---|
| agent_name | "" | Agent's name (e.g. "Jackie") |
| user_name | "" | User's real name |
| user_name_tts | "" | Phonetic spelling for TTS (e.g. "Mah-toosh" for Matúš) |
| voice | af_heart | Base voice name |
| voice_blend | {af_heart: 0.6, af_sky: 0.4} | Voice blend weights |
| speed | 1.05 | Speech speed multiplier |
| language | en | Language code |
| tts_engine | auto | TTS engine: auto, mlx, or pytorch |
| model | mlx-community/Kokoro-82M-bf16 | Model identifier (MLX) |
| visualizer.enabled | true | Show visualizer overlay |
| visualizer.mode | v2 | Animation mode (v2/classic) |
| visualizer.remember_position | true | Save window position between sessions |
| notification_sound.enabled | true | Play sound before speaking |
| notification_sound.sound | Blow | macOS system sound name |
| daemon.auto_start | true | Advisory flag only; the daemon never self-starts. When true, the agent should start it on first voice use (saves ~1s/call, costs ~2.3GB RAM) |
| daemon.socket_path | ~/.her-voice/tts.sock | Unix socket path |
Mix multiple voices for a unique sound. Configure voice_blend in config:
{
"voice_blend": {"af_heart": 0.6, "af_sky": 0.4}
}
The blended voice is stored as a .safetensors file in the model's voices directory (e.g., af_heart_60_af_sky_40.safetensors). Create it by running TTS once — speak.py looks for the pre-blended file automatically.
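Conceptually, a blend is a weighted average of the voices' embedding vectors. A toy illustration of the math (the real blending happens inside the TTS pipeline over the .safetensors tensors):

```python
def blend_voices(voices: dict, weights: dict) -> list:
    """Weighted sum of voice embedding vectors; weights should sum to 1."""
    length = len(next(iter(voices.values())))
    blended = [0.0] * length
    for name, weight in weights.items():
        for i, value in enumerate(voices[name]):
            blended[i] += weight * value
    return blended

# Two-dimensional stand-ins for real embedding vectors
voices = {"af_heart": [1.0, 0.0], "af_sky": [0.0, 1.0]}
print(blend_voices(voices, {"af_heart": 0.6, "af_sky": 0.4}))  # [0.6, 0.4]
```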
| Error | Cause | Fix |
|---|---|---|
| "mlx-audio not found" | Venv missing or broken | Run setup.py |
| "espeak-ng not found" | Phonemizer missing | brew install espeak-ng |
| Compilation failed | Xcode tools missing | xcode-select --install |
| "Model not found" | First run, no download | Run setup.py or speak once |
| Daemon "not running" | Crashed or rebooted | Start daemon again |
| No sound output | macOS audio permissions | Check System Settings → Sound → Output |
| Visualizer not showing | Binary not compiled | Run setup.py |
| "kokoro not found" | PyTorch venv missing | Run setup.py |
| PyTorch CUDA error | GPU driver mismatch | pip install torch --force-reinstall in kokoro venv |
| "soundfile not found" | Missing dependency | pip install soundfile in kokoro venv |
Requirements: Xcode command line tools (xcode-select --install) and espeak-ng for phonemization (brew install espeak-ng on macOS, apt install espeak-ng on Linux).

To uninstall, remove all Her Voice data (config, venvs, compiled binary, daemon state):
python3 SKILL_DIR/scripts/daemon.py stop
rm -rf ~/.her-voice
The TTS engine is selected via the tts_engine config option (auto, mlx, or pytorch).