VoiceClaw

v1.0.6

Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages using local Whisper (whisper.cpp) and generate voice replies using local Piper T...

7 versions · Updated 9h ago · License: MIT-0
By Asif (@asif2bd)

Install

openclaw skills install voiceclaw

VoiceClaw

Local-only voice I/O for OpenClaw agents.

  • STT: transcribe.sh — converts audio to text via local Whisper binary
  • TTS: speak.sh — converts text to speech via local Piper binary
  • Network calls: none — both scripts run fully offline
  • No cloud APIs, no API keys required

Prerequisites

The following must be installed on the system before using this skill:

Requirement                  Purpose
whisper binary               Speech-to-text inference
ggml-base.en.bin model file  Whisper STT model
piper binary                 Text-to-speech synthesis
*.onnx voice model files     Piper TTS voices
ffmpeg                       Audio format conversion

See README.md for installation and setup instructions.


Environment Variables

Variable              Default                            Purpose
WHISPER_BIN           auto-detected via which            Path to whisper binary
WHISPER_MODEL         ~/.cache/whisper/ggml-base.en.bin  Path to Whisper model file
PIPER_BIN             auto-detected via which            Path to piper binary
VOICECLAW_VOICES_DIR  ~/.local/share/piper/voices        Directory containing .onnx voice model files
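
The override-or-default behavior in the table can be sketched with ordinary shell parameter expansion. This is an illustration only, not the code in transcribe.sh or speak.sh: the helper name resolve_bin is hypothetical, and it uses command -v as a stand-in for the which lookup the table mentions.

```shell
# Hypothetical helper (not from the VoiceClaw scripts): print the override
# path if one is given, otherwise look the binary up on PATH.
resolve_bin() {
  local override="$1"   # e.g. the value of $WHISPER_BIN, may be empty
  local name="$2"       # binary name to search for on PATH
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  else
    command -v "$name"
  fi
}

# Path variables fall back to the documented defaults when unset or empty.
WHISPER_MODEL="${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}"
VOICECLAW_VOICES_DIR="${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"

# Example call (fails with a non-zero status if whisper is not installed):
#   WHISPER_BIN=$(resolve_bin "${WHISPER_BIN:-}" whisper)
```

An explicit override is returned unchanged, so a mistyped path only surfaces when the binary is actually invoked.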

Verify Setup

which whisper && echo "STT binary: OK"
which piper   && echo "TTS binary: OK"
which ffmpeg  && echo "ffmpeg: OK"
ls "${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}" && echo "STT model: OK"
ls "${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"/*.onnx >/dev/null 2>&1 && echo "TTS voices: OK"
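
For use inside a script, the same checks can be wrapped in a function that returns a non-zero status when something is missing. The function name check_bins is an assumption made for this sketch and is not part of the skill.

```shell
# Hypothetical preflight helper: report each binary and return the number
# of missing ones as the exit status (0 means everything was found).
check_bins() {
  local missing=0
  local bin
  for bin in "$@"; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$bin: OK"
    else
      echo "$bin: MISSING" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example call:
#   check_bins whisper piper ffmpeg || echo "Install the missing tools first."
```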

Inbound Voice: Transcribe

# Transcribe audio → text (supports ogg, mp3, m4a, wav, flac)
TRANSCRIPT=$(bash scripts/transcribe.sh /path/to/audio.ogg)

Override model path:

WHISPER_MODEL=/path/to/ggml-base.en.bin bash scripts/transcribe.sh audio.ogg
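
A caller may want to stop early when the audio file is missing or the transcript comes back empty. The wrapper below is a minimal sketch under stated assumptions: transcribe_or_fail and the TRANSCRIBE_CMD variable are hypothetical names, and it assumes scripts/transcribe.sh prints the transcript on standard output, as the command substitution above suggests.

```shell
# Hypothetical wrapper around scripts/transcribe.sh. TRANSCRIBE_CMD is an
# assumption that lets the command be swapped out, for example in tests.
transcribe_or_fail() {
  local audio="$1"
  if [ ! -f "$audio" ]; then
    echo "audio file not found: $audio" >&2
    return 1
  fi
  local text
  # Deliberately unquoted so "bash scripts/transcribe.sh" splits into words.
  text=$(${TRANSCRIBE_CMD:-bash scripts/transcribe.sh} "$audio") || return 1
  # Strip leading and trailing whitespace on each line of the output.
  text=$(printf '%s' "$text" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  if [ -z "$text" ]; then
    echo "empty transcript" >&2
    return 1
  fi
  printf '%s\n' "$text"
}

# Example call:
#   TRANSCRIPT=$(transcribe_or_fail /path/to/audio.ogg) || exit 1
```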

Outbound Voice: Speak

# Step 1: Generate WAV (local Piper — no network)
WAV=$(bash scripts/speak.sh "Your response here." /tmp/reply.wav en_US-lessac-medium)

# Step 2: Convert to OGG Opus (Telegram voice requirement)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply.ogg -y -loglevel error

# Step 3: Send via message tool (filePath=/tmp/reply.ogg)

Override voice directory:

VOICECLAW_VOICES_DIR=/path/to/voices bash scripts/speak.sh "Hello." /tmp/reply.wav

Available Voices

Voice                               Style
en_US-lessac-medium                 Neutral American (default)
en_US-amy-medium                    Warm American female
en_US-joe-medium                    American male
en_US-kusal-medium                  Expressive American male
en_US-danny-low                     Deep American male (fast)
en_GB-alba-medium                   British female
en_GB-northern_english_male-medium  Northern British male
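
To see which of these voices are actually installed, the voices directory can be listed. This sketch assumes each voice is stored as a file named after the voice with an .onnx extension, which matches the *.onnx requirement above. The function name list_voices is hypothetical.

```shell
# Hypothetical helper: print one installed voice name per line by stripping
# the .onnx extension from each model file in the voices directory.
list_voices() {
  local dir="${1:-${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}}"
  local f
  for f in "$dir"/*.onnx; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f" .onnx
  done
}

# Example call:
#   list_voices | grep -qx "en_US-amy-medium" || echo "voice not installed"
```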

Agent Behavior Rules

  1. Voice in → Voice + Text out. Always respond with both a voice reply and a text reply when a voice message is received.
  2. Include the transcript. Show "🎙️ I heard: [transcript]" at the top of every text reply to a voice message.
  3. Keep voice responses concise. Piper TTS works best under ~200 words — summarize for audio, include full detail in text.
  4. Local only. Never use a cloud TTS/STT API. Only the local whisper and piper binaries.
  5. Send voice before text. Send the audio file first, then follow with the text reply.
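
Rules 2 and 3 can be sketched as two small helpers. Both names are hypothetical. Note that limit_words only truncates at a word count, so it is a crude stand-in for the summarizing the rule asks for.

```shell
# Rule 2: put the transcript header on top of the full text reply.
format_text_reply() {
  local transcript="$1"
  local response="$2"
  printf '🎙️ I heard: %s\n\n%s\n' "$transcript" "$response"
}

# Rule 3 (rough approximation): keep at most N words (default 200) for the
# spoken version. This truncates and does not summarize.
limit_words() {
  local text="$1"
  local max="${2:-200}"
  printf '%s\n' "$text" | awk -v max="$max" '
    { for (i = 1; i <= NF && n < max; i++) { out = out (n ? " " : "") $i; n++ } }
    END { print out }'
}

# Example calls:
#   SPOKEN=$(limit_words "$RESPONSE" 200)
#   format_text_reply "$TRANSCRIPT" "$RESPONSE"
```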

Full Example

# 1. Transcribe inbound voice message
TRANSCRIPT=$(bash path/to/voiceclaw/scripts/transcribe.sh /path/to/voice.ogg)

# 2. Compose reply and generate audio
RESPONSE="Deployment complete. All checks passed."
WAV=$(bash path/to/voiceclaw/scripts/speak.sh "$RESPONSE" /tmp/reply_$$.wav)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply_$$.ogg -y -loglevel error

# 3. Send voice + text
# message(action=send, filePath=/tmp/reply_$$.ogg, ...)
# reply: "🎙️ I heard: $TRANSCRIPT\n\n$RESPONSE"
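
The example above leaves its WAV and OGG files in /tmp. One optional variation, not part of the skill, is to write them into a private scratch directory that is removed when the shell exits. The variable names WORKDIR, WAV_PATH, and OGG_PATH are assumptions for this sketch.

```shell
# Create a private scratch directory and remove it when the shell exits.
WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

# Output paths to pass to speak.sh and ffmpeg in place of /tmp/reply_$$.*
WAV_PATH="$WORKDIR/reply.wav"
OGG_PATH="$WORKDIR/reply.ogg"
```

The files must be sent before the script exits, because the trap deletes the directory at that point.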

Troubleshooting

Issue                       Fix
whisper: command not found  Ensure whisper binary is installed and in PATH
Whisper model not found     Set WHISPER_MODEL=/path/to/ggml-base.en.bin
piper: command not found    Ensure piper binary is installed and in PATH
Voice model missing         Set VOICECLAW_VOICES_DIR=/path/to/voices/
OGG won't play on Telegram  Ensure -c:a libopus flag in ffmpeg command

Version tags

latest: vk975e1bmj2pchsgeydbv6hhq5581zgh6

Runtime requirements

Bins: whisper, piper, ffmpeg
Environment variables:
  • WHISPER_BIN: Path to whisper binary (default: auto-detected via which)
  • WHISPER_MODEL: Path to ggml-base.en.bin model file (default: ~/.cache/whisper/ggml-base.en.bin)
  • PIPER_BIN: Path to piper binary (default: auto-detected via which)
  • VOICECLAW_VOICES_DIR: Path to directory containing .onnx voice model files (default: ~/.local/share/piper/voices)