Local Voice Agent
v1.0.2Complete offline voice-to-voice AI assistant for OpenClaw (Whisper.cpp STT + Pocket-TTS). 100% local processing, no cloud APIs, no costs. Use for hands-free...
Like a lobster shell, security has layers — review code before you run it.
License
Runtime requirements
SKILL.md
Voice Agent - OpenClaw Skill
Complete voice-to-voice AI assistant for hands-free operation.
Architecture
User Voice → Whisper STT → Text → OpenClaw AI → Text → Pocket-TTS → Voice Response
Prerequisites
1. Whisper.cpp (Speech-to-Text)
# Clone and build
git clone https://github.com/ggerganov/whisper.cpp ~/.local/whisper.cpp
cd ~/.local/whisper.cpp
make -j4
# Download tiny model (fast, low-resource)
bash ./models/download-ggml-model.sh tiny
Test:
./build/bin/whisper-cli -m models/ggml-tiny.bin -f samples/jfk.wav
2. Pocket-TTS (Text-to-Speech)
Option A: Use existing server
export POCKET_TTS_URL="http://localhost:5000"
Option B: Install locally
# Clone your Pocket-TTS server
cd /path/to/pockettts
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m app.main --host 0.0.0.0 --port 5000
3. FFmpeg (Audio Conversion)
sudo apt-get install -y ffmpeg
Quick Start
Voice Command (One-shot)
# Record → Transcribe → Process → Speak
./bin/voice-agent "What's the weather today?"
Interactive Mode
# Continuous voice conversation
./bin/voice-agent --interactive
Voice File Processing
# Transcribe existing audio file
./bin/voice-to-text recording.wav
# Generate voice from text
./bin/text-to-voice "Hello world!" output.wav
Configuration
Edit config/voices.yaml:
# Default voices
stt:
model: tiny # tiny, small, medium (larger = more accurate, slower)
language: en # en, ne, hi, etc.
tts:
url: http://localhost:5000
voice: peter voice # Your custom voice
format: wav # wav, mp3
# Performance
performance:
threads: 4 # CPU threads for Whisper
realtime: true # Faster-than-realtime processing
API Endpoints
POST /v1/voice/command
Voice command processing:
curl -X POST "http://localhost:5000/v1/voice/command" \
-F "audio=@recording.wav" \
-F "action=openclaw"
Response:
{
"transcription": "What's the weather today?",
"response_text": "The weather in Kathmandu is partly cloudy, 22 degrees Celsius.",
"audio_response": "/tmp/response.wav"
}
GET /v1/voices
List available TTS voices:
curl http://localhost:5000/v1/voices
Use Cases
1. Daily Briefings (Voice)
./bin/voice-agent "Give me my morning briefing"
2. Voice Notes
./bin/voice-agent "Remind me to call Peter at 3 PM"
3. Hands-Free Coding
./bin/voice-agent "Show me the status of my git repository"
4. Accessibility
Perfect for users who prefer voice interaction or have mobility constraints.
Scripts
bin/voice-to-text
Convert speech to text:
./bin/voice-to-text input.wav
./bin/voice-to-text input.ogg # Auto-converts with ffmpeg
./bin/voice-to-text input.mp4 # Extracts audio from video
bin/text-to-voice
Convert text to speech:
./bin/text-to-voice "Hello world!" output.wav
./bin/text-to-voice --voice "usha lama" "Namaste!" greeting.wav
bin/voice-agent
Full voice pipeline:
./bin/voice-agent "What time is it?"
./bin/voice-agent --interactive # Conversation mode
./bin/voice-agent --file recording.wav # Process file
Troubleshooting
Whisper.cpp Errors
"failed to read audio file"
- Convert to WAV first:
ffmpeg -i input.ogg -ar 16000 -ac 1 output.wav
"model not found"
- Download model:
bash models/download-ggml-model.sh tiny
Pocket-TTS Errors
"Connection refused"
- Start TTS server:
python3 -m app.main - Check URL:
export POCKET_TTS_URL="http://localhost:5000"
"Voice not found"
- List voices:
curl http://localhost:5000/v1/voices - Clone custom voice if needed
Performance Issues
Slow transcription
- Use smaller model:
tinyinstead ofsmall - Reduce audio sample rate:
ffmpeg -i input.wav -ar 16000 output.wav
Slow TTS
- Use shorter text
- Generate in background
Examples
See examples/ directory for:
morning-briefing.sh- Automated voice briefingvoice-reminder.sh- Voice-based remindersconversation-mode.sh- Interactive voice chat
Performance
| Model | RAM | Speed (1 min audio) | Accuracy |
|---|---|---|---|
| tiny | 500MB | ~30 sec | ~90% |
| small | 1GB | ~60 sec | ~95% |
| medium | 2GB | ~120 sec | ~98% |
Recommendation: Start with tiny, upgrade to small if needed.
License
MIT License - See LICENSE file
Credits
- Whisper.cpp by Georgi Gerganov (ggerganov/whisper.cpp)
- Pocket-TTS by Kyutai Labs (kyutai-labs/pocket-tts)
- OpenClaw by OpenClaw Team (openclaw/openclaw)
Support
- GitHub Issues: [Your Repo Link]
- OpenClaw Discord: https://discord.com/invite/clawd
- Documentation: [Your Docs Link]
Files
12 totalComments
Loading comments…
