Install
openclaw skills install edge-tts-voice-system
Local voice system for OpenClaw using faster-whisper for inbound transcription and Edge TTS for outbound replies. Use when you need private voice workflows, offline transcription, cached reply audio, or voice message handling after OpenClaw updates.
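A privacy-focused voice system for OpenClaw. Inbound audio is transcribed locally with faster-whisper, so recordings do not leave your machine. Outbound replies are synthesized with Edge TTS, which sends the reply text to Microsoft's online speech service and therefore needs network access.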
# Install the skill
clawhub install lessac_offline_voice_system
# Or manually from this directory
./scripts/install.sh
from scripts.voice_handler import VoiceHandler
handler = VoiceHandler()
# Transcribe audio to text
text = handler.audio_to_text("voice_message.ogg")
print(f"You said: {text}")
# Generate voice response
audio_file = handler.text_to_audio("Hello, this is a voice response.")
# Transcribe audio
./scripts/voice_integration.sh transcribe voice_message.ogg
# Generate TTS
./scripts/voice_integration.sh tts "Hello world" output.wav
# Full voice processing
./scripts/voice_integration.sh process voice_message.ogg
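As a rough illustration of what the transcription step involves, the sketch below calls faster-whisper directly. It is not the skill's actual `voice_handler.py`; the function names `transcribe` and `join_segments`, the CPU device, and the `int8` compute type are assumptions chosen for the example. The model sizes are the ones this README lists.

```python
VALID_SIZES = ("tiny", "base", "small", "medium")  # sizes named in this README


def join_segments(segments):
    """Join the text of transcribed segments into a single string."""
    parts = (seg.text.strip() for seg in segments)
    return " ".join(p for p in parts if p)


def transcribe(audio_path, model_size="base"):
    """Transcribe an audio file locally (hypothetical helper, not the skill's API)."""
    if model_size not in VALID_SIZES:
        raise ValueError(f"unsupported model size: {model_size!r}")
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumption: CPU inference with int8 quantization to keep memory use low.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)  # segments is a lazy generator
    return join_segments(segments)
```

Calling `transcribe("voice_message.ogg", model_size="tiny")` would trade accuracy for speed; the model weights are downloaded the first time a given size is used.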
Text-to-speech: edge-tts (default voice en-IE-ConnorNeural)
When installed, the skill can be configured to automatically:
The built-in OpenClaw reply TTS path is not the local voice pipeline used by this skill.
This skill now uses a local Edge TTS reply path instead, with cached output
stored under /root/.openclaw/tts/cache.
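One way such a reply cache could work is to key each audio file on the voice and the reply text, so repeated replies are synthesized only once. The sketch below is an assumption about the approach, not the contents of `tts_edge_wrapper.py`; `cache_path`, `cached_tts`, and `edge_synthesize` are hypothetical names, and the `.mp3` extension is a guess at the output format.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("/root/.openclaw/tts/cache")  # cache location named in this README
DEFAULT_VOICE = "en-IE-ConnorNeural"


def cache_path(text, voice=DEFAULT_VOICE, cache_dir=CACHE_DIR):
    """Derive a stable file name from the voice and the reply text."""
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{key}.mp3"


def cached_tts(text, synthesize, voice=DEFAULT_VOICE, cache_dir=CACHE_DIR):
    """Return cached audio for (voice, text), synthesizing only on a cache miss."""
    path = cache_path(text, voice, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        synthesize(text, voice, path)
    return path


def edge_synthesize(text, voice, path):
    """Synthesize speech with the edge-tts package (requires network access)."""
    import asyncio
    import edge_tts  # third-party; imported lazily

    asyncio.run(edge_tts.Communicate(text, voice).save(str(path)))
```

With this shape, `cached_tts("Hello world", edge_synthesize)` would call Edge TTS the first time and return the existing file afterwards. Passing the synthesizer in as a parameter keeps the cache logic testable without a network connection.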
Default outbound voice:
en-IE-ConnorNeural
Relevant files:
- tts_edge_wrapper.py
- voice_handler.py
- voice_integration.sh
- scripts/install.sh
If you need to change the voice, set:
export OPENCLAW_EDGE_TTS_VOICE="en-IE-ConnorNeural"
or replace it with another Edge-supported voice.
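For a custom script that wants to honour the same setting, the lookup can be as small as the sketch below. The helper name `resolve_voice` is hypothetical, and falling back to the default when the variable is unset or blank is an assumption about the intended behaviour.

```python
import os

DEFAULT_VOICE = "en-IE-ConnorNeural"  # default voice named in this README


def resolve_voice(environ=None):
    """Return the configured Edge voice, or the default if none is set."""
    env = os.environ if environ is None else environ
    return env.get("OPENCLAW_EDGE_TTS_VOICE", "").strip() or DEFAULT_VOICE
```

The edge-tts command-line tool can print the available voice names with `edge-tts --list-voices`.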
After an OpenClaw system update, rerun the installer to restore the voice stack:
cd /root/.openclaw/workspace/skills/lessac_offline_voice_system
./scripts/install.sh
This refreshes:
- Python packages (faster-whisper, edge-tts, soundfile)
- /root/.openclaw/tts/config.json
# In your OpenClaw agent or custom script
import sys
sys.path.append("/path/to/skill/scripts")
from voice_handler import VoiceHandler
class YourAgent:
    def __init__(self):
        self.voice = VoiceHandler()

    def generate_response(self, text):
        # Placeholder: replace with your own AI logic
        raise NotImplementedError

    def handle_voice_message(self, audio_file):
        # Transcribe
        text = self.voice.audio_to_text(audio_file)
        # Generate response (your AI logic here)
        response = self.generate_response(text)
        # Convert to voice
        voice_response = self.voice.text_to_audio(response)
        return voice_response
The skill uses Edge TTS by default. To use a different voice:
- Set OPENCLAW_EDGE_TTS_VOICE to a supported Edge voice
Change the faster-whisper model size in scripts/voice_handler.py:
- "tiny": Fastest, lower accuracy
- "base": Default, good balance
- "small": Higher accuracy, slower
- "medium": Best accuracy, slowest
"No module named 'piper'"
pip install piper-tts
"ffmpeg not found"
sudo apt-get install ffmpeg
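Missing-dependency errors like these can be caught before any audio is processed. The following is a minimal preflight sketch, assuming the stack needs the `ffmpeg` binary plus the three Python packages the installer refreshes; `check_voice_stack` is a hypothetical helper, not part of the skill.

```python
import importlib.util
import shutil


def check_voice_stack(
    binaries=("ffmpeg",),
    modules=("faster_whisper", "edge_tts", "soundfile"),
):
    """Return a list of missing binaries and Python modules (empty if all found)."""
    missing = []
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(name) is None:
            missing.append(f"binary: {name}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            missing.append(f"module: {name}")
    return missing
```

An empty list from `check_voice_stack()` means everything was found; otherwise each entry names what to install or which part of `./scripts/install.sh` to rerun.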
Out of memory with large models
- Use the "tiny" or "base" STT model
Slow TTS generation
Enable debug output:
export VOICE_DEBUG=1
./scripts/voice_integration.sh process audio.ogg
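A custom script can follow the same convention by switching its own logging on when `VOICE_DEBUG` is set. This sketch assumes that only the value `1` enables debug output and uses a hypothetical logger name `voice`; how the skill's own scripts interpret the variable is not documented here.

```python
import logging
import os


def configure_voice_logging(environ=None):
    """Return a logger set to DEBUG when VOICE_DEBUG=1, otherwise INFO."""
    env = os.environ if environ is None else environ
    level = logging.DEBUG if env.get("VOICE_DEBUG") == "1" else logging.INFO
    logger = logging.getLogger("voice")  # hypothetical logger name
    logger.setLevel(level)
    return logger
```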
- scripts/install.sh - Installation script
- scripts/voice_handler.py - Main Python handler
- scripts/piper_tts.py - Edge TTS wrapper
- scripts/voice_integration.sh - Bash interface
- references/voice_models.md - Voice model information
- assets/ - Voice model files (downloaded during install)
Open source. See included LICENSE file.
For issues or questions: