Install
openclaw skills install local-voice
Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.
Sub-second local voice AI for Apple Silicon Macs using FluidAudio's CoreML models.
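# Install the eSpeak NG dependency
brew install espeak-ng
# Build the daemon from the skill's Swift sources
cd /path/to/skill/sources
swift build -c release
# Install the binary and ESpeakNG.framework side by side; also create the log directory the LaunchAgent writes to
mkdir -p ~/clawd/bin ~/.clawdbot/logs
cp .build/release/StellaVoice ~/clawd/bin/
cp -R .build/arm64-apple-macosx/release/ESpeakNG.framework ~/clawd/bin/
# Let the binary find ESpeakNG.framework in its own directory
install_name_tool -add_rpath @executable_path ~/clawd/bin/StellaVoice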
# Unquoted EOF so the shell expands $HOME: launchd does not expand variables or ~ in plist paths
cat > ~/Library/LaunchAgents/com.stella.tts.plist << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.stella.tts</string>
<key>ProgramArguments</key>
<array>
<string>$HOME/clawd/bin/StellaVoice</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>$HOME/.clawdbot/logs/stella-tts.log</string>
<key>StandardErrorPath</key>
<string>$HOME/.clawdbot/logs/stella-tts.err.log</string>
</dict>
</plist>
EOF
launchctl load ~/Library/LaunchAgents/com.stella.tts.plist
The daemon listens on http://127.0.0.1:18790:
# Simple text to WAV
curl -X POST http://127.0.0.1:18790/synthesize -d "Hello world" -o output.wav
# With speed control (0.5-2.0)
curl -X POST "http://127.0.0.1:18790/synthesize?speed=1.2" -d "Fast!" -o output.wav
# JSON endpoint
curl -X POST http://127.0.0.1:18790/synthesize/json \
-H "Content-Type: application/json" \
-d '{"text": "Hello", "speed": 1.0, "deEss": true}'
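The query-string and JSON forms above carry the same parameters. The following is a minimal sketch, in the JavaScript used by the integration example later in this document, that builds `fetch()` arguments for either form. `buildSynthesizeRequest` and `clampSpeed` are hypothetical names. Clamping to 0.5-2.0 mirrors the documented range, but this document does not say whether the daemon itself clamps, rejects, or ignores out-of-range values.

```javascript
// Hypothetical helper: builds fetch() arguments for the two synthesis endpoints.
// Assumes plain text at /synthesize?speed=... and JSON {text, speed, deEss}
// at /synthesize/json, as in the curl examples above.
const BASE_URL = "http://127.0.0.1:18790";

function clampSpeed(speed) {
  // Documented range is 0.5-2.0: reject non-numbers, clamp everything else.
  if (typeof speed !== "number" || Number.isNaN(speed)) {
    throw new TypeError("speed must be a number");
  }
  return Math.min(2.0, Math.max(0.5, speed));
}

function buildSynthesizeRequest(text, { speed = 1.0, deEss, json = false } = {}) {
  const s = clampSpeed(speed);
  if (json) {
    // JSON endpoint: every parameter travels in the request body.
    const payload = { text, speed: s };
    if (deEss !== undefined) payload.deEss = deEss;
    return {
      url: `${BASE_URL}/synthesize/json`,
      init: {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      },
    };
  }
  // Plain-text endpoint: speed goes in the query string, text in the body.
  const url = new URL("/synthesize", BASE_URL);
  url.searchParams.set("speed", String(s));
  return { url: url.toString(), init: { method: "POST", body: text } };
}
```

Usage: `const { url, init } = buildSynthesizeRequest("Fast!", { speed: 1.2 }); const res = await fetch(url, init);`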
# Speech-to-text: POST a WAV file
curl -X POST http://127.0.0.1:18790/transcribe \
--data-binary @audio.wav \
-H "Content-Type: audio/wav"
# Returns: {"text": "transcribed text"}
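The transcribe endpoint appears here only with a WAV body, and this document does not list the accepted sample rates or channel counts. A cheap client-side check can catch mislabeled uploads before they reach the daemon. The sketch below assumes a standard little-endian RIFF/WAVE file, and `inspectWav` is a hypothetical helper, not part of the skill.

```javascript
// Hypothetical pre-flight check before POSTing to /transcribe: confirms the bytes
// look like a RIFF/WAVE file and reports the format fields of its "fmt " chunk.
// Walks the chunk list, so extra chunks (e.g. LIST) before "fmt " are skipped.
function inspectWav(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (off) =>
    String.fromCharCode(bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]);
  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let off = 12;
  while (off + 8 <= bytes.length) {
    const id = tag(off);
    const size = view.getUint32(off + 4, true); // chunk sizes are little-endian
    if (id === "fmt ") {
      if (size < 16 || off + 8 + 16 > bytes.length) throw new Error("truncated fmt chunk");
      return {
        audioFormat: view.getUint16(off + 8, true), // 1 = integer PCM
        channels: view.getUint16(off + 10, true),
        sampleRate: view.getUint32(off + 12, true),
        bitsPerSample: view.getUint16(off + 22, true),
      };
    }
    off += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no fmt chunk found");
}
```

In Node, `inspectWav(fs.readFileSync("audio.wav"))` works directly because a `Buffer` is a `Uint8Array`. The offset-aware `DataView` handles pooled buffers correctly.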
# Health check
curl http://127.0.0.1:18790/health
# Returns: ok
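`launchctl load` returns before the daemon is necessarily listening, so a client that starts at login can race it. The sketch below polls the health endpoint until it answers `ok`. `waitForHealthy` and its retry defaults are illustrative assumptions. `fetchImpl` is injectable so the loop can be exercised without a running daemon.

```javascript
// Hypothetical startup helper: polls GET /health until the body is "ok"
// (as in the curl example) or the attempt budget runs out.
async function waitForHealthy({
  baseUrl = "http://127.0.0.1:18790",
  attempts = 20,
  delayMs = 250,
  fetchImpl = globalThis.fetch,
} = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchImpl(`${baseUrl}/health`);
      // Returns the number of attempts it took, which is handy for logging.
      if (res.ok && (await res.text()).trim() === "ok") return i;
    } catch {
      // Typically "connection refused" while the daemon is still starting; retry.
    }
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`daemon not healthy after ${attempts} attempts`);
}
```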
The default voice is af_sky. To use a different voice, change it in the daemon source (sources/) and rebuild.
Top Kokoro voices (American female):
af_heart (A grade) - warm, natural
af_bella (A-) - expressive
af_sky (C-) - clear, light
All 54 voices: see references/VOICES.md
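Kokoro voice IDs encode language and gender in their prefix: `af_` means American English, female. The decoder below is a sketch based on the Kokoro voice pack's naming convention, not on this skill's code. references/VOICES.md remains the authoritative list, and `parseVoiceId` is a hypothetical helper.

```javascript
// Hypothetical helper: decodes Kokoro-style voice IDs such as "af_heart".
// Assumed convention: letter 1 = language/accent, letter 2 = f or m, then "_name".
const LANGS = {
  a: "American English", b: "British English", e: "Spanish", f: "French",
  h: "Hindi", i: "Italian", j: "Japanese", p: "Brazilian Portuguese", z: "Mandarin Chinese",
};

function parseVoiceId(id) {
  const m = /^([a-z])([fm])_([a-z0-9]+)$/.exec(id);
  if (!m || !(m[1] in LANGS)) throw new Error(`unrecognized voice id: ${id}`);
  return { language: LANGS[m[1]], gender: m[2] === "f" ? "female" : "male", name: m[3] };
}
```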
speed=0.8 → Calm, relaxed
speed=1.0 → Natural pace
speed=1.2 → Energetic, upbeat
! → Excited tone
? → Rising intonation
. → Neutral, falling
... → Pauses
SSML tags:
<phoneme ph="kəkˈɔɹO">Kokoro</phoneme>
<sub alias="Doctor">Dr.</sub>
<say-as interpret-as="date">2024-01-15</say-as>
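The tags above are presumably embedded in the text sent to /synthesize. If user-supplied text is wrapped in them, characters such as `&` and `<` could break the markup. The helpers below are a hypothetical sketch. They assume the daemon parses these tags as XML-like markup and decodes standard entities, so check that assumption against the daemon source before relying on it.

```javascript
// Hypothetical helpers that emit the three tag forms shown above,
// escaping XML special characters in both attribute values and content.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

const phoneme = (word, ph) => `<phoneme ph="${escapeXml(ph)}">${escapeXml(word)}</phoneme>`;
const sub = (written, alias) => `<sub alias="${escapeXml(alias)}">${escapeXml(written)}</sub>`;
const sayAsDate = (isoDate) => `<say-as interpret-as="date">${escapeXml(isoDate)}</say-as>`;
```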
See scripts/stella-tts.sh for a convenient wrapper:
scripts/stella-tts.sh "Hello world" output.wav
scripts/stella-tts.sh "Hello world" output.mp3 # Auto-converts
For voice assistants, update your voice proxy to use local endpoints:
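// STT: POST WAV bytes, get back {"text": "..."}
const response = await fetch('http://127.0.0.1:18790/transcribe', {
  method: 'POST',
  headers: { 'Content-Type': 'audio/wav' },
  body: audioData
});
if (!response.ok) throw new Error(`STT failed: HTTP ${response.status}`);
const { text } = await response.json();
// TTS: POST plain text, get back WAV audio
const audio = await fetch('http://127.0.0.1:18790/synthesize', {
  method: 'POST',
  body: textToSpeak
});
if (!audio.ok) throw new Error(`TTS failed: HTTP ${audio.status}`);
const wavBytes = new Uint8Array(await audio.arrayBuffer());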
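The integration snippet assumes the daemon is up and responds promptly. A slightly sturdier sketch wraps both endpoints with HTTP status checks and a per-request timeout. `LocalVoiceClient` is a hypothetical name. The sketch assumes Node 18 or later (global `fetch`, `AbortSignal.timeout`) and that /synthesize returns raw audio bytes, as the `-o output.wav` curl examples suggest.

```javascript
// Hypothetical client for the local daemon: status checks plus a per-request timeout.
class LocalVoiceClient {
  constructor({ baseUrl = "http://127.0.0.1:18790", timeoutMs = 10000, fetchImpl = globalThis.fetch } = {}) {
    this.baseUrl = baseUrl;
    this.timeoutMs = timeoutMs;
    this.fetchImpl = fetchImpl; // injectable so the class can be tested without the daemon
  }

  // POSTs to `path` and throws on non-2xx; AbortSignal.timeout aborts slow requests.
  async #post(path, init) {
    const res = await this.fetchImpl(`${this.baseUrl}${path}`, {
      method: "POST",
      ...init,
      signal: AbortSignal.timeout(this.timeoutMs),
    });
    if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
    return res;
  }

  // Sends WAV bytes, returns the "text" field of the JSON reply.
  async transcribe(wavBytes) {
    const res = await this.#post("/transcribe", {
      headers: { "Content-Type": "audio/wav" },
      body: wavBytes,
    });
    const { text } = await res.json();
    return text;
  }

  // Sends plain text, returns the response body as bytes.
  async synthesize(text) {
    const res = await this.#post("/synthesize", { body: text });
    return new Uint8Array(await res.arrayBuffer());
  }
}
```

The 10-second default is an arbitrary choice, and long passages may need a larger `timeoutMs`.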
Library not loaded (ESpeakNG)
install_name_tool -add_rpath @executable_path /path/to/binary
Slow first request
x86 vs ARM
uname -m (should print arm64; x86_64 means the shell is running under Rosetta or on an Intel Mac)
The daemon source is in the sources/ directory. It's a Swift package using:
Rebuild after modifying, then repeat the copy and install_name_tool steps above and restart the daemon:
cd sources && swift build -c release