Install
openclaw skills install macos-say
Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages.
Use say (macOS native TTS) + ffmpeg to generate Opus voice messages for Telegram/Matrix.
say outputs AIFF/m4a, so the audio must be converted to .ogg (Opus) before sending:
say -v "<voice>" -o <tmpdir>/<name>.aiff "<text>"
ffmpeg -i <tmpdir>/<name>.aiff -acodec libopus <tmpdir>/<name>.ogg -y
Send with message tool:
{
"action": "send",
"channel": "telegram",
"media": "<tmpdir>/<name>.ogg",
"asVoice": true,
"target": "<chat_id>"
}
Temporary audio directory: ~/.openclaw/workspace/tmp/audio/
(Whitelist this path in exec permissions for faster approval.)
Use say -v '?' to list available voices. Notable ones:
- Trinoids: robotic/electronic voice (popular for bots)
- Samantha: warm US female voice
- Alex: US male voice
- Fred: neutral US male voice
- Karen: Australian female voice

Note: pass just the voice name (e.g. "Trinoids"), not the full en_US suffix.
VOICE="Trinoids"
TEXT="Hello!"
DIR="$HOME/.openclaw/workspace/tmp/audio"
mkdir -p "$DIR"
say -v "$VOICE" -o "$DIR/hello.aiff" "$TEXT"
ffmpeg -i "$DIR/hello.aiff" -acodec libopus "$DIR/hello.ogg" -y
# Then send via message tool with asVoice: true
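The two steps above can be wrapped in one function so that a failed step stops the pipeline instead of producing a stale file. This is only a sketch: `tts_to_ogg` and the `AUDIO_DIR` override are hypothetical names, not part of OpenClaw, and the cleanup of the intermediate AIFF is a design choice the source does not prescribe.

```shell
# Sketch: convert text to an Opus .ogg voice file and print the output path.
# tts_to_ogg and AUDIO_DIR are hypothetical names, not part of OpenClaw.
tts_to_ogg() {
  local voice=$1 text=$2 name=${3:-voice}
  local dir=${AUDIO_DIR:-$HOME/.openclaw/workspace/tmp/audio}

  mkdir -p "$dir" || return 1

  # Step 1: say writes the speech to an AIFF file
  say -v "$voice" -o "$dir/$name.aiff" "$text" || return 1

  # Step 2: ffmpeg re-encodes the AIFF to Opus in an OGG container
  ffmpeg -y -loglevel error -i "$dir/$name.aiff" -acodec libopus "$dir/$name.ogg" || return 1

  # The intermediate AIFF is not needed once the .ogg exists
  rm -f "$dir/$name.aiff"

  printf '%s\n' "$dir/$name.ogg"
}
```

Calling `tts_to_ogg "Trinoids" "Hello!" hello` should print the path of the resulting `hello.ogg`, which can then go into the `media` field of the message tool call.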
- AIFF (.aiff) works reliably; avoid .m4a with say
- OGG (libopus codec) is required for Telegram voice messages
- sendVoice accepts OGG, MP3, and M4A, but the native format is Opus OGG
- say outputs 24kHz AIFF; ffmpeg re-encodes to Opus at 24kHz

OpenClaw's built-in messages.tts only supports ElevenLabs, Microsoft Edge, MiniMax, and OpenAI.
This say+ffmpeg pipeline is a workaround for local-only TTS without API keys or cloud services. It's not auto-triggered by OpenClaw — call it manually via exec + message tool.
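Because the pipeline is run by hand, it may help to confirm that the file really is Opus in an OGG container before sending it. The sketch below relies only on the container layout: an Ogg page starts with the bytes `OggS`, and the first page of an Ogg Opus stream carries the `OpusHead` identification header. The function name `is_opus_ogg` is hypothetical.

```shell
# Sketch: return 0 if the file looks like an Ogg Opus stream, 1 otherwise.
# is_opus_ogg is a hypothetical helper name.
is_opus_ogg() {
  # Every Ogg page begins with the capture pattern "OggS"
  [ "$(head -c 4 "$1")" = "OggS" ] || return 1
  # The Opus identification header ("OpusHead") sits in the first page
  head -c 64 "$1" | LC_ALL=C grep -aq 'OpusHead'
}
```

A call such as `is_opus_ogg "$DIR/hello.ogg" || echo "not an Opus OGG file"` would catch the case where the AIFF was sent by mistake. It is a quick header check, not a full validation of the stream.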
When responding to a voice message, detect the language from the STT output (Parakeet auto-detects). Then pick the matching say voice using i18n locale codes.
Finding voices by language:
say -v '?' 2>&1 | grep -E "cs_CZ|en_US|de_DE|fr_FR|it_IT|es_ES"
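The grep above returns whole listing lines. To get usable voice names, the name column has to be separated from the locale code, and some names contain spaces. The sketch below assumes each line has the layout `<name>  <locale>  # <sample text>`; `voices_for_locale` is a hypothetical helper name.

```shell
# Sketch: read `say -v '?'` output on stdin and print the voice names
# for one locale code. Assumes the layout: <name>  <locale>  # <sample>.
# voices_for_locale is a hypothetical helper name.
voices_for_locale() {
  awk -v loc="$1" '{
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^#/) break            # reached the sample text, no locale match
      if ($i == loc) {
        name = $1                     # rebuild names that contain spaces
        for (j = 2; j < i; j++) name = name " " $j
        print name
        break
      }
    }
  }'
}
```

For example, `say -v '?' 2>&1 | voices_for_locale cs_CZ` would print one Czech voice name per line, including any `(Premium)` or `(Enhanced)` variants that are installed.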
Language → voice selection priority:
1. <voice> (Premium) if available
2. <voice> (Enhanced) if available
3. Base <voice> name

| Language | i18n code | Preferred Voice |
|---|---|---|
| Czech | cs_CZ | Zuzana (Premium) |
| English (US) | en_US | Trinoids (no Premium/Enhanced available) |
| German | de_DE | Grandma (Premium) if available |
| French | fr_FR | Grandma (Premium) if available |
| Spanish | es_ES | Grandma (Premium) if available |
| Italian | it_IT | Grandma (Premium) if available |
Key: Always use just the voice name (e.g. "Trinoids", "Zuzana"), not the full locale suffix. The locale suffix in say -v '?' output is for grepping/identification only.
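The table can also be written as a small lookup. This is a sketch under assumptions: `locale_for_lang` and `preferred_voice` are hypothetical helper names, and the STT step is assumed to report a two-letter language code such as `cs` or `en`, which the source does not specify.

```shell
# Sketch: map a two-letter language code (assumed STT output) to a locale code.
locale_for_lang() {
  case "$1" in
    cs) echo "cs_CZ" ;;
    en) echo "en_US" ;;
    de) echo "de_DE" ;;
    fr) echo "fr_FR" ;;
    es) echo "es_ES" ;;
    it) echo "it_IT" ;;
    *)  return 1 ;;   # language not covered by the table
  esac
}

# Sketch: look up the preferred voice from the table for a locale code.
preferred_voice() {
  case "$1" in
    cs_CZ) echo "Zuzana (Premium)" ;;
    en_US) echo "Trinoids" ;;
    de_DE|fr_FR|es_ES|it_IT) echo "Grandma (Premium)" ;;
    *) return 1 ;;    # no preference recorded for this locale
  esac
}
```

With these, `preferred_voice "$(locale_for_lang cs)"` prints `Zuzana (Premium)`. The table lists Grandma (Premium) only "if available", so the result is best checked against `say -v '?'` before use.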
Example workflow:
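VOICE_LOCALE="cs_CZ"  # not LANG, which is the shell's own locale variable
# List this locale's voices, keeping only the name column (names may contain spaces)
VOICES=$(say -v '?' 2>&1 | grep "$VOICE_LOCALE" | sed -E "s/[[:space:]]+$VOICE_LOCALE.*//")
# Pick the best available voice for this language (Premium > Enhanced > base)
VOICE=$(echo "$VOICES" | grep -m1 -F '(Premium)' || echo "$VOICES" | grep -m1 -F '(Enhanced)' || echo "$VOICES" | head -1)
say -v "$VOICE" -o reply.aiff "Česká odpověď"
ffmpeg -i reply.aiff -acodec libopus reply.ogg -y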