Install
openclaw skills install @snarflakes/voice-bridge
Push-to-talk voice input via Snarling hardware button and USB mic. Snarling records audio, the plugin transcribes it via OpenAI Whisper, then spawns a subagent that answers and sends the result to the Snarling display via send_notification.
When a user presses the X button on a Snarling display, Snarling records 20 seconds of audio from the USB mic in its own thread, then POSTs the WAV file path to the plugin's /transcribe-and-reply endpoint. The voice bridge plugin transcribes via OpenAI Whisper (gpt-4o-mini-transcribe), then calls api.runtime.subagent.run() with the transcript. The subagent answers the question and sends the result to the Snarling display via the send_notification tool.
contracts.tools must be declared in the openclaw.plugin.json manifest:
{ "contracts": { "tools": ["voice_record"] } }
hooks.allowConversationAccess: true must be set in the plugin's config in openclaw.json. Without it, the plugin loads lazily and its HTTP routes are invisible to the server:
{ "openclaw-voice-bridge": { "enabled": true, "config": {}, "hooks": { "allowConversationAccess": true } } }
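A quick sanity check for the entry above can catch a missing flag before the routes silently disappear. This is a minimal sketch: `checkVoiceBridgeEntry` and the `PluginEntry` shape are hypothetical, modeled only on the fragment shown here, and are not part of any OpenClaw API.

```typescript
// Hypothetical shape, inferred from the config fragment above.
interface PluginEntry {
  enabled?: boolean;
  config?: Record<string, unknown>;
  hooks?: { allowConversationAccess?: boolean };
}

// Returns a list of human-readable problems; an empty list means the entry looks fine.
function checkVoiceBridgeEntry(
  entries: Record<string, PluginEntry | undefined>
): string[] {
  const problems: string[] = [];
  const entry = entries["openclaw-voice-bridge"];
  if (entry === undefined) {
    problems.push("openclaw-voice-bridge entry is missing");
    return problems;
  }
  if (entry.enabled !== true) {
    problems.push("plugin is not enabled");
  }
  if (!entry.hooks || entry.hooks.allowConversationAccess !== true) {
    problems.push("hooks.allowConversationAccess must be true");
  }
  return problems;
}
```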
Voice transcripts arrive as system events in the format:
🎤 Voice input: <transcribed text>
Treat these as direct user messages. Respond to the content naturally, as if the user said it in chat. Do not ignore them or treat them as informational background.
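Recognizing the event format above can be sketched as follows. `parseVoiceInput` is a hypothetical helper, and the sketch assumes the prefix is exactly `🎤 Voice input: ` with a single trailing space.

```typescript
// Assumed exact prefix of a voice system event.
const VOICE_PREFIX = "🎤 Voice input: ";

// Hypothetical helper: returns the transcript, or null if the text
// is not a voice event or carries no transcript.
function parseVoiceInput(eventText: string): string | null {
  if (eventText.indexOf(VOICE_PREFIX) !== 0) {
    return null; // not a voice event
  }
  const transcript = eventText.slice(VOICE_PREFIX.length).trim();
  return transcript.length > 0 ? transcript : null;
}
```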
Always relay your response as a notification to Snarling so the user gets visual confirmation on the display. The subagent spawned by the voice bridge uses the send_notification tool:
1. The plugin calls api.runtime.subagent.run() with the transcript.
2. The subagent calls send_notification with the answer as the message and priority "normal".

Example subagent behavior:

🎤 Voice input: What's the weather?
send_notification(message: "🌤️ LA: Clear, 68°F", priority: "low")

Keep notification messages under 80 characters (Snarling display limit). For longer responses, summarize the key point in the notification and give the full answer in chat.
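One way to stay within the display limit is to truncate before sending. The sketch below is illustrative only: `fitNotification` is a hypothetical helper, and it assumes the limit is counted in Unicode code points and that exactly 80 is still acceptable.

```typescript
// Snarling display limit stated above.
const MAX_NOTIFICATION_CHARS = 80;

// Hypothetical helper: shortens an answer so it fits the display.
// Counts Unicode code points rather than UTF-16 units (assumption).
function fitNotification(
  answer: string,
  limit: number = MAX_NOTIFICATION_CHARS
): string {
  const chars = Array.from(answer.trim());
  if (chars.length <= limit) {
    return chars.join("");
  }
  // Reserve one position for the ellipsis and drop trailing whitespace.
  return chars.slice(0, limit - 1).join("").replace(/\s+$/, "") + "…";
}
```

Truncation only protects the display. The full answer still belongs in chat, as described above.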
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| /transcribe-and-reply | POST | gateway token | Receives {wav_path}, transcribes, spawns subagent, delivers answer to Snarling display |
| /audio-status | GET | gateway token | Returns {version, transcriptionModel, authAvailable} |
| /start-listening | POST | gateway token | Deprecated (410); use /transcribe-and-reply with wav_path |
curl -X POST http://localhost:18789/transcribe-and-reply \
-H "Authorization: Bearer <gateway-token>" \
-H "Content-Type: application/json" \
-d '{"wav_path": "/tmp/recording.wav"}'
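The same request can be assembled from a TypeScript client. This is a minimal sketch: `buildTranscribeRequest` and `TranscribeRequest` are hypothetical names, and the base URL is taken from the curl example rather than from any documented default.

```typescript
interface TranscribeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the same request as the curl example.
function buildTranscribeRequest(
  baseUrl: string,
  gatewayToken: string,
  wavPath: string
): TranscribeRequest {
  return {
    // Strip trailing slashes so the path is not doubled.
    url: baseUrl.replace(/\/+$/, "") + "/transcribe-and-reply",
    method: "POST",
    headers: {
      "Authorization": "Bearer " + gatewayToken,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ wav_path: wavPath }),
  };
}
```

The result can be handed to `fetch`, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.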
| Button | Normal State | Approval State | Notification State |
|---|---|---|---|
| X | Voice input | — | — |
| Y | Sleep mode | — | — |
| B | No-op | Reject | Dismiss |
| A | — | Approve | Reveal |
The X button only triggers voice input when no approval or notification is active.
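The button table can be read as a lookup keyed by button and display state. The sketch below mirrors the table. The action labels and the `buttonAction` helper are hypothetical and do not come from Snarling's code.

```typescript
type Button = "X" | "Y" | "B" | "A";
type DisplayState = "normal" | "approval" | "notification";

// Mirrors the table above; null stands for a dash (no documented action).
const BUTTON_ACTIONS: Record<Button, Record<DisplayState, string | null>> = {
  X: { normal: "voice_input", approval: null, notification: null },
  Y: { normal: "sleep_mode", approval: null, notification: null },
  B: { normal: "no_op", approval: "reject", notification: "dismiss" },
  A: { normal: null, approval: "approve", notification: "reveal" },
};

// Hypothetical helper: resolves what a button press does in a given state.
function buttonAction(button: Button, state: DisplayState): string | null {
  return BUTTON_ACTIONS[button][state];
}
```

With this shape, the rule that X triggers voice input only in the normal state falls out of the data: `buttonAction("X", "approval")` yields `null`.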
The v5 pipeline uses subagent.run() to create an isolated agent turn, which then uses send_notification to deliver the answer to the Snarling display:
1. Snarling starts arecord immediately in a background thread (~82ms latency).
2. Snarling POSTs the WAV file path to /transcribe-and-reply.
3. The plugin transcribes the audio with gpt-4o-mini-transcribe (~2s).
4. The plugin calls api.runtime.subagent.run() with the transcript.
5. The subagent calls send_notification to display the result on Snarling.

Previous approaches using enqueueSystemEvent + heartbeat wake were unreliable. The system event would enqueue and the heartbeat would report status=ran, but the event text never surfaced in the agent's context during the heartbeat turn (phantom heartbeat bug #86090). The subagent approach creates a real agent turn that can execute tools, specifically send_notification, to deliver answers directly to the display.
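The plugin-side part of the pipeline (transcribe, then hand off to a subagent) can be sketched with injected dependencies. Everything here is an assumption for illustration: `VoiceDeps`, `transcribeAndReply`, and the prompt format are hypothetical, and `runSubagent` only stands in for api.runtime.subagent.run(), whose real signature is not shown in this document.

```typescript
// Hypothetical dependency interfaces: stand-ins for the real transcription
// call and for api.runtime.subagent.run().
interface VoiceDeps {
  transcribe: (wavPath: string) => Promise<string>;
  runSubagent: (prompt: string) => Promise<void>;
}

async function transcribeAndReply(
  wavPath: string,
  deps: VoiceDeps
): Promise<string> {
  // Speech-to-text on the recorded WAV file.
  const transcript = (await deps.transcribe(wavPath)).trim();
  if (transcript.length === 0) {
    throw new Error("empty transcript");
  }
  // Hand the transcript to an isolated agent turn, which is expected to
  // call send_notification itself. The prefix is an assumed prompt format.
  await deps.runSubagent("🎤 Voice input: " + transcript);
  return transcript;
}
```

Injecting the two calls keeps the control flow testable without a microphone, an OpenAI key, or a running gateway.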
⚠️ Audio is sent to OpenAI for transcription. When you press X, the recorded audio (WAV file) is transmitted to OpenAI's Whisper API (api.openai.com/v1/audio/transcriptions) for speech-to-text conversion. OpenAI may retain transcribed text per their API data retention policy.
To avoid sending audio to OpenAI, you can switch to a local transcription model by changing the transcriptionModel config, but this requires a self-hosted Whisper endpoint.
- ~/.openclaw/extensions/openclaw-voice-bridge/index.mjs (hand-curated, NOT esbuild output)
- openclaw.plugin.json with micDevice, recordingDurationSec (default: 20), transcriptionModel

Debug logging is opt-in and disabled by default. To enable:
export VOICE_BRIDGE_DEBUG=1
# Optional: override the log file path
export VOICE_BRIDGE_DEBUG_LOG=/tmp/voice-bridge-debug.log
When enabled, debug logs are written to VOICE_BRIDGE_DEBUG_LOG (default: /tmp/voice-bridge-debug.log). All potentially sensitive values (API keys, tokens, bearer strings) are automatically redacted.
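Redaction of this kind is usually done with pattern replacement before a line is written. The sketch below shows the general idea only. The patterns and the `redact` helper are hypothetical, and the plugin's actual redaction rules are not shown in this document.

```typescript
// Hypothetical patterns; the plugin's real rules may differ.
const SENSITIVE_PATTERNS: RegExp[] = [
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // bearer strings
  /sk-[A-Za-z0-9_-]{8,}/g,           // OpenAI-style API keys
];

// Replaces every match of every pattern with a fixed placeholder.
function redact(line: string): string {
  let out = line;
  for (const pattern of SENSITIVE_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}
```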
WAV files are created at /tmp/voice_recording.wav.<timestamp>.wav during recording and deleted after transcription.
After code changes to the plugin, a full process restart is required (systemctl --user restart openclaw-gateway). SIGUSR1 hot-reload only reloads config, not plugin code.
⚠️ Do NOT rebuild index.mjs from index.ts with esbuild. The esbuild output differs subtly from the hand-curated git version and breaks transcription. Apply patches surgically to index.mjs directly.