Install
openclaw skills install gemini-live-phone
Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.
Real-time voice AI over phone calls using Google Gemini's native audio capabilities.
Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
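The "PCM transcoding" step in the pipeline above can be pictured with a small sketch. It is not the code in `scripts/bridge.py`; it only assumes the standard Twilio Media Streams message shape (a JSON `media` event carrying a base64 μ-law payload) and the standard G.711 μ-law expansion. The function names are hypothetical, and resampling from 8kHz to the rate Gemini expects is left out.

```python
import base64
import json
import struct
from typing import Optional


def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Convert a mu-law byte string to little-endian 16-bit PCM."""
    samples = [ulaw_byte_to_pcm16(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def decode_media_message(raw: str) -> Optional[bytes]:
    """Return PCM16 audio for a Twilio 'media' event, or None for other events."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None
    return ulaw_to_pcm16(base64.b64decode(msg["media"]["payload"]))
```

Each μ-law byte expands to two bytes of PCM, so a 20ms frame of 160 bytes at 8kHz becomes 320 bytes before any resampling.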
# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"
# Run the bridge
python scripts/bridge.py --port 3335
| Endpoint | Method | Description |
|---|---|---|
| /gemini-live/status | GET | Health check + active calls |
| /gemini-live/incoming | POST | TwiML for inbound calls (Twilio webhook) |
| /gemini-live/stream | WS | Twilio Media Stream WebSocket |
| /gemini-live/call | POST | Initiate outbound call |
| /gemini-live/twiml | POST | TwiML for outbound calls |
| /gemini-live/call-status | POST | Twilio call status webhook |
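To illustrate how the TwiML endpoints relate to the WebSocket endpoint, here is a rough sketch of the kind of TwiML that connects a call to a media stream. It uses Twilio's documented `<Connect><Stream>` verbs. The helper name is hypothetical, and the exact document the bridge returns (including any extra parameters) is an assumption.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(host: str, path: str = "/gemini-live/stream") -> str:
    """Build TwiML that asks Twilio to open a bidirectional media stream.

    `host` is the public hostname of the bridge (placeholder: your-domain).
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{host}{path}")
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_stream_twiml("your-domain"))
```

Twilio requires a secure `wss://` URL for media streams, which is one reason the bridge has to sit behind a TLS-terminating proxy.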
curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
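The same outbound-call request can be prepared from Python with the standard library. This is a minimal sketch mirroring the curl command above. The helper name is hypothetical, `https://your-domain` is a placeholder, and the response format is not documented here, so the sketch only builds the request.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Prepare a POST to /gemini-live/call with the same JSON body as the curl example."""
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_call_request("https://your-domain", "+1234567890", "Hello! This is Marcia.")
    print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it.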
All settings via CLI args or environment variables:
- --model: Gemini model (default: gemini-2.5-flash-native-audio-latest)
- --voice: Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
- --from-number: Twilio outbound number (default: env TWILIO_FROM)
- --system-prompt: AI persona system prompt
- --max-duration: Max call seconds (default: 300)
- --vad-enabled / --no-vad: Toggle server-side VAD (default: on)
- --vad-silence-ms: Silence duration to trigger activityEnd (default: 500)
- --vad-energy-threshold: RMS energy threshold (default: 0.01)
- --vad-speech-min-ms: Min speech duration before activityStart (default: 100)
- --echo-multiplier: VAD threshold multiplier during agent speech (default: 3.0)
- --echo-decay-ms: Decay time after agent stops speaking (default: 300)

Twilio webhook URLs:
- Inbound calls: https://your-domain/gemini-live/incoming (HTTP POST)
- Call status: https://your-domain/gemini-live/call-status (HTTP POST)

The bridge must be accessible from the internet (Twilio connects to it). Recommended: Caddy reverse proxy with WebSocket support.
# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}
Latency benchmarks (Gemini 2.5 Flash Native Audio):
| Config | Median | Min | Max |
|---|---|---|---|
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |
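In this benchmark, server-side VAD with a 50ms buffer reduced median latency by ~32% (3,660ms to 2,500ms) compared with no VAD and a 200ms buffer, although the maximum observed latency was higher (6,980ms vs. 5,180ms).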
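The VAD settings (`--vad-energy-threshold`, `--vad-silence-ms`, `--vad-speech-min-ms`, `--echo-multiplier`) suggest an energy-based detector. The following is a minimal sketch of how such a detector could work, not the bridge's actual implementation. The class and method names are hypothetical, samples are assumed to be floats in [-1, 1], and the `--echo-decay-ms` behavior is omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EnergyVad:
    energy_threshold: float = 0.01   # mirrors --vad-energy-threshold
    silence_ms: int = 500            # mirrors --vad-silence-ms
    speech_min_ms: int = 100         # mirrors --vad-speech-min-ms
    echo_multiplier: float = 3.0     # mirrors --echo-multiplier
    speaking: bool = False
    _speech_ms: int = 0
    _silence_ms: int = 0

    def process(self, samples: Sequence[float], frame_ms: int,
                agent_speaking: bool = False) -> Optional[str]:
        """Consume one frame; return 'activityStart', 'activityEnd', or None."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
        # Raise the threshold while the agent talks so its echo is less likely to trigger.
        threshold = self.energy_threshold * (self.echo_multiplier if agent_speaking else 1.0)
        if rms >= threshold:
            self._speech_ms += frame_ms
            self._silence_ms = 0
            if not self.speaking and self._speech_ms >= self.speech_min_ms:
                self.speaking = True
                return "activityStart"
        else:
            self._silence_ms += frame_ms
            self._speech_ms = 0
            if self.speaking and self._silence_ms >= self.silence_ms:
                self.speaking = False
                return "activityEnd"
        return None
```

With the default values and 20ms frames, this sketch reports `activityStart` on the fifth consecutive loud frame and `activityEnd` after 25 consecutive quiet frames. A lower `silence_ms` ends turns sooner at the cost of cutting off speakers who pause.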