Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Gemini Live Phone

v1.0.1

Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.

0· 300· 3 versions· 0 current· 0 all-time· Updated 21h ago· MIT-0
byABFS Tech@quantdeveloperusa

Gemini Live Phone Bridge

Real-time voice AI over phone calls using Google Gemini's native audio capabilities.

Architecture

Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)

Quick Start

# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"

# Run the bridge
python scripts/bridge.py --port 3335

Endpoints

EndpointMethodDescription
/gemini-live/statusGETHealth check + active calls
/gemini-live/incomingPOSTTwiML for inbound calls (Twilio webhook)
/gemini-live/streamWSTwilio Media Stream WebSocket
/gemini-live/callPOSTInitiate outbound call
/gemini-live/twimlPOSTTwiML for outbound calls
/gemini-live/call-statusPOSTTwilio call status webhook

Outbound Call API

curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'

Configuration

All settings via CLI args or environment variables:

Core

  • --model — Gemini model (default: gemini-2.5-flash-native-audio-latest)
  • --voice — Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
  • --from-number — Twilio outbound number (default: env TWILIO_FROM)
  • --system-prompt — AI persona system prompt
  • --max-duration — Max call seconds (default: 300)

VAD (Voice Activity Detection)

  • --vad-enabled / --no-vad — Toggle server-side VAD (default: on)
  • --vad-silence-ms — Silence duration to trigger activityEnd (default: 500)
  • --vad-energy-threshold — RMS energy threshold (default: 0.01)
  • --vad-speech-min-ms — Min speech duration before activityStart (default: 100)

Echo Suppression

  • --echo-multiplier — VAD threshold multiplier during agent speech (default: 3.0)
  • --echo-decay-ms — Decay time after agent stops speaking (default: 300)

Twilio Setup

  1. Buy a phone number on Twilio
  2. Set Voice webhook: https://your-domain/gemini-live/incoming (HTTP POST)
  3. Set Call status URL: https://your-domain/gemini-live/call-status (HTTP POST)
  4. Ensure geo-permissions are enabled for target countries

Network Requirements

The bridge must be accessible from the internet (Twilio connects to it). Recommended: Caddy reverse proxy with WebSocket support.

# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}

Performance

Latency benchmarks (Gemini 2.5 Flash Native Audio):

ConfigMedianMinMax
No VAD, 200ms buffer3,660ms2,360ms5,180ms
Server VAD, 50ms buffer2,500ms2,080ms6,980ms

Server-side VAD reduces median latency by ~32%.

Version tags

latestvk9750w32ept8knvgennejt7j0h82v9hn

Runtime requirements

📞 Clawdis
Binspython3, uvicorn
EnvGOOGLE_API_KEY, TWILIO_AUTH_TOKEN
Primary envGOOGLE_API_KEY