Phone Voice Agent
Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via an LLM, and speaks back via streaming TTS. Use when the user wants to: (1) test voice AI capabilities, (2) handle phone calls programmatically, or (3) build a conversational voice bot.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 6 · 2.4k · 7 current installs · 7 all-time installs
Security Scan
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The code (server.py and server_realtime.py) clearly implements a Twilio inbound call bridge with Deepgram/OpenAI/ElevenLabs integration, which matches the skill description. However the registry metadata claims no required env vars or binaries while both SKILL.md/README and the code require multiple API keys and an external binary (ffmpeg). That mismatch between declared metadata and actual requirements is an incoherence that reduces trust.
Instruction Scope
Runtime instructions ask you to run a local FastAPI server and expose it (ngrok) which is appropriate. But the runtime code does additional things not emphasized in the registry: it records and saves call transcripts to disk under a calls/ directory, can make outbound calls via Twilio if creds are set, and contains an optional web_search function using a BRAVE_API_KEY. The skill will therefore store potentially sensitive PII locally and can send audio/text to multiple third-party APIs — these behaviors are within the described feature set but are privacy-sensitive and not fully documented in metadata.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but the package contains a requirements.txt and README instructing pip install -r scripts/requirements.txt. That is normal, but the code also invokes ffmpeg via subprocess for audio conversion — ffmpeg is not declared in requirements or the SKILL metadata as a required binary. Pip installs packages from PyPI (moderate risk).
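Because ffmpeg is invoked but never declared, a startup preflight makes the dependency explicit. The sketch below assumes the conversion targets Twilio's Media Streams format (8 kHz, mono, μ-law); the exact flags server.py passes are not shown in the excerpt, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_mulaw_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argv converting src to raw 8 kHz mono mu-law.

    Hypothetical helper: the target format is an assumption, not taken from server.py.
    """
    return [
        "ffmpeg",
        "-y",           # overwrite dst without prompting
        "-i", src,      # input file in any format ffmpeg can decode
        "-ar", "8000",  # resample to 8 kHz
        "-ac", "1",     # downmix to mono
        "-f", "mulaw",  # raw mu-law output, no container
        dst,
    ]


def convert_to_mulaw(src: str, dst: str) -> None:
    """Run the conversion; raises if ffmpeg is missing or exits non-zero."""
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found on PATH; install it before starting the server")
    # Passing a list (no shell=True) keeps file names from being interpreted by a shell.
    subprocess.run(build_mulaw_command(src, dst), check=True, capture_output=True)
```

Passing an argument list rather than a shell string is the main point here: it is the safer way to run the subprocess the scan asks reviewers to inspect.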
Credentials
The registry fields claimed 'none' for required env vars, yet SKILL.md/README and the code require multiple secrets: OPENAI_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER, and optional BRAVE_API_KEY and PUBLIC_URL. Requiring Twilio/OpenAI/Deepgram/ElevenLabs credentials is proportionate to the stated purpose, but (a) the metadata omission is misleading, and (b) BRAVE_API_KEY is present in the code but not documented as a prerequisite (unexpected network access if set). The code also ships a hard-coded default TWILIO_PHONE_NUMBER value, which is unusual and should be reviewed.
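Since the registry metadata cannot be trusted here, one option is to check the environment explicitly at startup. This is a minimal sketch built from the variable list above; treating TWILIO_PHONE_NUMBER as required (instead of falling back to a hard-coded default) is a suggested change, not how server.py currently behaves, and the function names are hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Secrets the scan found the code needs. TWILIO_PHONE_NUMBER is listed so it
# must be set explicitly rather than falling back to a hard-coded default.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
]
# Optional variables that change runtime behavior when set
# (BRAVE_API_KEY enables the web_search function).
OPTIONAL_VARS = ["BRAVE_API_KEY", "PUBLIC_URL"]


def missing_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def enabled_optional(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return optional variable names that are set, i.e. extra features that will be active."""
    env = os.environ if env is None else env
    return [name for name in OPTIONAL_VARS if env.get(name)]


def check_env(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise at startup if any required secret is missing."""
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
```

Calling check_env() before the server starts, and logging enabled_optional(), turns the undocumented BRAVE_API_KEY behavior into something visible at launch.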
Persistence & Privilege
The skill does not request always:true nor attempt to modify other skills or global agent config. It persists recordings and JSON call results under its own calls/ directory and reads tasks/ YAML files — this is expected for a call-recording agent, but it does persist sensitive data locally.
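Because transcripts and call results accumulate under calls/, a retention job limits how much PII sits on disk. This is a minimal sketch assuming every file under calls/ (at any depth) is safe to delete once it is older than a chosen age; the seven-day default and the function name are hypothetical choices, not part of the skill.

```python
import time
from pathlib import Path
from typing import List


def purge_old_call_files(calls_dir: str = "calls", max_age_days: float = 7.0,
                         dry_run: bool = True) -> List[Path]:
    """Delete (or, with dry_run, only list) files under calls_dir older than max_age_days."""
    root = Path(calls_dir)
    if not root.is_dir():
        return []
    cutoff = time.time() - max_age_days * 86400
    stale = []
    # Materialize the listing first so deletions don't disturb the directory walk.
    for path in list(root.rglob("*")):
        # Only regular files are considered; directories are left in place.
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return sorted(stale)
```

Defaulting to dry_run=True lets you review what would be removed before anything is actually deleted.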
What to consider before installing
What to check before installing/using this skill:
- Metadata mismatch: the registry claims no env vars or binaries, but the project needs many API keys (OpenAI, Deepgram, ElevenLabs, Twilio) and likely ffmpeg. Do not rely on the registry summary — follow README/SKILL.md and inspect code.
- Secrets: only provide API keys you are willing to expose to this codebase and the third-party services. Prefer test/sandbox accounts and rotate keys after testing.
- Hard-coded default: server.py contains a default TWILIO_PHONE_NUMBER (+18665515246). Verify why this default exists and remove/change it before use.
- Data persistence: the server saves transcripts/metadata to a calls/ directory. These can contain PII and voice transcripts — review where files are written and ensure appropriate storage/cleanup policies.
- External network calls: audio and transcripts are sent to Deepgram, ElevenLabs, and OpenAI; an optional Brave web-search API is implemented (BRAVE_API_KEY). If you enable BRAVE_API_KEY, the agent can perform web searches on behalf of calls. Be aware of all external endpoints.
- Binaries and environment: ffmpeg is invoked for audio conversion but isn't declared as a required binary — install and vet ffmpeg on your host. Install Python deps in an isolated virtualenv.
- Review code paths that perform subprocesses, streaming, and file writes (server.py: ffmpeg subprocess and streaming logic is complex and partly truncated in the provided excerpt). Ensure rate limits and error handling are acceptable for your environment.
If you plan to proceed: run this in an isolated environment (VM/container) with test API keys, read the code thoroughly (especially logging, save_call_result, and any webhook handlers), and do not expose production credentials or personal phone numbers until you've validated behavior.
Current version: v1.0.0
SKILL.md
Phone Agent Skill
Runs a local FastAPI server that acts as a real-time voice bridge.
Architecture
Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
                                                 |
                                                 +--> OpenAI (LLM)
                                                 +--> ElevenLabs (TTS)
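The WebSocket leg between Twilio and the local server carries Twilio Media Streams messages: JSON frames whose event field is connected, start, media, or stop, with caller audio as base64-encoded 8 kHz μ-law in media.payload. The sketch below decodes inbound frames and wraps TTS audio for playback using only the standard library. How server.py structures this is not visible in the excerpt, and the function names are hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


def parse_twilio_frame(raw: str) -> Tuple[str, Optional[str], Optional[bytes]]:
    """Return (event, stream_sid, audio_bytes) for one Media Streams message.

    audio_bytes holds the decoded mu-law payload for "media" events, else None.
    """
    msg = json.loads(raw)
    event = msg.get("event", "")
    stream_sid = msg.get("streamSid")
    if event == "start":
        # The start event also nests the stream SID inside its "start" object.
        stream_sid = msg.get("start", {}).get("streamSid", stream_sid)
    audio = None
    if event == "media":
        audio = base64.b64decode(msg["media"]["payload"])
    return event, stream_sid, audio


def build_outbound_media(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap mu-law audio in the JSON frame Twilio plays back on the call."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })
```

The stream SID captured from the start event has to be echoed in every outbound frame, which is why the parser returns it alongside the audio.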
Prerequisites
- Twilio Account: Phone number + TwiML App.
- Deepgram API Key: For fast speech-to-text.
- OpenAI API Key: For the conversation logic.
- ElevenLabs API Key: For realistic text-to-speech.
- Ngrok (or similar): To expose your local port 8080 to Twilio.
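The Deepgram key is used against Deepgram's live transcription WebSocket at wss://api.deepgram.com/v1/listen. That endpoint is configured through query parameters and authenticated with an "Authorization: Token <key>" header. Since Twilio delivers 8 kHz μ-law, the connection settings might look like the sketch below. Whether server.py sets exactly these options (for example interim results) is an assumption, and the function name is hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def deepgram_stream_config(api_key: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a Deepgram live-transcription WebSocket fed Twilio audio."""
    params = {
        "encoding": "mulaw",        # Twilio Media Streams audio encoding
        "sample_rate": "8000",      # Twilio sends 8 kHz audio
        "channels": "1",
        "interim_results": "true",  # partial transcripts while the caller is still speaking
        "punctuate": "true",
    }
    url = f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"
    headers = {"Authorization": f"Token {api_key}"}
    return url, headers
```

The url and headers can then be handed to a WebSocket client. Note that the keyword for extra headers differs between versions of the websockets library, so check the version you install.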
Setup
- Install Dependencies:
  pip install -r scripts/requirements.txt
- Set Environment Variables (in ~/.moltbot/.env, ~/.clawdbot/.env, or export):
  export DEEPGRAM_API_KEY="your_key"
  export OPENAI_API_KEY="your_key"
  export ELEVENLABS_API_KEY="your_key"
  export TWILIO_ACCOUNT_SID="your_sid"
  export TWILIO_AUTH_TOKEN="your_token"
  export PORT=8080
- Start the Server:
  python3 scripts/server.py
- Expose to Internet:
  ngrok http 8080
- Configure Twilio:
  - Go to your Phone Number settings.
  - Set "Voice & Fax" -> "A Call Comes In" to Webhook.
  - URL: https://<your-ngrok-url>.ngrok.io/incoming
  - Method: POST
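When Twilio POSTs to /incoming, the server must reply with TwiML that tells Twilio to open the audio WebSocket. Twilio's bidirectional Media Streams use a Connect verb wrapping a Stream verb. This is a minimal standard-library sketch. The /media-stream path is hypothetical, because the excerpt does not show which WebSocket route server.py registers; match it to your server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def incoming_call_twiml(public_url: str, ws_path: str = "/media-stream") -> str:
    """Return TwiML connecting the call's audio to a bidirectional Media Stream.

    ws_path is a hypothetical route name; it must match the server's WebSocket endpoint.
    """
    # Accept either "https://abc.ngrok.io" or a bare host such as "abc.ngrok.io".
    host = urlparse(public_url).netloc or public_url
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{host}{ws_path}")
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")
```

In a FastAPI handler, this string would typically be returned as Response(content=..., media_type="application/xml") so Twilio receives it with an XML content type.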
Usage
Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.
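The reply sounds more natural when the LLM's streamed output is forwarded to TTS one sentence at a time, instead of after the full response arrives. This is a minimal sketch of that buffering, assuming the LLM yields plain text fragments. Whether server.py chunks this way is not confirmed by the excerpt, and the function name is hypothetical.

```python
import re
from typing import Iterable, Iterator

# A sentence boundary is ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text fragments."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may still be growing.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush the remainder when the stream ends
```

For example, the fragments "Hello the", "re. How are", and " you? I'm fine" yield "Hello there.", "How are you?", and "I'm fine". The punctuation rule is deliberately simple, so abbreviations such as "Dr." also trigger a split. That only affects TTS pacing, not content.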
Customization
- System Prompt: Edit SYSTEM_PROMPT in scripts/server.py to change the persona.
- Voice: Change ELEVENLABS_VOICE_ID to use different voices.
- Model: Switch gpt-4o-mini to gpt-4 for smarter (but slower) responses.
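ELEVENLABS_VOICE_ID selects the voice because ElevenLabs takes the voice ID in the request path of its streaming endpoint (POST /v1/text-to-speech/{voice_id}/stream), with the API key in an xi-api-key header. Requesting output_format=ulaw_8000 returns audio Twilio can play without conversion. This sketch builds such a request with the standard library but does not send it. The model_id default is an assumption, so check the current ElevenLabs API reference before relying on it.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

ELEVENLABS_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_turbo_v2_5") -> urllib.request.Request:
    """Build (but do not send) a streaming TTS request returning 8 kHz mu-law audio.

    model_id's default is an assumed model name; override it as needed.
    """
    url = f"{ELEVENLABS_BASE}/{quote(voice_id)}/stream?" + urlencode({"output_format": "ulaw_8000"})
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )
```

Sending it with urllib.request.urlopen and reading the response in small chunks lets each chunk be forwarded to the call as it arrives, which is the point of using the streaming endpoint.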
Files: 7 total
