Phone Voice Agent

Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. It handles incoming calls, transcribes audio, generates responses via an LLM, and speaks back via streaming TTS. Use when the user wants to: (1) test voice AI capabilities, (2) handle phone calls programmatically, or (3) build a conversational voice bot.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

by @kesslerio

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The code (server.py and server_realtime.py) implements a Twilio inbound call bridge with Deepgram, OpenAI, and ElevenLabs integration, which matches the skill description. However, the registry metadata declares no required env vars or binaries, while SKILL.md, the README, and the code all require multiple API keys and an external binary (ffmpeg). This mismatch between declared metadata and actual requirements reduces trust.

Instruction Scope

Runtime instructions ask you to run a local FastAPI server and expose it (ngrok) which is appropriate. But the runtime code does additional things not emphasized in the registry: it records and saves call transcripts to disk under a calls/ directory, can make outbound calls via Twilio if creds are set, and contains an optional web_search function using a BRAVE_API_KEY. The skill will therefore store potentially sensitive PII locally and can send audio/text to multiple third-party APIs — these behaviors are within the described feature set but are privacy-sensitive and not fully documented in metadata.

Install Mechanism

There is no formal install spec in the registry (instruction-only), but the package contains a requirements.txt and README instructing pip install -r scripts/requirements.txt. That is normal, but the code also invokes ffmpeg via subprocess for audio conversion — ffmpeg is not declared in requirements or the SKILL metadata as a required binary. Pip installs packages from PyPI (moderate risk).
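Because ffmpeg is an undeclared dependency, a preflight check before starting the server can surface the problem early. The following is a minimal sketch using only the standard library; the function name `find_binary` is hypothetical and is not part of the skill's code.

```python
import shutil
from typing import Optional


def find_binary(name: str = "ffmpeg") -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


# Report whether ffmpeg is available without raising or exiting.
ffmpeg_path = find_binary()
if ffmpeg_path is None:
    print("ffmpeg not found on PATH; install it before running the server")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```

Running this in the same virtualenv and shell that will launch the server confirms that the subprocess call will see the same PATH.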

Credentials

The registry fields claimed 'none' for required env vars, yet SKILL.md, the README, and the code require multiple secrets: OPENAI_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER, plus the optional BRAVE_API_KEY and PUBLIC_URL. Requiring Twilio, OpenAI, Deepgram, and ElevenLabs credentials is proportionate to the stated purpose, but (a) the metadata omission is misleading, and (b) BRAVE_API_KEY is present in the code but not documented as a prerequisite, so setting it enables unexpected network access. The code also ships a hard-coded default TWILIO_PHONE_NUMBER value, which is unusual and should be reviewed.
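Since the registry does not declare these variables, it can help to verify them yourself before launching anything. The sketch below checks the variable names listed in this scan; the helper `missing_vars` is hypothetical, and whether the server treats every one of these as strictly required is an assumption.

```python
import os
from typing import List, Mapping, Sequence

# Names taken from the scan above; treating all of them as required is an assumption.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
)
OPTIONAL_VARS = ("BRAVE_API_KEY", "PUBLIC_URL")


def missing_vars(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the required names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def enabled_optional(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return optional names that are set, so their side effects are visible."""
    return [name for name in OPTIONAL_VARS if env.get(name, "").strip()]
```

Calling `enabled_optional()` before a run makes it explicit when BRAVE_API_KEY is set and web search may therefore be active.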

Persistence & Privilege

The skill does not request always:true nor attempt to modify other skills or global agent config. It persists recordings and JSON call results under its own calls/ directory and reads tasks/ YAML files — this is expected for a call-recording agent, but it does persist sensitive data locally.
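One way to limit how long sensitive call data stays on disk is a small retention job run alongside the server. This is a sketch only: the function `purge_old_calls` and the seven-day window are hypothetical, and the exact layout of the calls/ directory is an assumption, so the sketch deletes only regular files directly inside the given directory.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_calls(
    directory: Path, max_age_days: float, now: Optional[float] = None
) -> List[Path]:
    """Delete files in `directory` last modified more than `max_age_days` ago.

    Returns the paths that were removed. Subdirectories are left untouched.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed: List[Path] = []
    if not directory.is_dir():
        return removed
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For example, `purge_old_calls(Path("calls"), max_age_days=7)` could be run from a daily scheduled task.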

What to consider before installing

What to check before installing or using this skill:
- Metadata mismatch: the registry claims no env vars or binaries, but the project needs many API keys (OpenAI, Deepgram, ElevenLabs, Twilio) and likely ffmpeg. Do not rely on the registry summary; follow README/SKILL.md and inspect the code.
- Secrets: only provide API keys you are willing to expose to this codebase and the third-party services. Prefer test/sandbox accounts and rotate keys after testing.
- Hard-coded default: server.py contains a default TWILIO_PHONE_NUMBER (+18665515246). Verify why this default exists and remove or change it before use.
- Data persistence: the server saves transcripts and metadata to a calls/ directory. These can contain PII and voice transcripts. Review where files are written and ensure appropriate storage and cleanup policies.
- External network calls: audio and transcripts are sent to Deepgram, ElevenLabs, and OpenAI. An optional Brave web-search API is also implemented (BRAVE_API_KEY); if you set BRAVE_API_KEY, the agent can perform web searches on behalf of calls. Be aware of all external endpoints.
- Binaries and environment: ffmpeg is invoked for audio conversion but isn't declared as a required binary. Install and vet ffmpeg on your host, and install Python deps in an isolated virtualenv.
- Code paths: review the code that performs subprocesses, streaming, and file writes (in server.py, the ffmpeg subprocess and streaming logic is complex and partly truncated in the provided excerpt). Ensure rate limits and error handling are acceptable for your environment.

If you plan to proceed: run this in an isolated environment (VM/container) with test API keys, read the code thoroughly (especially logging, save_call_result, and any webhook handlers), and do not expose production credentials or personal phone numbers until you've validated behavior.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

License

MIT-0. Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Phone Agent Skill

Runs a local FastAPI server that acts as a real-time voice bridge.

Architecture

Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
                                                  |
                                                  +--> OpenAI (LLM)
                                                  +--> ElevenLabs (TTS)
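
To make the Twilio-to-server leg of this diagram concrete: Twilio Media Streams send JSON text frames over the WebSocket, and frames with `"event": "media"` carry base64-encoded 8 kHz mu-law audio in `media.payload`. The sketch below decodes one such frame. The function name `extract_audio` is hypothetical, and how scripts/server.py actually parses these frames is not shown here.

```python
import base64
import json
from typing import Optional


def extract_audio(frame: str) -> Optional[bytes]:
    """Return raw mu-law audio bytes from a Twilio 'media' frame, else None.

    Non-media events (for example 'connected', 'start', 'stop') carry no
    audio, so they yield None.
    """
    message = json.loads(frame)
    if message.get("event") != "media":
        return None
    return base64.b64decode(message["media"]["payload"])
```

The returned bytes would then be forwarded to the speech-to-text service, which must be configured for the same encoding and sample rate.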

Prerequisites

Twilio Account: Phone number + TwiML App.
Deepgram API Key: For fast speech-to-text.
OpenAI API Key: For the conversation logic.
ElevenLabs API Key: For realistic text-to-speech.
Ngrok (or similar): To expose your local port 8080 to Twilio.

Setup

Install Dependencies:

pip install -r scripts/requirements.txt

Set Environment Variables (in ~/.moltbot/.env, ~/.clawdbot/.env, or export):

export DEEPGRAM_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export ELEVENLABS_API_KEY="your_key"
export TWILIO_ACCOUNT_SID="your_sid"
export TWILIO_AUTH_TOKEN="your_token"
export PORT=8080

Start the Server:
```
python3 scripts/server.py
```
Expose to Internet:
```
ngrok http 8080
```
Configure Twilio:
- Go to your Phone Number settings.
- Set "Voice & Fax" -> "A Call Comes In" to Webhook.
- URL: https://<your-ngrok-url>.ngrok.io/incoming
- Method: POST
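
When Twilio posts to the /incoming webhook, it expects a TwiML document in response. A common way to bridge call audio to a WebSocket is TwiML's `<Connect><Stream>` verb. The sketch below builds such a document with the standard library; the function name `build_stream_twiml` and the `/media` WebSocket path are hypothetical, and whether scripts/server.py responds this way is an assumption.

```python
from xml.etree.ElementTree import Element, SubElement, tostring


def build_stream_twiml(public_host: str, ws_path: str = "/media") -> str:
    """Build TwiML telling Twilio to stream call audio to a WebSocket URL.

    `public_host` is the externally reachable host name (for example the
    ngrok host); `ws_path` is a hypothetical WebSocket route on the server.
    """
    response = Element("Response")
    connect = SubElement(response, "Connect")
    SubElement(connect, "Stream", url=f"wss://{public_host}{ws_path}")
    body = tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

A webhook handler would return this string with a `text/xml` content type.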

Usage

Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.

Customization

System Prompt: Edit SYSTEM_PROMPT in scripts/server.py to change the persona.
Voice: Change ELEVENLABS_VOICE_ID to use different voices.
Model: Switch gpt-4o-mini to gpt-4 for smarter (but slower) responses.
