Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Phone Voice Agent

Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
6 · 2.4k · 7 current installs · 7 all-time installs
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
! Purpose & Capability
The code (server.py and server_realtime.py) clearly implements a Twilio inbound call bridge with Deepgram/OpenAI/ElevenLabs integration, which matches the skill description. However, the registry metadata claims no required env vars or binaries, while both SKILL.md/README and the code require multiple API keys and an external binary (ffmpeg). This mismatch between declared metadata and actual requirements reduces trust.
! Instruction Scope
Runtime instructions ask you to run a local FastAPI server and expose it (ngrok), which is appropriate. But the runtime code does additional things not emphasized in the registry: it records and saves call transcripts to disk under a calls/ directory, can make outbound calls via Twilio if creds are set, and contains an optional web_search function using a BRAVE_API_KEY. The skill will therefore store potentially sensitive PII locally and can send audio/text to multiple third-party APIs. These behaviors are within the described feature set but are privacy-sensitive and not fully documented in metadata.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but the package contains a requirements.txt and a README instructing pip install -r scripts/requirements.txt. That is normal, but the code also invokes ffmpeg via subprocess for audio conversion, and ffmpeg is not declared in requirements or the SKILL metadata as a required binary. Pip installs packages from PyPI (moderate risk).
! Credentials
The registry fields claimed 'none' for required env vars, yet SKILL.md/README and the code require multiple secrets: OPENAI_API_KEY, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER, and optional BRAVE_API_KEY and PUBLIC_URL. Requiring Twilio/OpenAI/Deepgram/ElevenLabs creds is proportionate to the stated purpose, but (a) the metadata omission is misleading, and (b) BRAVE_API_KEY is present in code but not documented as a prerequisite (unexpected network access if set). Also, the code ships a hard-coded default TWILIO_PHONE_NUMBER value, which is unusual and should be reviewed.
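Because the registry metadata does not list these requirements, a small preflight check can make them visible before the server is started. The following is a minimal sketch, not part of the skill: the variable names come from the findings above, while the `preflight` helper and the assumption that every listed variable is strictly required by server.py are hypothetical.

```python
import os
import shutil

# Names taken from the scan findings above. Whether server.py treats each one
# as strictly required is an assumption of this sketch.
REQUIRED_ENV = [
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
]
OPTIONAL_ENV = ["BRAVE_API_KEY", "PUBLIC_URL"]


def preflight(env=None, which=shutil.which):
    """Report missing secrets, optional variables that are set, and ffmpeg availability."""
    env = os.environ if env is None else env
    return {
        "missing": [name for name in REQUIRED_ENV if not env.get(name)],
        "optional_set": [name for name in OPTIONAL_ENV if env.get(name)],
        "ffmpeg_found": which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    print(preflight())
```

Seeing BRAVE_API_KEY under `optional_set` is a reminder that the optional web search path would be active for that run.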
Persistence & Privilege
The skill does not request always:true or attempt to modify other skills or global agent config. It persists recordings and JSON call results under its own calls/ directory and reads tasks/ YAML files. This is expected for a call-recording agent, but it does persist sensitive data locally.
What to consider before installing
What to check before installing/using this skill:
- Metadata mismatch: the registry claims no env vars or binaries, but the project needs many API keys (OpenAI, Deepgram, ElevenLabs, Twilio) and likely ffmpeg. Do not rely on the registry summary; follow README/SKILL.md and inspect the code.
- Secrets: only provide API keys you are willing to expose to this codebase and the third-party services. Prefer test/sandbox accounts and rotate keys after testing.
- Hard-coded default: server.py contains a default TWILIO_PHONE_NUMBER (+18665515246). Verify why this default exists and remove/change it before use.
- Data persistence: the server saves transcripts/metadata to a calls/ directory. These can contain PII and voice transcripts. Review where files are written and ensure appropriate storage/cleanup policies.
- External network calls: audio and transcripts are sent to Deepgram, ElevenLabs, and OpenAI; an optional Brave web-search API is implemented (BRAVE_API_KEY). If you enable BRAVE_API_KEY, the agent can perform web searches on behalf of calls. Be aware of all external endpoints.
- Binaries and environment: ffmpeg is invoked for audio conversion but isn't declared as a required binary. Install and vet ffmpeg on your host. Install Python deps in an isolated virtualenv.
- Review code paths that perform subprocesses, streaming, and file writes (server.py: ffmpeg subprocess and streaming logic is complex and partly truncated in the provided excerpt). Ensure rate limits and error handling are acceptable for your environment.

If you plan to proceed: run this in an isolated environment (VM/container) with test API keys, read the code thoroughly (especially logging, save_call_result, and any webhook handlers), and do not expose production credentials or personal phone numbers until you've validated behavior.
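As one possible cleanup policy for the calls/ directory mentioned above, old files could be purged on a schedule. This is a hedged sketch only: the `purge_old_calls` helper, the 7-day default, and the assumption that age can be judged by file modification time are all illustrative and not taken from the skill's code.

```python
import time
from pathlib import Path


def purge_old_calls(calls_dir="calls", max_age_days=7, now=None):
    """Delete regular files under calls_dir older than max_age_days; return removed paths."""
    root = Path(calls_dir)
    if not root.is_dir():
        return []  # nothing has been recorded yet
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed = []
    # sorted() materializes the listing first, so deleting while looping is safe.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Running it from a daily cron job or at server startup would bound how long transcripts stay on disk; check the actual file layout written by save_call_result before relying on it.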

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest: vk976y16xts2v1f692paxcd3bd5808s55

SKILL.md

Phone Agent Skill

Runs a local FastAPI server that acts as a real-time voice bridge.

Architecture

Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
                                                  |
                                                  +--> OpenAI (LLM)
                                                  +--> ElevenLabs (TTS)
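
To make the Twilio-to-server leg of this diagram concrete: Twilio Media Streams send JSON text frames over the WebSocket, and frames with event "media" carry base64-encoded 8 kHz mu-law audio. The sketch below shows one way to pull the raw audio out of such a frame before forwarding it to speech-to-text. The `extract_audio` name is hypothetical, and how server.py actually handles these frames is not shown in this listing.

```python
import base64
import json


def extract_audio(message: str):
    """Return raw mu-law bytes from a Twilio "media" frame, or None for other events."""
    event = json.loads(message)
    if event.get("event") != "media":
        # "connected", "start", "mark", and "stop" frames carry no audio.
        return None
    return base64.b64decode(event["media"]["payload"])
```

The returned bytes are what a bridge would stream on to the STT provider; the other event types are useful for tracking call start and end.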

Prerequisites

  1. Twilio Account: Phone number + TwiML App.
  2. Deepgram API Key: For fast speech-to-text.
  3. OpenAI API Key: For the conversation logic.
  4. ElevenLabs API Key: For realistic text-to-speech.
  5. Ngrok (or similar): To expose your local port 8080 to Twilio.

Setup

  1. Install Dependencies:

    pip install -r scripts/requirements.txt
    
  2. Set Environment Variables (in ~/.moltbot/.env, ~/.clawdbot/.env, or export):

    export DEEPGRAM_API_KEY="your_key"
    export OPENAI_API_KEY="your_key"
    export ELEVENLABS_API_KEY="your_key"
    export TWILIO_ACCOUNT_SID="your_sid"
    export TWILIO_AUTH_TOKEN="your_token"
    export PORT=8080
    
  3. Start the Server:

    python3 scripts/server.py
    
  4. Expose to Internet:

    ngrok http 8080
    
  5. Configure Twilio:

    • Go to your Phone Number settings.
    • Set "Voice & Fax" -> "A Call Comes In" to Webhook.
    • URL: https://<your-ngrok-url>.ngrok.io/incoming
    • Method: POST
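
When Twilio POSTs to the /incoming webhook configured above, the server has to answer with TwiML that tells Twilio to open a media stream back to it. A minimal sketch of building that response with the standard library follows. The `<Connect>` and `<Stream>` verbs are standard TwiML; the `build_stream_twiml` helper and the `/media` WebSocket path are assumptions for illustration, since the skill's actual handler and path are not shown here.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(public_host: str, ws_path: str = "/media") -> str:
    """Build TwiML that connects the call audio to a WebSocket on public_host."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # ws_path is a placeholder; use whatever WebSocket route the server exposes.
    ET.SubElement(connect, "Stream", url=f"wss://{public_host}{ws_path}")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


if __name__ == "__main__":
    print(build_stream_twiml("<your-ngrok-url>.ngrok.io"))
```

The handler would return this string with an XML content type. Note that the stream URL uses the wss scheme on the same public host that ngrok exposes.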

Usage

Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.

Customization

  • System Prompt: Edit SYSTEM_PROMPT in scripts/server.py to change the persona.
  • Voice: Change ELEVENLABS_VOICE_ID to use different voices.
  • Model: Switch gpt-4o-mini to gpt-4 for smarter (but slower) responses.

Files

7 total