Security audit

Openclaw Voice Gpt Realtime

Security checks across malware telemetry and agentic risk

Overview

This is a coherent, real phone-calling skill, but it deserves review: it can dial arbitrary real numbers and incur charges without a clear per-call confirmation boundary, and it logs sensitive call prompts.

Review before installing if you will use this with real customers, sensitive calls, or shared logging infrastructure. Set Twilio spend controls, avoid debug mode unless necessary, keep inbound calls disabled or allowlisted, and use an operator confirmation workflow before dialing real numbers. Treat local call logs, recordings, transcripts, and console logs as sensitive data.
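The "disabled or allowlisted" recommendation for inbound calls can be read as a deny-by-default check. The sketch below is illustrative only: the policy names (`disabled`, `allowlist`, `open`) and both function names are assumptions, not identifiers taken from the skill.

```python
import re


def normalize_number(raw: str) -> str:
    """Reduce a phone number to '+' followed by its digits."""
    return "+" + re.sub(r"\D", "", raw)


def inbound_allowed(caller: str, policy: str, allowlist: set) -> bool:
    """Deny by default: only an allowlisted caller under 'allowlist' passes."""
    if policy == "allowlist":
        allowed = {normalize_number(number) for number in allowlist}
        return normalize_number(caller) in allowed
    # 'disabled', 'open', and any unknown policy are all rejected here.
    return False
```

Rejecting the `open` policy outright is a deliberate choice in this sketch; a deployment that truly needs open inbound calling would have to opt in somewhere more explicit than a default.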

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (13)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 88%
Finding: The skill documentation indicates access to environment-backed secrets and outbound network capabilities, but no explicit permissions are declared. In a skill that can place phone calls, use Twilio/OpenAI credentials, and expose a webhook, missing permission declarations reduce transparency and can prevent users or the platform from understanding the real trust boundary. This is especially risky because the capability set can incur charges and interact with external parties.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The stated purpose focuses on outbound calling, but the skill also appears to support inbound calling, call inspection, local persistence of metadata, and optional recording/transcript storage. That broader behavior materially changes the privacy and security profile: users may authorize a calling skill without realizing it can receive calls, retain interaction data, or expose inspection interfaces. In this context, the mismatch is dangerous because it involves real-world communications, sensitive transcripts, and potential third-party contact.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The code logs call metadata, user-supplied systemPrompt excerpts, and the full final system prompt to stdout. In a voice-calling context, those prompts can contain sensitive call objectives, personal data, operational instructions, or business logic, and console logs are often aggregated into centralized logging systems where many operators or services can access them.
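One possible mitigation is to log a fingerprint of the prompt rather than its text, so operators can still correlate log lines with a call. This is a minimal sketch, assuming a hypothetical helper named `prompt_fingerprint`; it is not how the skill currently logs.

```python
import hashlib


def prompt_fingerprint(prompt: str) -> dict:
    """Return a loggable summary that contains none of the prompt text."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    # A truncated hash is enough to correlate entries across log lines.
    return {"sha256_prefix": digest[:12], "chars": len(prompt)}
```

The fingerprint is for correlation only. A short or predictable prompt could still be guessed by hashing candidates, so this reduces exposure in aggregated logs but is not a substitute for access control on them.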

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: When debug is enabled, the full session.update payload is logged, including instructions, call-specific context, model configuration, and tool schema. This unnecessarily exposes internal workflow data and potentially sensitive user/business information beyond what is required to operate the call bridge, increasing leakage risk through log collection and support tooling.
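A debug logger could redact the sensitive parts of the payload before printing it. The sketch below assumes the payload is a dict with a `session` object holding `instructions` and `tools` keys; that shape and the helper name `redact_for_log` are assumptions for illustration, and the real payload should be checked against the skill's code.

```python
import copy

# Assumed field names inside payload["session"]; adjust to the real payload.
SENSITIVE_KEYS = ("instructions", "tools")


def redact_for_log(payload: dict) -> dict:
    """Return a copy of the payload that is safe to print in debug logs."""
    redacted = copy.deepcopy(payload)  # never mutate what is sent on the wire
    session = redacted.get("session", {})
    for key in SENSITIVE_KEYS:
        if key in session:
            value = session[key]
            session[key] = f"<redacted {type(value).__name__}, len={len(value)}>"
    return redacted
```

Keeping the type and length of each redacted field preserves most of the debugging value (was the prompt empty, how many tools were registered) without exposing the content.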

Missing User Warnings

Medium

Confidence: 94%
Finding: The README explicitly promotes full transcript logging and call recording in debug mode but does not prominently warn about legal consent, privacy, and sensitive-data handling risks. In a phone-calling skill, recordings and transcripts can capture third-party personal data, payment details, health information, or other regulated content, so lack of an explicit warning increases the chance of unsafe deployment and noncompliant use.

Missing User Warnings

Medium

Confidence: 92%
Finding: The README documents real outbound and optional inbound calling, including open inbound policy and cost estimates, but does not give a clear up-front warning that the skill can contact real third parties, trigger external actions, and incur telephony/API charges. Because this skill operates in the real world through Twilio and OpenAI, unclear user-facing risk disclosure makes accidental misuse, unwanted contact, and surprise billing more likely.

Vague Triggers

Medium

Confidence: 79%
Finding: The invocation guidance is broad enough that ordinary natural-language requests could trigger real phone calls without a strong confirmation boundary. Because this skill initiates external communications and can incur cost or affect third parties, ambiguous invocation increases the chance of accidental or socially engineered activation. The context makes this more dangerous than a typical read-only skill because the action is irreversible and external.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill does not present a strong upfront warning that it places real calls to third parties and may create direct financial charges. Users could treat it like a simulation or low-risk assistant action, when in fact it can contact external people/businesses and trigger billable API/telephony usage. In this context, insufficient warning materially increases the risk of unintended outreach, privacy issues, and surprise costs.

Missing User Warnings

Medium

Confidence: 91%
Finding: This tool can place real outbound phone calls to arbitrary external numbers, yet the tool interface does not present a strong, prominent warning or consent boundary at the point of use. In an agent ecosystem, that creates meaningful risk of accidental harassment, spam, costly calls, or social-engineering misuse because an upstream agent or user may trigger real-world actions without appreciating the consequences.
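A consent boundary at the point of use could take the form of a one-time confirmation token bound to the exact number and task an operator approved. The following is a sketch under that assumption; the class `CallConfirmations` and its methods are hypothetical and do not exist in the skill.

```python
import hashlib
import hmac
import secrets


class CallConfirmations:
    """Issues one-time tokens bound to a specific number and task."""

    def __init__(self) -> None:
        self._pending = {}  # token -> digest of the approved (number, task)

    @staticmethod
    def _digest(number: str, task: str) -> str:
        return hashlib.sha256(f"{number}\n{task}".encode("utf-8")).hexdigest()

    def request(self, number: str, task: str) -> str:
        """Called once a human operator has approved this exact call."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = self._digest(number, task)
        return token

    def confirm(self, token: str, number: str, task: str) -> bool:
        """True only once, and only for the number and task that were approved."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, self._digest(number, task))
```

In such a design the dialing code would refuse to place a call unless `confirm` returns true, so an upstream agent cannot reuse an approval or redirect it to a different number.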

Missing User Warnings

Medium

Confidence: 79%
Finding: This skill handles live telephone audio, may enable call recording in debug mode, and transmits voice data through third-party services, yet the manifest/tool description does not prominently warn users about privacy-sensitive audio handling or recording risk. In a voice-calling context, missing disclosure can lead to unauthorized collection, transmission, or retention of sensitive personal or business information during calls.

Missing User Warnings

Medium

Confidence: 91%
Finding: The code persists call metadata to a local JSONL file in the user's home directory, including phone numbers, task descriptions, call outcomes, and error details. In a voice-calling skill, this data is sensitive and may contain personal or regulated information; writing it to disk without any retention controls, minimization, encryption, or explicit disclosure increases privacy and compliance risk if the host is shared, backed up, or later compromised.
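Data minimization for a call log might look like the sketch below: the phone number is masked, free-text task descriptions are left out, and the file is created with owner-only permissions. The record fields and function names are assumptions for illustration, not the skill's actual log schema.

```python
import json
import os
import time


def mask_number(number: str) -> str:
    """Keep only the last four characters of the number."""
    return "*" * max(len(number) - 4, 0) + number[-4:]


def append_call_record(path: str, number: str, outcome: str) -> None:
    """Append one minimized JSONL record; no task text or error detail is stored."""
    record = {"ts": int(time.time()), "to": mask_number(number), "outcome": outcome}
    # Mode 0o600 applies only when the file is created; an existing file keeps its mode.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

Masking and restrictive permissions reduce what a shared host or backup can reveal, but they do not address retention; old records still need to be pruned on a schedule.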

Missing User Warnings

Medium

Confidence: 92%
Finding: This debug utility persists full call audio, transcripts, and event metadata to disk under the user's home directory. In the context of a real-time phone-calling skill, those artifacts can contain highly sensitive personal, financial, authentication, or regulated communications data, and the code shown provides no consent gate, retention policy, redaction, or automatic cleanup; file mode restrictions help but do not eliminate local privacy and compliance risk.
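An automatic cleanup step could enforce a retention window on the debug artifacts. This is a minimal sketch, assuming the artifacts sit as plain files in a single directory; the function name `purge_old_artifacts` and the directory layout are hypothetical.

```python
import os
import time
from typing import List, Optional


def purge_old_artifacts(
    directory: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Delete regular files older than the retention window; return their names."""
    now = time.time() if now is None else now
    with os.scandir(directory) as entries:
        candidates = list(entries)  # snapshot before deleting anything
    removed = []
    for entry in candidates:
        if not entry.is_file(follow_symlinks=False):
            continue  # skip subdirectories and symlinks
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Running this at startup and after each call would bound how long recordings and transcripts persist. Note that `os.remove` only unlinks the file and does not securely erase the underlying storage.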

Missing User Warnings

Medium

Confidence: 92%
Finding: The inbound-call context explicitly instructs the agent to 'help the caller with whatever they need,' which creates an overly broad authority boundary for a real-time voice agent interacting with untrusted callers. In this skill, that is more dangerous than in a normal chat setting because callers can socially engineer the agent into taking actions, disclosing information, or performing tasks outside the operator’s intended scope, and there is no user-facing notice or enforced capability restriction in this code path.
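An enforced capability restriction could be as simple as filtering the tool list by call direction before the session is configured, so an untrusted caller cannot reach tools meant for operator-initiated calls. The tool names and the `tools_for_call` helper below are invented for illustration and are not taken from the skill.

```python
# Hypothetical tool names; inbound calls get a strictly smaller set.
ALLOWED_TOOLS = {
    "outbound": frozenset({"end_call", "take_message", "lookup_order"}),
    "inbound": frozenset({"end_call", "take_message"}),
}


def tools_for_call(direction: str, tools: list) -> list:
    """Return only the tool definitions permitted for this call direction."""
    allowed = ALLOWED_TOOLS.get(direction, frozenset())  # unknown direction: none
    return [tool for tool in tools if tool.get("name") in allowed]
```

Filtering in code matters more than narrowing the prompt wording: a caller can talk the model out of an instruction, but not into a tool that was never registered for the session.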

VirusTotal

66/66 vendors flagged this skill as clean.
