Security audit

OpenClaw Voice Bridge

Security checks across malware telemetry and agentic risk

Overview

The skill mostly matches its voice-control purpose, but it needs review because it can send caller-supplied local file paths to OpenAI and turns speech into agent actions.

Install only if you trust the Snarling/OpenClaw environment and understand that recorded audio is sent to OpenAI. Before broad use, restrict accepted audio paths to the expected recording directory, avoid logging raw transcripts, require a current patched OpenClaw runtime, and consider confirmation or tighter tool limits for voice-originated agent actions.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code
Rogue AgentSelf-Modification, Session Persistence

Findings (17)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 89% confidence
Finding: The skill declares an environment variable (`OPENAI_API_KEY`) and clearly performs networked transcription, but there is no explicit permissions declaration beyond envVars in the skill metadata. That mismatch weakens operator visibility into the skill's real capabilities and can lead to over-trusting a plugin that handles sensitive audio and credentials.

Description-Behavior Mismatch

Medium

Confidence: 89% confidence
Finding: The plugin does more than transcribe and display a reply: it injects the transcript into a subagent tied to the main session key, allowing spoken input to influence broader agent behavior. In this context, voice input is untrusted external input, so routing it into an agent workflow expands the attack surface beyond a simple notification bridge.

Description-Behavior Mismatch

Medium

Confidence: 82% confidence
Finding: The fallback path contradicts the stated behavior by injecting transcribed voice content into the main agent session and triggering a heartbeat, rather than only using send_notification. That can broaden where untrusted voice content is processed and may cause unintended agent actions or data handling outside the narrow display-notification flow.

Missing User Warnings

Medium

Confidence: 82% confidence
Finding: The code sends recorded voice audio to api.openai.com for transcription, but there is no visible consent, warning, or disclosure mechanism in this file. For a voice bridge handling potentially sensitive speech, silent transfer of audio to a third party creates a meaningful privacy risk even if it is functionally intended.

Missing User Warnings

Medium

Confidence: 94% confidence
Finding: The code logs full speech transcripts to console and optional debug logs, which can capture sensitive spoken content such as personal data, secrets, or private conversations. Logs often have broader retention and access than the live feature path, making disclosure more likely.

Missing User Warnings

Medium

Confidence: 90% confidence
Finding: The manifest states that audio is transcribed through OpenAI and requires an API key, but it does not provide any user-facing disclosure that recorded voice data leaves the local environment. In a voice-capture context, this increases privacy and consent risk because users may not understand that potentially sensitive spoken content is transmitted to a third party for processing.

Ssd 1

High

Confidence: 97% confidence
Finding: The transcript is untrusted natural-language input, yet it is embedded directly into a subagent prompt that can use tools, including send_notification, and may inherit broader session capabilities. An attacker can speak prompt-injection content such as instructions to ignore prior directions, exfiltrate context, or invoke unintended tools, making spoken input a direct semantic control channel into the agent.

Ssd 1

High

Confidence: 97% confidence
Finding: Untrusted transcribed speech is interpolated directly into a natural-language prompt for a subagent with tool-use instructions. A spoken prompt injection such as instructions to ignore prior directions, reveal context, or misuse tools can be semantically interpreted by the model, especially because the transcript is presented as part of the instruction-bearing prompt rather than isolated structured data.

Ssd 3

Medium

Confidence: 89% confidence
Finding: The subagent is directed to generate a response from all spoken input and send it to a display notification channel, which can surface sensitive dictated information to anyone who can view the device. In a voice-interface context, users may speak secrets or personal data without realizing it will be echoed or summarized on-screen.

Credential Access

High

Category: Privilege Escalation
Content: const envKey = process.env.OPENAI_API_KEY; if (envKey) { cachedApiKey = envKey; console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env"); debugLog("Resolved key from process.env"); return envKey; }
Confidence: 60% confidence
Finding: .env"

Credential Access

High

Category: Privilege Escalation
Content: if (envKey) { cachedApiKey = envKey; console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env"); debugLog("Resolved key from process.env"); return envKey; } console.warn("[openclaw-voice-bridge] No OpenAI API key available");
Confidence: 60% confidence
Finding: .env"

Credential Access

High

Category: Privilege Escalation
Content: } } catch (_e) {} // Fallback: process.env const envKey = process.env.OPENAI_API_KEY; if (envKey) { cachedApiKey = envKey;
Confidence: 60% confidence
Finding: .env

Credential Access

High

Category: Privilege Escalation
Content: const envKey = process.env.OPENAI_API_KEY; if (envKey) { cachedApiKey = envKey; console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env"); debugLog("Resolved key from process.env"); return envKey; }
Confidence: 60% confidence
Finding: .env"

Credential Access

High

Category: Privilege Escalation
Content: if (envKey) { cachedApiKey = envKey; console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env"); debugLog("Resolved key from process.env"); return envKey; }
Confidence: 60% confidence
Finding: .env"

Session Persistence

Medium

Category: Rogue Agent
Content: ## Recording Pipeline (v5) The v5 pipeline uses `subagent.run()` to create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display: 1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms latency) 2. **Recording** → Snarling records 20s of audio to a WAV file
Confidence: 82% confidence
Finding: create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display: 1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms

Session Persistence

Medium

Category: Rogue Agent
Content: ## Recording Pipeline (v5) The v5 pipeline uses `subagent.run()` to create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display: 1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms latency) 2. **Recording** → Snarling records 20s of audio to a WAV file
Confidence: 84% confidence
Finding: create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display: 1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms

Known Vulnerable Dependency: openclaw==2026.3.24-beta.2 — 10 advisory(ies): CVE-2026-53846 (OpenClaw: Workspace .env npm_execpath could influence bundled runtime dependency); CVE-2026-41913 (OpenClaw: Concurrent async auth attempts can bypass the intended shared-secret r); CVE-2026-43526 (OpenClaw: QQBot reply media URL handling could trigger SSRF and re-upload fetche) +7 more

High

Category: Supply Chain
Confidence: 97% confidence
Finding: openclaw==2026.3.24-beta.2

VirusTotal

VirusTotal findings are pending for this skill version.

View on VirusTotal