Back to skill

Security audit

OpenClaw Voice Bridge

Security checks across malware telemetry and agentic risk

Overview

The skill mostly matches its voice-control purpose, but it needs review because it can send caller-supplied local file paths to OpenAI and turns speech into agent actions.

Install only if you trust the Snarling/OpenClaw environment and understand that recorded audio is sent to OpenAI. Before broad use, restrict accepted audio paths to the expected recording directory, avoid logging raw transcripts, require a current patched OpenClaw runtime, and consider confirmation or tighter tool limits for voice-originated agent actions.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
  • Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
  • Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code
  • Rogue AgentSelf-Modification, Session Persistence
Findings (17)

Lp3

Medium
Category
MCP Least Privilege
Confidence
89% confidence
Finding
The skill declares an environment variable (`OPENAI_API_KEY`) and clearly performs networked transcription, but there is no explicit permissions declaration beyond envVars in the skill metadata. That mismatch weakens operator visibility into the skill's real capabilities and can lead to over-trusting a plugin that handles sensitive audio and credentials.

Description-Behavior Mismatch

Medium
Confidence
89% confidence
Finding
The plugin does more than transcribe and display a reply: it injects the transcript into a subagent tied to the main session key, allowing spoken input to influence broader agent behavior. In this context, voice input is untrusted external input, so routing it into an agent workflow expands the attack surface beyond a simple notification bridge.

Description-Behavior Mismatch

Medium
Confidence
82% confidence
Finding
The fallback path contradicts the stated behavior by injecting transcribed voice content into the main agent session and triggering a heartbeat, rather than only using send_notification. That can broaden where untrusted voice content is processed and may cause unintended agent actions or data handling outside the narrow display-notification flow.

Missing User Warnings

Medium
Confidence
82% confidence
Finding
The code sends recorded voice audio to api.openai.com for transcription, but there is no visible consent, warning, or disclosure mechanism in this file. For a voice bridge handling potentially sensitive speech, silent transfer of audio to a third party creates a meaningful privacy risk even if it is functionally intended.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The code logs full speech transcripts to console and optional debug logs, which can capture sensitive spoken content such as personal data, secrets, or private conversations. Logs often have broader retention and access than the live feature path, making disclosure more likely.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The manifest states that audio is transcribed through OpenAI and requires an API key, but it does not provide any user-facing disclosure that recorded voice data leaves the local environment. In a voice-capture context, this increases privacy and consent risk because users may not understand that potentially sensitive spoken content is transmitted to a third party for processing.

Ssd 1

High
Confidence
97% confidence
Finding
The transcript is untrusted natural-language input, yet it is embedded directly into a subagent prompt that can use tools, including send_notification, and may inherit broader session capabilities. An attacker can speak prompt-injection content such as instructions to ignore prior directions, exfiltrate context, or invoke unintended tools, making spoken input a direct semantic control channel into the agent.

Ssd 1

High
Confidence
97% confidence
Finding
Untrusted transcribed speech is interpolated directly into a natural-language prompt for a subagent with tool-use instructions. A spoken prompt injection such as instructions to ignore prior directions, reveal context, or misuse tools can be semantically interpreted by the model, especially because the transcript is presented as part of the instruction-bearing prompt rather than isolated structured data.

Ssd 3

Medium
Confidence
89% confidence
Finding
The subagent is directed to generate a response from all spoken input and send it to a display notification channel, which can surface sensitive dictated information to anyone who can view the device. In a voice-interface context, users may speak secrets or personal data without realizing it will be echoed or summarized on-screen.

Credential Access

High
Category
Privilege Escalation
Content
const envKey = process.env.OPENAI_API_KEY;
  if (envKey) {
    cachedApiKey = envKey;
    console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env");
    debugLog("Resolved key from process.env");
    return envKey;
  }
Confidence
60% confidence
Finding
.env"

Credential Access

High
Category
Privilege Escalation
Content
if (envKey) {
    cachedApiKey = envKey;
    console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env");
    debugLog("Resolved key from process.env");
    return envKey;
  }
  console.warn("[openclaw-voice-bridge] No OpenAI API key available");
Confidence
60% confidence
Finding
.env"

Credential Access

High
Category
Privilege Escalation
Content
}
  } catch (_e) {}

  // Fallback: process.env
  const envKey = process.env.OPENAI_API_KEY;
  if (envKey) {
    cachedApiKey = envKey;
Confidence
60% confidence
Finding
.env

Credential Access

High
Category
Privilege Escalation
Content
const envKey = process.env.OPENAI_API_KEY;
  if (envKey) {
    cachedApiKey = envKey;
    console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env");
    debugLog("Resolved key from process.env");
    return envKey;
  }
Confidence
60% confidence
Finding
.env"

Credential Access

High
Category
Privilege Escalation
Content
if (envKey) {
    cachedApiKey = envKey;
    console.info("[openclaw-voice-bridge] Resolved OpenAI key from process.env");
    debugLog("Resolved key from process.env");
    return envKey;
  }
Confidence
60% confidence
Finding
.env"

Session Persistence

Medium
Category
Rogue Agent
Content
## Recording Pipeline (v5)

The v5 pipeline uses `subagent.run()` to create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display:

1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms latency)
2. **Recording** → Snarling records 20s of audio to a WAV file
Confidence
82% confidence
Finding
create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display: 1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms

Session Persistence

Medium
Category
Rogue Agent
Content
## Recording Pipeline (v5)

The v5 pipeline uses `subagent.run()` to create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display:

1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms latency)
2. **Recording** → Snarling records 20s of audio to a WAV file
Confidence
84% confidence
Finding
create an isolated agent turn, which then uses `send_notification` to deliver the answer to the Snarling display: 1. **X press** → Snarling starts `arecord` immediately in a background thread (~82ms

Known Vulnerable Dependency: openclaw==2026.3.24-beta.2 — 10 advisory(ies): CVE-2026-53846 (OpenClaw: Workspace .env npm_execpath could influence bundled runtime dependency); CVE-2026-41913 (OpenClaw: Concurrent async auth attempts can bypass the intended shared-secret r); CVE-2026-43526 (OpenClaw: QQBot reply media URL handling could trigger SSRF and re-upload fetche) +7 more

High
Category
Supply Chain
Confidence
97% confidence
Finding
openclaw==2026.3.24-beta.2

VirusTotal

VirusTotal findings are pending for this skill version.

View on VirusTotal