Security audit

voiceclaw

Security checks across malware telemetry and agentic risk

Overview

This is a coherent voice-assistant skill, but it needs review because it captures speech and automatically uses an OpenClaw gateway token without clear upfront disclosure.

Install only if you are comfortable with browser microphone capture, transcript/conversation processing by the local voiceclaw server, and use of your local OpenClaw gateway token. Prefer a scoped token, confirm the server binds to localhost, avoid speaking sensitive information until logging/retention is clear, and review the docs for where audio, transcripts, and credentials flow.
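One way to confirm the localhost recommendation is to check the address the server reports at startup. The sketch below is illustrative only and is not part of voiceclaw: `isLoopbackHost` and `isLoopbackUrl` are hypothetical helper names, and the only URL taken from the audited material is the `http://127.0.0.1:8788` listening address shown in the skill's startup output.

```javascript
// Hypothetical helpers (not part of voiceclaw): decide whether a listening
// URL points at a loopback address rather than a network-reachable interface.
const net = require("node:net");

function isLoopbackHost(host) {
  if (host === "localhost") return true;
  // All of 127.0.0.0/8 is loopback for IPv4.
  if (net.isIPv4(host)) return host.startsWith("127.");
  // ::1 is the IPv6 loopback address.
  if (net.isIPv6(host)) return host === "::1";
  return false;
}

function isLoopbackUrl(urlString) {
  // URL.hostname keeps the square brackets around IPv6 literals; strip them.
  const hostname = new URL(urlString).hostname.replace(/^\[|\]$/g, "");
  return isLoopbackHost(hostname);
}

console.log(isLoopbackUrl("http://127.0.0.1:8788")); // true
console.log(isLoopbackUrl("http://0.0.0.0:8788"));   // false
```

A result of `false` for the reported address would suggest the server is reachable from other machines on the network, which widens the exposure of both speech-derived content and the gateway token.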

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (8)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 88%
Finding: The skill documentation describes capabilities that use environment variables, local credential files, and networked services, but it does not declare corresponding permissions or clearly communicate them to users. This creates a transparency and consent problem: users may invoke a voice skill that accesses tokens and transmits content to local services without understanding its effective privileges.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The document explicitly says there are no secrets while also stating that a gateway token is automatically discovered from `~/.openclaw/openclaw.json`. That is security-relevant credential handling, and mislabeling it as 'no secrets' can cause developers and operators to under-protect the token, expose it in logs, or grant access without informed consent.
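Treating the gateway token as a secret mostly means keeping it out of logs and diagnostics. The following is a minimal sketch under stated assumptions, not voiceclaw's code: `redactForLog` and the `SENSITIVE_KEY` pattern are hypothetical, and the example object does not reflect the real layout of `~/.openclaw/openclaw.json`, which this report does not describe.

```javascript
// Hypothetical helper (not part of voiceclaw): return a copy of a config
// object that is safe to log, with secret-looking fields masked.
const SENSITIVE_KEY = /token|secret|password|api[-_]?key/i;

function redactForLog(value) {
  if (Array.isArray(value)) return value.map(redactForLog);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      // Mask the whole value when the key name suggests a credential.
      out[key] = SENSITIVE_KEY.test(key) ? "<redacted>" : redactForLog(inner);
    }
    return out;
  }
  return value;
}

// Illustrative shape only; the real config layout is an assumption.
const exampleConfig = { gateway: { port: 1234, token: "example-token" } };
console.log(JSON.stringify(redactForLog(exampleConfig)));
// {"gateway":{"port":1234,"token":"<redacted>"}}
```

Masking by key name is a coarse heuristic; it will miss credentials stored under unexpected keys, so it complements rather than replaces careful handling of the token.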

Missing User Warnings

Medium

Confidence: 91%
Finding: The README explains a microphone-driven voice workflow that sends user speech-derived content through backend services, but it does not clearly disclose privacy implications, retention expectations, or where audio/transcripts may be processed. In a voice-assistant skill, omission of this notice can cause users to unknowingly expose sensitive spoken data, especially because the architecture involves browser capture, backend processing, OpenClaw, and VOICEVOX interactions.

Missing User Warnings

Medium

Confidence: 93%
Finding: The description markets voice functionality but does not prominently warn that microphone input is captured and spoken content is transmitted to local services for STT/LLM/TTS processing. In a voice-assistant context this is especially sensitive because users may expose private conversations or ambient speech without informed consent.

Missing User Warnings

Low

Confidence: 90%
Finding: The file states that a gateway token is auto-detected from `~/.openclaw/openclaw.json`, but it does not warn users that the skill will read and use credentials from local configuration. Silent credential discovery increases the risk of users unknowingly authorizing API actions or exposing sensitive tokens through misconfiguration or logs.

Missing User Warnings

Medium

Confidence: 94%
Finding: The architecture describes automatic detection and use of a bearer token from a local config file with no warning, prompt, or trust-boundary discussion. In a voice-assistant skill that bridges browser input to a gateway, silent credential use increases the risk of unauthorized API access, surprise privilege use, and accidental exposure if the service is deployed in shared environments.
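One possible mitigation for silent credential use is to make config-file discovery opt-in, so that a token is only picked up when the operator has asked for it. The sketch below illustrates that idea under stated assumptions: `resolveGatewayToken` and the environment variable names `VOICECLAW_GATEWAY_TOKEN` and `VOICECLAW_ALLOW_CONFIG_TOKEN` are hypothetical and do not exist in the audited skill, and the config reader is passed in as a function because the file's layout is not documented here.

```javascript
// Hypothetical sketch (names are assumptions, not voiceclaw's API):
// prefer an explicitly supplied token, and read the local config file
// only when the operator has opted in.
function resolveGatewayToken(env, readConfigToken) {
  if (env.VOICECLAW_GATEWAY_TOKEN) {
    return { token: env.VOICECLAW_GATEWAY_TOKEN, source: "env" };
  }
  if (env.VOICECLAW_ALLOW_CONFIG_TOKEN === "1") {
    const token = readConfigToken();
    if (token) return { token, source: "config" };
  }
  // No explicit token and no opt-in: do not touch local credentials.
  return { token: null, source: "none" };
}

// Without opt-in the config reader is never called.
const result = resolveGatewayToken({}, () => "token-from-config");
console.log(result.source); // none
```

Returning the `source` alongside the token lets the server state at startup where the credential came from, without printing the credential itself.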

Missing User Warnings

Medium

Confidence: 88%
Finding: The page captures spoken input via browser speech recognition and then sends recognized transcripts and accumulated conversation history to backend endpoints such as /api/chat-stream and /api/tts, but the UI does not clearly disclose that speech-derived content is transmitted off-device. In a voice assistant context, users may assume audio or transcripts are processed locally, so the lack of explicit notice and consent can lead to unintended disclosure of sensitive spoken information.
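The disclosure gap described here could be narrowed by gating every outgoing request on recorded user consent and by bounding how much conversation history is sent. This is a hedged sketch only: `buildChatPayload`, `ConsentError`, and the `message`/`history` payload fields are hypothetical, since the report does not document the request format of `/api/chat-stream`.

```javascript
// Hypothetical sketch (payload shape is an assumption): refuse to build a
// request body unless the user has explicitly consented, and cap the
// amount of conversation history included.
class ConsentError extends Error {}

function buildChatPayload({ consentGiven, transcript, history = [], maxHistory = 10 }) {
  if (consentGiven !== true) {
    throw new ConsentError("User has not consented to sending transcripts.");
  }
  // slice(-0) would return the whole array, so handle zero explicitly.
  const trimmed = maxHistory > 0 ? history.slice(-maxHistory) : [];
  return { message: transcript, history: trimmed };
}

const payload = buildChatPayload({
  consentGiven: true,
  transcript: "hello",
  history: ["a", "b", "c"],
  maxHistory: 2,
});
console.log(JSON.stringify(payload)); // {"message":"hello","history":["b","c"]}
```

Keeping the check in the function that builds the payload means no code path can send a transcript without passing through it, which is easier to audit than a check in the UI alone.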

Credential Access

High

Category: Privilege Escalation
Content:
    cd voiceclaw
    npm install
    # Start (no .env needed if OpenClaw is running locally)
    npm start
    # → [voiceclaw] OpenClaw config loaded from ~/.openclaw/openclaw.json
    # → [voiceclaw] listening on http://127.0.0.1:8788
Confidence: 95%
Finding: .env

VirusTotal

66/66 vendors flagged this skill as clean.

Static analysis

Detected: suspicious.env_credential_access

Environment variable access combined with network send.

Critical

Code: suspicious.env_credential_access
Location: src/server.js:37