WebChat Voice GUI

Security checks across malware telemetry and agentic risk

Overview

This is a local voice-input skill, but it needs review because it persistently patches WebChat and can automatically submit transcribed speech.

Install only if you are comfortable with a user-level hook that automatically patches the OpenClaw WebChat UI on startup. Treat continuous/live transcription as capable of sending spoken text into chat automatically, and avoid enabling it around sensitive conversations unless you have reviewed and accepted that behavior.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (4)

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: In continuous mode, the code automatically calls sendMessage(continuousText) after a short review timeout, which goes beyond merely inserting transcribed text into the input field. This can cause unintended prompts to be submitted without a deliberate final user action, increasing the risk of accidental command execution, disclosure of sensitive spoken content, or prompt injection into downstream agents.
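The difference between auto-submitting after a review delay and requiring a final user action can be sketched as follows. This is a hypothetical illustration, not the skill's code: only `sendMessage` is named in the finding, while `ContinuousTranscript`, `SubmitPolicy`, `stageInInput`, and the "auto"/"confirm" policies are assumptions introduced here. In "auto" mode the timer alone submits the text; in "confirm" mode the timer only places the text in the input field and leaves sending to the user.

```typescript
// Hypothetical sketch: class, policy, and callback names are illustrative
// assumptions, not the skill's actual implementation.
type SubmitPolicy = "auto" | "confirm";

class ContinuousTranscript {
  private pending = "";
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly policy: SubmitPolicy;
  private readonly reviewDelayMs: number;
  private readonly sendMessage: (text: string) => void;
  private readonly stageInInput: (text: string) => void;

  constructor(
    policy: SubmitPolicy,
    reviewDelayMs: number,
    sendMessage: (text: string) => void,
    stageInInput: (text: string) => void,
  ) {
    this.policy = policy;
    this.reviewDelayMs = reviewDelayMs;
    this.sendMessage = sendMessage;
    this.stageInInput = stageInInput;
  }

  // Called each time the recognizer emits more text; restarts the review timer.
  append(fragment: string): void {
    this.pending = `${this.pending} ${fragment}`.trim();
    this.cancelTimer();
    this.timer = setTimeout(() => this.onReviewTimeout(), this.reviewDelayMs);
  }

  // Runs when no new speech arrives within reviewDelayMs (or when called directly).
  onReviewTimeout(): void {
    this.cancelTimer();
    const text = this.pending;
    this.pending = "";
    if (text === "") return;
    if (this.policy === "auto") {
      // Risky pattern described in the finding: submits with no final user action.
      this.sendMessage(text);
    } else {
      // Gated alternative: put the text in the input box and wait for Send.
      this.stageInInput(text);
    }
  }

  private cancelTimer(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }
}
```

With the "auto" policy, background speech such as "delete the staging branch" reaches the chat as soon as the speaker pauses. With "confirm", the same text only lands in the input field, which restores the deliberate final action the finding says is missing.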

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The startup hook unconditionally executes an external shell script during gateway initialization, which expands the skill's privilege from UI enhancement into host-level code execution. Although the script path is fixed and arguments are not shell-interpolated, this still creates a risky execution surface: any compromise, tampering, or packaging abuse involving inject.sh results in arbitrary code running automatically at startup.

Missing User Warnings

Medium

Confidence: 85%
Finding: The script transmits recorded audio to /transcribe or a localhost transcription service, but the UI text shown here does not clearly disclose that microphone audio leaves the page context for network processing. Users may assume speech-to-text is purely local in-browser, creating a privacy and consent gap, especially when using the localhost fallback or a proxied server endpoint.
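One way to close the disclosure gap is to resolve where audio will actually go and state it before recording starts. The sketch below is hypothetical: `/transcribe` comes from the finding, but `resolveAudioDestination`, `disclosureText`, `mayUploadAudio`, the per-origin approval set, and the example origins (`https://chat.example.com`, `http://localhost:8000`) are assumptions made for illustration. It relies only on the standard WHATWG `URL` constructor, which resolves a relative path such as `/transcribe` against the page origin.

```typescript
// Hypothetical sketch: function names, the approval set, and message wording
// are illustrative assumptions, not the skill's actual code.

interface AudioDestination {
  url: URL;
  isLoopback: boolean;
}

// Resolve a relative endpoint such as "/transcribe" against the page origin,
// so the user can be told which server will receive microphone audio.
function resolveAudioDestination(endpoint: string, pageOrigin: string): AudioDestination {
  const url = new URL(endpoint, pageOrigin);
  const loopbackHosts = ["localhost", "127.0.0.1", "[::1]"];
  return { url, isLoopback: loopbackHosts.includes(url.hostname) };
}

// Text to show before recording, making clear that audio leaves the page.
function disclosureText(dest: AudioDestination): string {
  const where = dest.isLoopback
    ? "a transcription service on this machine"
    : "a transcription server";
  return `Microphone audio will be sent to ${where} (${dest.url.origin}) for speech-to-text.`;
}

// Only allow uploads to origins the user has explicitly approved.
function mayUploadAudio(dest: AudioDestination, approvedOrigins: ReadonlySet<string>): boolean {
  return approvedOrigins.has(dest.url.origin);
}
```

Note that a relative `/transcribe` resolves to the WebChat server's own origin, which may in turn proxy audio elsewhere, so a loopback check alone does not prove the audio stays on the machine.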

Missing User Warnings

Medium

Confidence: 92%
Finding: Continuous transcription mode can automatically submit recognized text after the review delay without an explicit warning when the mode is activated. In a chat/agent environment, that behavior is risky because spoken background audio, mis-transcriptions, or unintended phrases may be sent as live prompts to the system without a final confirmation step.

VirusTotal

All 63 of 63 vendors reported this skill as clean.
