Voice Assistant

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate voice-assistant skill, but it streams live microphone audio and assistant content through external services while under-disclosing its permissions and lacking adequate user-facing privacy controls.

Review before installing. Use only with trusted provider keys and a trusted OpenClaw gateway, avoid speaking sensitive information unless the external STT/TTS providers are approved for that data, run the server only on a trusted network, and fix the metadata, missing .env.example, privacy notice/consent flow, origin/auth controls, and transcript rendering before broad deployment.
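Two of the fixes listed above, origin controls and transcript rendering, can be sketched in a few lines. This is a minimal illustration, not the skill's actual code. It assumes that the transcript concern is unescaped text reaching the page and that the server can read the WebSocket handshake's `Origin` header. `ALLOWED_ORIGINS`, `origin_allowed`, and `render_transcript` are hypothetical names.

```python
import html
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from configuration.
ALLOWED_ORIGINS = {"http://localhost:8000"}


def origin_allowed(origin_header, allowed=ALLOWED_ORIGINS) -> bool:
    """Return True only if the Origin header exactly matches an allowed origin."""
    if not origin_header:
        return False  # a missing Origin header is treated as untrusted
    parts = urlsplit(origin_header)
    if not parts.scheme or not parts.netloc:
        return False  # e.g. the literal string "null"
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    return normalized in allowed


def render_transcript(text: str) -> str:
    """Escape transcript text so it is displayed as text, not parsed as markup."""
    return html.escape(text, quote=True)
```

An exact-match allow-list is used here instead of substring or suffix matching, since loose matching is a common way for origin checks to be bypassed.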

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (3)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill documentation indicates use of environment variables for API keys and network access to third-party STT/TTS services and the OpenClaw gateway, but the manifest declares no corresponding permissions in `metadata.openclaw.requires.env`. This creates a transparency and consent problem: users and policy engines cannot accurately assess what sensitive resources the skill needs, increasing the chance of unintended secret exposure or network data transfer.
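A gap like this can be detected mechanically. The sketch below compares the environment variables a skill's Python source reads against those declared under `metadata.openclaw.requires.env`. It is an illustration under assumptions: the manifest is taken to be already parsed into a dict, the `env` field is assumed to be a list of variable names, and `undeclared_env_vars` and the variable names in the usage example are hypothetical.

```python
import re

# Matches os.environ["NAME"], os.environ.get("NAME") and os.getenv("NAME").
ENV_READ = re.compile(
    r"""os\.(?:environ(?:\.get\(|\[)|getenv\()\s*['"]([A-Z0-9_]+)['"]"""
)


def undeclared_env_vars(source_text: str, manifest: dict) -> set:
    """Return env var names read in source_text but not declared in the manifest."""
    used = set(ENV_READ.findall(source_text))
    # Assumed layout: metadata.openclaw.requires.env is a list of names.
    declared = set(
        manifest.get("metadata", {})
        .get("openclaw", {})
        .get("requires", {})
        .get("env", [])
    )
    return used - declared
```

For example, a source file that reads `STT_API_KEY` and `TTS_API_KEY` checked against a manifest with no `requires.env` entry would report both names as undeclared. The regex only catches literal string lookups, so dynamically built names would be missed.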

Missing User Warnings

Medium

Confidence: 97%
Finding: This skill captures live microphone audio, sends it to external STT providers, forwards resulting transcripts to the OpenClaw gateway, and then sends generated text to external TTS providers, but the description does not prominently warn users about that data flow. In a voice-assistant context this is especially sensitive because spoken content may contain personal, confidential, or regulated information, so lack of explicit disclosure undermines informed consent and safe deployment.

Missing User Warnings

Medium

Confidence: 95%
Finding: The page captures live microphone audio and immediately streams raw PCM over a WebSocket to the backend, but the UI only says "Click the mic to start" / "Listening..." and does not clearly disclose that speech is being transmitted off-device for STT/agent processing. In a voice-assistant skill, users may reasonably expect local capture for interaction, but not necessarily continuous remote streaming to third-party services; this creates a meaningful privacy and consent risk, especially because the app also fetches provider configuration indicating external STT/TTS backends.
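One way to address this finding is to have the backend refuse audio until the client has sent an explicit consent signal, so that the disclosure in the UI is enforced and not only displayed. The sketch below models that gate as a small state machine. It is an assumption-laden illustration and not the skill's protocol: the `AudioSession` class, the `consent:granted` and `consent:revoked` control messages, and the convention that text frames carry control messages while binary frames carry PCM are all hypothetical.

```python
class AudioSession:
    """Per-connection state that drops audio until the user has consented."""

    def __init__(self):
        self.consented = False
        self.forwarded = []  # stands in for the upstream STT connection

    def handle_message(self, message):
        """Process one WebSocket frame and return a short status string."""
        if isinstance(message, str):
            # Text frames are treated as control messages (hypothetical protocol).
            if message == "consent:granted":
                self.consented = True
                return "ack"
            if message == "consent:revoked":
                self.consented = False
                return "ack"
            return "ignored"
        if not self.consented:
            return "rejected"  # audio received before consent is never forwarded
        self.forwarded.append(message)
        return "forwarded"
```

In this design the UI would send `consent:granted` only after the user accepts a notice stating that speech is streamed to external STT/TTS providers. Enforcing the gate server-side means a modified client cannot bypass it.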

VirusTotal

64/64 vendors flagged this skill as clean.
