Voice Assistant

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real voice-assistant skill, but it streams live microphone audio and assistant content through external services, with under-disclosed permissions and inadequate user-facing privacy controls.

Review before installing. Use only with trusted provider keys and a trusted OpenClaw gateway, avoid speaking sensitive information unless the external STT/TTS providers are approved for that data, and run the server only on a trusted network. Before broad deployment, fix the metadata, the missing .env.example, the privacy notice/consent flow, the origin/auth controls, and transcript rendering.
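Two of the fixes named above, origin controls and transcript rendering, can be illustrated with a minimal Python sketch. The skill's server code is not shown in this report, so the allowlist `ALLOWED_ORIGINS` and the helper names `is_allowed_origin` and `render_transcript` are hypothetical; only the standard-library calls are real.

```python
import html
from urllib.parse import urlsplit

# Hypothetical allowlist; the deployment's real origins are not given in this report.
ALLOWED_ORIGINS = {"http://localhost:8080", "https://assistant.example.internal"}


def is_allowed_origin(origin_header):
    """Return True only if the Origin header exactly matches an allowlisted origin."""
    if not origin_header:
        return False
    parts = urlsplit(origin_header)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    return normalized in ALLOWED_ORIGINS


def render_transcript(text):
    """Escape transcript text so it is displayed as text, not interpreted as HTML."""
    return html.escape(text, quote=True)
```

A WebSocket handshake handler could call `is_allowed_origin` on the `Origin` header and refuse the connection when it returns False, and the page could pass every transcript through `render_transcript` (or an equivalent text-only DOM API) before display.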

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (3)

Lp3

Medium
Category
MCP Least Privilege
Confidence
95% confidence
Finding
The skill documentation indicates use of environment variables for API keys and network access to third-party STT/TTS services and the OpenClaw gateway, but the manifest declares no corresponding permissions in `metadata.openclaw.requires.env`. This creates a transparency and consent problem: users and policy engines cannot accurately assess what sensitive resources the skill needs, increasing the chance of unintended secret exposure or network data transfer.
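The gap described in this finding can be checked mechanically. The sketch below assumes the manifest is available as a nested dictionary and that `metadata.openclaw.requires.env` holds a list of variable names; that shape is an assumption, not confirmed by the report. The variable names are placeholders, since the report does not list the actual keys.

```python
def undeclared_env_vars(manifest, used_env_vars):
    """Return env var names the skill uses but does not declare in the manifest."""
    declared = (
        manifest.get("metadata", {})
        .get("openclaw", {})
        .get("requires", {})
        .get("env", [])
    )
    return sorted(set(used_env_vars) - set(declared))


# Hypothetical manifest mirroring the finding: nothing is declared.
manifest = {"metadata": {"openclaw": {"requires": {"env": []}}}}
# Placeholder variable names; the real ones are not given in this report.
used = ["STT_API_KEY", "TTS_API_KEY", "GATEWAY_URL"]
missing = undeclared_env_vars(manifest, used)
```

An empty result would indicate that every variable the skill reads is declared, which is the state a policy engine needs in order to assess the skill accurately.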

Missing User Warnings

Medium
Confidence
97% confidence
Finding
This skill captures live microphone audio, sends it to external STT providers, forwards resulting transcripts to the OpenClaw gateway, and then sends generated text to external TTS providers, but the description does not prominently warn users about that data flow. In a voice-assistant context this is especially sensitive because spoken content may contain personal, confidential, or regulated information, so lack of explicit disclosure undermines informed consent and safe deployment.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The page captures live microphone audio and immediately streams raw PCM over a WebSocket to the backend, but the UI only says "Click the mic to start" / "Listening..." and does not clearly disclose that speech is being transmitted off-device for STT/agent processing. In a voice-assistant skill, users may reasonably expect local capture for interaction, but not necessarily continuous remote streaming to third-party services; this creates a meaningful privacy and consent risk, especially because the app also fetches provider configuration indicating external STT/TTS backends.
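One way to address this finding is to gate streaming on an explicit acknowledgement of the data flow. The sketch below shows only the gating logic, in Python for illustration; the actual capture happens in the browser, and the class name `ConsentGate`, its methods, and the disclosure wording are all hypothetical.

```python
class ConsentGate:
    """Drops audio frames until the user has acknowledged the disclosure."""

    # Illustrative wording only; a real notice should name the configured providers.
    DISCLOSURE = (
        "Your speech is streamed to the server and to external "
        "speech-to-text and text-to-speech providers."
    )

    def __init__(self, send):
        self._send = send  # callable that transmits one PCM frame
        self._consented = False

    def grant(self):
        """Record that the user accepted the disclosure."""
        self._consented = True

    def revoke(self):
        """Withdraw consent; later frames are dropped again."""
        self._consented = False

    def forward(self, pcm_frame):
        """Send the frame only if consent is on; return whether it was sent."""
        if not self._consented:
            return False
        self._send(pcm_frame)
        return True


# Usage: frames are dropped before consent and forwarded after it.
sent = []
gate = ConsentGate(sent.append)
dropped = gate.forward(b"\x00\x01")
gate.grant()
forwarded = gate.forward(b"\x02\x03")
```

Placing the check directly in front of the transmit call means a UI bug that starts capture early still cannot send audio off-device before the user has accepted the notice.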

VirusTotal

64/64 vendors flagged this skill as clean.
