Xiaozhi MCP Server

Security checks across malware telemetry and agentic risk

Overview

This appears to be a genuine voice-assistant bridge, but it needs review: it exposes broad agent control through a network service that is protected only by a token sent without transport encryption, and it ships with an unexplained, prefilled target session.

Install only after editing config.yaml to remove or replace the bundled target_session, and treat the connection token like a password. Keep port 28765 on a trusted LAN or behind TLS/VPN/reverse proxy controls, avoid speaking sensitive secrets, rotate the token if exposed, and consider installing dependencies in a virtual environment instead of running the startup script as-is.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (12)

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The server exposes a generic run_agent interface that forwards arbitrary caller-supplied messages to the local OpenClaw agent, which is effectively a powerful command/delegation primitive. In the context of a voice-assistant integration, this materially expands the attack surface because any party with the bearer token can cause broad agent actions beyond narrowly scoped assistant tasks.
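
One way to narrow the surface described in this finding is to check both the token and the kind of request before anything reaches the agent. The sketch below is illustrative only: `ALLOWED_INTENTS` and `authorize_request` are hypothetical names, and the skill itself is not known to classify requests by intent.

```python
import hmac

# Hypothetical allowlist; the scanned skill does not define these intents.
ALLOWED_INTENTS = {"weather", "timer", "reminder"}


def authorize_request(presented_token: str, expected_token: str, intent: str) -> bool:
    """Return True only for a valid token AND an allowlisted intent."""
    # compare_digest avoids leaking token contents through timing differences.
    token_ok = hmac.compare_digest(
        presented_token.encode("utf-8"), expected_token.encode("utf-8")
    )
    return token_ok and intent in ALLOWED_INTENTS
```

With a gate like this, a stolen token would still be limited to the listed intents instead of arbitrary agent messages.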

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The script automatically installs Python packages with pip3 during startup, modifying the host environment without explicit consent. This introduces supply-chain risk, can break system-managed Python installations, and is broader host impact than expected for a simple startup helper for a voice-assistant bridge.
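
A startup helper could refuse to install anything unless it is running inside a virtual environment. The following is a minimal sketch of that guard, assuming a `requirements.txt` file (the actual dependency list used by the script is not shown in this report); `in_virtualenv` and `build_install_command` are hypothetical helper names.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None) -> bool:
    """Report whether the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the system installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def build_install_command(requirements_file: str, inside_venv: bool) -> list:
    """Build a pip command, refusing to touch a system-wide Python."""
    if not inside_venv:
        raise RuntimeError("refusing to install packages outside a virtual environment")
    # Using sys.executable ensures pip targets the active interpreter.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]
```

A caller would pass `in_virtualenv()` as the second argument and hand the returned list to `subprocess.run` only after the user has agreed to the install.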

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The script contacts ifconfig.me to discover the machine's public IP, causing an unsolicited outbound network request to a third party. This leaks metadata about the host and execution time, and is not strictly necessary to run the local server; in this skill context it is only for convenience when displaying connection instructions.
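
For a device on the same LAN, the address to display can usually be found without contacting any third party. The sketch below uses a connected UDP socket to ask the operating system which local address it would route from; no packet is sent. It returns the LAN address, not the public one, so it is not a drop-in replacement for the ifconfig.me lookup. The function name and the probe address (from the 192.0.2.0/24 documentation range) are assumptions.

```python
import socket


def local_lan_ip(probe_host: str = "192.0.2.1", probe_port: int = 9) -> str:
    """Return the local address the OS would use for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; nothing is transmitted.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, an offline host): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()
```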

Missing User Warnings

Medium

Confidence: 91%
Finding: The README explicitly instructs users to expose a server IP, open port 28765, and retrieve/share a connection token while describing that user speech will be forwarded to OpenClaw, but it provides no warning about privacy, token sensitivity, network exposure, or transport security. This can lead users to deploy the service insecurely, leaking voice data or allowing unauthorized access if the token or endpoint is exposed.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill states that everything spoken to the device is forwarded to OpenClaw for processing, but it does not clearly warn users that voice content may leave the local device and could include sensitive personal or business information. This creates a privacy and consent problem because users may unknowingly transmit confidential data to another service.

Missing User Warnings

Medium

Confidence: 97%
Finding: The instructions tell users to read a token from a local file and enter it into a third-party device configuration, but they do not warn that the token is a secret credential. If disclosed or mishandled, an attacker could reuse the token to connect to the service, impersonate the client, or access linked functionality.
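
Treating the token as a credential starts with how it is generated and stored. This is a minimal sketch, assuming a POSIX host and a caller-chosen file path (the skill's real token location is not given in this report); `write_token_file` is a hypothetical helper that can also be rerun to rotate an exposed token.

```python
import os
import secrets


def write_token_file(path: str) -> str:
    """Create (or rotate) a random token readable only by its owner."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoding
    # Create the file with mode 0600 so other local users cannot read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    # Tighten permissions as well in case the file already existed.
    os.chmod(path, 0o600)
    return token
```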

Missing User Warnings

High

Confidence: 99%
Finding: The code transmits the bearer token and user message content over plain HTTP and an unsecured WebSocket URL (`http://` and `/ws?token=...`), allowing network attackers on the same LAN, Wi‑Fi, or upstream path to intercept or modify traffic. Putting the token in the WebSocket query string is especially risky because it may also be exposed in logs, proxies, and diagnostics, increasing the chance of credential theft and session hijacking.
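
The two problems named in this finding (an unencrypted scheme and a secret in the query string) can be detected mechanically from a connection URL. The sketch below is illustrative; `transport_problems` and the `SECRET_PARAMS` list are assumptions, and the example host is made up.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameter names treated as credentials (an assumed, non-exhaustive list).
SECRET_PARAMS = {"token", "access_token", "api_key"}


def transport_problems(url: str) -> list:
    """List transport-security problems visible in a connection URL."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme in ("http", "ws"):
        problems.append("unencrypted scheme: " + parts.scheme)
    # Secrets in the query string tend to end up in logs and proxy records.
    for name in sorted(SECRET_PARAMS & set(parse_qs(parts.query))):
        problems.append("secret in query string: " + name)
    return problems


print(transport_problems("ws://192.168.1.10:28765/ws?token=abc"))
# ['unencrypted scheme: ws', 'secret in query string: token']
```

A URL such as `wss://host/ws`, with the token carried in an `Authorization` header instead, would produce an empty list.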

Missing User Warnings

Medium

Confidence: 93%
Finding: The README explicitly tells users to hardcode WiFi SSID, WiFi password, server IP, and an MCP token directly into Arduino source without any warning about credential sensitivity. In embedded workflows, source files are often committed, shared, or left on developer machines, so this guidance increases the chance of accidental credential leakage and unauthorized access to the network or backend service.

Missing User Warnings

Medium

Confidence: 84%
Finding: The README describes passing user speech text to a remote agent via mcp.callTool("run_agent", "用户说的话", true), where the string literal is a placeholder meaning "what the user said", but does not disclose that spoken content is transmitted off-device. Because voice input can contain personal, sensitive, or ambient information, omitting a privacy warning can lead to unintentional data exposure and misuse, especially in a smart-assistant context that users may expect to be local.

Missing User Warnings

Medium

Confidence: 95%
Finding: The script installs packages and creates or modifies configuration files automatically, with no explicit consent before making persistent changes to the system. Even if intended as a convenience, silent environment mutation increases operational risk and can surprise users running the script on shared or production hosts.

Missing User Warnings

Medium

Confidence: 91%
Finding: The script kills an old PID from /tmp and also runs pkill -f "server.py.*28765", which can terminate matching processes without user confirmation. This creates denial-of-service risk and may affect unrelated workloads if process names or arguments happen to match, especially on multi-user systems.
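
A narrower alternative to a broad `pkill -f` pattern is to read the recorded PID, confirm that the process really is the expected server, and only then signal it. The sketch below assumes a Linux host (it reads `/proc`) and that the server is started as `server.py`; all function names are hypothetical.

```python
import os
import signal


def is_expected_server(cmdline: list, script_name: str = "server.py") -> bool:
    """True only if an argument's basename is exactly the server script."""
    return any(os.path.basename(arg) == script_name for arg in cmdline[1:])


def read_cmdline(pid: int) -> list:
    """Read a process's argument vector from /proc (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as handle:
        raw = handle.read()
    # Arguments are NUL-separated; drop the empty trailing entry.
    return [part.decode(errors="replace") for part in raw.split(b"\0") if part]


def stop_server(pid: int) -> bool:
    """Send SIGTERM to pid only after verifying it is the expected server."""
    try:
        cmdline = read_cmdline(pid)
    except OSError:
        return False  # process is gone, or /proc is unavailable
    if not is_expected_server(cmdline):
        return False  # stale PID file now points at an unrelated process
    os.kill(pid, signal.SIGTERM)
    return True
```

Matching on the exact script basename avoids terminating processes whose arguments merely contain a similar string.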

Missing User Warnings

Low

Confidence: 83%
Finding: The script makes an external network request to a public IP discovery service without warning the user. While lower severity than code execution issues, it still creates unnecessary data exposure and unexpected network activity during startup.

VirusTotal

All 66 vendors reported this skill as clean.
