Kiwi Voice

Security checks across malware telemetry and agentic risk

Overview

The skill is a real Kiwi Voice service, but it grants and encourages broad voice-driven agent control beyond its narrow management description.

Install only if you intend to run a full always-listening OpenClaw voice assistant, not just a helper for editing Kiwi config. Before use, change any shipped API tokens, bind the REST API to localhost or protect it behind strong auth, review Home Assistant and Telegram tokens, decide whether browser microphone and speaker-profile storage are acceptable, and remove or narrow the 'never refuse/can do anything' prompts before exposing it to other speakers.
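The localhost and token advice above can be turned into a small pre-flight check. This is a minimal sketch that assumes the operator can read the configured bind host and API token. The function names and the `shipped_tokens` set are hypothetical and are not part of Kiwi Voice.

```python
from __future__ import annotations

import ipaddress


def is_loopback_host(host: str) -> bool:
    """True only for 'localhost' or a loopback IP literal such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Empty strings and other hostnames are treated as exposed (fail closed).
        return False


def validate_api_exposure(host: str, token: str | None, shipped_tokens: set[str]) -> list[str]:
    """Return human-readable problems with the REST API's bind address and token."""
    problems = []
    if not is_loopback_host(host):
        problems.append(f"REST API binds to {host!r}; bind to 127.0.0.1 or add strong auth")
    if not token:
        problems.append("no API token configured")
    elif token in shipped_tokens:
        problems.append("API token matches a value shipped with the skill; rotate it")
    return problems
```

A wildcard bind such as `0.0.0.0` combined with an unchanged shipped token yields two problems. A loopback bind with a rotated token yields none.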

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (194)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    # Try to load the module
    try:
        subprocess.run(
            ['pactl', 'load-module', 'module-echo-cancel'],
            capture_output=True, timeout=5
Confidence: 90% confidence
Finding: subprocess.run(['pactl', 'load-module', 'module-echo-cancel'], capture_output=True, timeout=5)
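The flagged call already avoids a shell by passing an argument list and sets a timeout. A slightly more defensive wrapper, sketched here under the assumption that a failed module load should be reported rather than crash the listener, also handles a missing binary, a timeout, and a non-zero exit. `run_fixed_command` is a hypothetical helper, not code from the skill.

```python
from __future__ import annotations

import subprocess


def run_fixed_command(argv: list[str], timeout: float = 5.0) -> tuple[bool, str]:
    """Run a fixed argument vector without a shell and report failures as values."""
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # inspect returncode ourselves instead of raising
        )
    except FileNotFoundError:
        return False, f"{argv[0]} not found"
    except subprocess.TimeoutExpired:
        return False, f"{argv[0]} timed out after {timeout}s"
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, result.stdout.strip()
```

In the skill's context this would be called as `run_fixed_command(["pactl", "load-module", "module-echo-cancel"])`. Keeping the argument vector fixed, rather than built from speech or LLM output, is what keeps this call low risk.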

Lp3

Medium

Category: MCP Least Privilege
Confidence: 89% confidence
Finding: The skill documentation exposes and instructs use of capabilities including environment secret handling, file read/write, network connectivity, and shell execution, but it declares no permissions. This creates a trust and review gap: operators may authorize or run the skill without understanding its effective access, increasing the chance of secret exposure, unintended system modification, or network actions.
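The least-privilege gap amounts to a set difference between what the code does and what the manifest declares. This is a minimal sketch using capability labels taken from the finding (environment secrets, file read/write, network, shell). The labels and the `*` wildcard convention are assumptions; the skill's actual permission schema is not shown in this report.

```python
from __future__ import annotations

# Hypothetical labels for the capabilities the finding says the skill uses.
OBSERVED = {"env_secrets", "file_read", "file_write", "network", "shell"}


def audit_permissions(declared: set[str], observed: set[str]) -> tuple[set[str], set[str], bool]:
    """Return (undeclared, unused, has_wildcard) for a declared permission set."""
    if "*" in declared:
        # A wildcard technically covers everything, which is itself a finding.
        return set(), set(), True
    return observed - declared, declared - observed, False
```

With nothing declared, as the finding describes, every observed capability comes back as undeclared.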

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97% confidence
Finding: The documented purpose frames the skill as a management/configuration helper, but the referenced behavior is substantially broader: it runs a full voice assistant stack with audio capture, speech processing, Home Assistant control, web/UI interfaces, process control, and Telegram-mediated approvals. This mismatch is dangerous because users and reviewers may grant trust under a narrower mental model while the skill can access microphones, issue device-control actions, expose services, and interact with external systems.

Description-Behavior Mismatch

Medium

Confidence: 93% confidence
Finding: The documented functionality goes well beyond managing and configuring the Kiwi Voice service. It exposes a general-purpose control surface including runtime control, browser microphone input, event streaming, soul switching, speaker management, restart/shutdown, and smart-home integration, which creates a capability mismatch that can mislead operators and approval systems about the skill's true power.

Context-Inappropriate Capability

High

Confidence: 97% confidence
Finding: Home Assistant control is unrelated to the stated purpose of managing the Kiwi Voice service and materially expands the blast radius from service administration to physical-environment control. If this skill is granted trust as a voice-service manager, it could be used to operate lights or other home devices outside the expected authorization boundary.

Context-Inappropriate Capability

Medium

Confidence: 88% confidence
Finding: NSFW personality routing is outside the stated service-management purpose and indicates hidden or unrelated agent-routing behavior. Even if not directly exploitable, undeclared routing to a separate agent/model increases trust and policy risk because the skill can change model/session behavior in ways users and reviewers would not expect.

Description-Behavior Mismatch

High

Confidence: 99% confidence
Finding: The skill is scoped to Kiwi Voice service management, but the prompt grants unrestricted general-purpose powers across files, code, browser, terminal, search, and system management. This breaks least-privilege boundaries and can cause the assistant to act outside its intended domain, increasing the chance of harmful or unauthorized actions if a user issues unrelated requests.

Context-Inappropriate Capability

High

Confidence: 98% confidence
Finding: The prompt asserts broad capabilities for browser, code, terminal, and system management without justification tied to the skill's purpose. Such expansive capability framing encourages misuse, socially engineers downstream systems into over-trusting the skill, and increases the blast radius of prompt injection or unsafe user requests.

Context-Inappropriate Capability

Medium

Confidence: 99% confidence
Finding: Telling the assistant to 'never refuse to execute' removes an essential safety control, especially in a voice-facing system that may receive ambiguous, accidental, or malicious requests. In this context, the instruction pressures the agent to comply even when actions are dangerous, unauthorized, or unrelated to Kiwi Voice management.

Context-Inappropriate Capability

Medium

Confidence: 91% confidence
Finding: The service forwards arbitrary user-supplied text to the external Kiwi endpoint `homeassistant/command`, effectively creating a generic command bridge into Home Assistant conversation/control flows. Combined with the unverified inbound webhook model and lack of visible authorization checks in this file, this broadens the integration from device management into remote action execution, which could be abused to trigger unintended automations or control actions.

Description-Behavior Mismatch

Medium

Confidence: 91% confidence
Finding: The documented `POST /api/homeassistant/command` endpoint extends the skill from managing Kiwi Voice into issuing arbitrary Home Assistant commands, which can affect external devices and automations. In the context of an agent skill described only as managing/configuring Kiwi, this scope expansion is dangerous because an agent may invoke home-control actions users did not expect from this skill.
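One way to narrow the command bridge is to forward only an operator-approved set of phrases, and only from callers holding a write scope. This sketch assumes a `write` scope name and an operator-maintained allowlist; neither is taken from the Kiwi Voice code, and the actual forwarding call to `POST /api/homeassistant/command` is omitted.

```python
from __future__ import annotations

# Hypothetical operator-approved phrases; anything else is rejected, not forwarded.
ALLOWED_COMMANDS = {
    "turn on the kitchen lights",
    "turn off the kitchen lights",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def gate_command(text: str, caller_scopes: set[str]) -> str | None:
    """Return the normalized command if it may be forwarded, else None."""
    if "write" not in caller_scopes:
        return None
    command = normalize(text)
    if command not in ALLOWED_COMMANDS:
        return None
    return command
```

An exact-phrase allowlist gives up flexibility in exchange for a predictable blast radius: arbitrary text can no longer reach Home Assistant conversation or control flows.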

Description-Behavior Mismatch

Medium

Confidence: 91% confidence
Finding: The architecture explicitly includes an LLM gateway and an execution approval flow for shell commands, which materially exceeds the stated scope of managing and configuring a voice assistant service. Even if approval is required, exposing command-execution orchestration through a voice/LLM pipeline increases the attack surface for prompt injection, voice spoofing, unsafe approval handling, and privilege misuse.

Description-Behavior Mismatch

Low

Confidence: 80% confidence
Finding: The documentation describes REST/WebSocket APIs and forwarding internal events to external integrations, which creates additional ingress/egress channels beyond simple local service management. Undisclosed integration surfaces are dangerous because they can expose sensitive speech, speaker identity, state, or control events to other processes or remote clients, expanding opportunities for unauthorized access or abuse.

Context-Inappropriate Capability

High

Confidence: 95% confidence
Finding: A shell command approval workflow is a high-risk capability that is not justified by the stated function of editing voice configuration and managing the Kiwi service. In a voice-driven system, this is especially dangerous because recognized speech, LLM output, or compromised integrations may trigger command-generation paths, and an approval mechanism can fail open, be socially engineered, or be abused by an attacker with partial access.
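The fail-open concern can be addressed by making every path except an explicit, timely approval from a known approver resolve to "deny". This is a minimal sketch: `PendingApproval`, the 60-second TTL, and the approver IDs are assumptions, and the report does not show how Kiwi's Telegram-mediated approvals actually work.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class PendingApproval:
    command: str
    created_at: float  # time.monotonic() when the request was raised
    allowed_approvers: frozenset[str]
    ttl_seconds: float = 60.0
    decision: bool | None = None  # None means no answer yet

    def record(self, approver: str, approve: bool) -> None:
        """Accept the first decision from an allowed approver; ignore everything else."""
        if approver in self.allowed_approvers and self.decision is None:
            self.decision = approve

    def is_approved(self, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        if now - self.created_at > self.ttl_seconds:
            return False  # expired requests are denied, never auto-approved
        return self.decision is True  # silence or a denial both mean "no"
```

Because the first recorded decision is final, a denial cannot later be flipped by a second message, and an unknown sender's approval has no effect.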

Intent-Code Divergence

Medium

Confidence: 98% confidence
Finding: The route-scope policy defaults any unlisted /api route to read scope, but the write endpoints POST /api/speakers/{speaker_id}/block and /unblock are not listed in ROUTE_SCOPES. As a result, tokens with only read scope can perform state-changing speaker management actions, which is an authorization bypass.
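The bypass comes from the fallback value, not from the missing entries alone. This sketch contrasts the reported default with a method-aware default. The `ROUTE_SCOPES` contents, the `/api/status` and `/api/restart` routes, and the read < write < admin hierarchy are illustrative assumptions, not the skill's real table.

```python
from __future__ import annotations

# Illustrative entries only; keys are (method, route template) pairs as a router
# might report them. The real ROUTE_SCOPES table in Kiwi Voice is not reproduced here.
ROUTE_SCOPES = {
    ("GET", "/api/status"): "read",
    ("POST", "/api/restart"): "admin",
}
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}
SCOPE_RANK = {"read": 0, "write": 1, "admin": 2}  # assumed hierarchy


def required_scope_reported(method: str, route: str) -> str:
    # Reported behavior: any unlisted /api route falls back to "read".
    return ROUTE_SCOPES.get((method, route), "read")


def required_scope(method: str, route: str) -> str:
    scope = ROUTE_SCOPES.get((method, route))
    if scope is not None:
        return scope
    # Unlisted routes: safe methods need read, anything state-changing needs write.
    return "read" if method in SAFE_METHODS else "write"


def is_allowed(token_scope: str, method: str, route: str) -> bool:
    return SCOPE_RANK[token_scope] >= SCOPE_RANK[required_scope(method, route)]
```

Listing the block and unblock routes explicitly is the direct fix. Defaulting by HTTP method additionally makes any future write endpoint that someone forgets to list fail closed instead of open.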

Context-Inappropriate Capability

Medium

Confidence: 98% confidence
Finding: The code dynamically loads a model from torch.hub using a remote repository name at runtime. torch.hub can execute repository-provided Python during load, so a compromised upstream repo, dependency confusion, or MITM/supply-chain event could lead to arbitrary code execution on the host running the listener.
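A cheap guard is to refuse unpinned repository specs before calling `torch.hub.load`. torch.hub takes a `repo_owner/repo_name[:ref]` string, and when the ref is omitted it uses the repository's default branch, so the code that runs can change upstream at any time. The sketch below only validates the string and does not import torch. Treating `main` and `master` as unpinned is an assumption, and tags can also be moved upstream, so this reduces rather than removes the risk. Recent PyTorch releases also accept `trust_repo` and `source="local"` arguments to `torch.hub.load`, which let an operator require explicit trust or load from a vendored, reviewed copy.

```python
from __future__ import annotations

import re

# "repo_owner/repo_name[:ref]", the spec format torch.hub accepts for GitHub repos.
_SPEC = re.compile(r"^(?P<owner>[\w.-]+)/(?P<name>[\w.-]+)(?::(?P<ref>[\w./-]+))?$")
MOVING_REFS = {"main", "master"}  # assumed to track upstream changes


def is_pinned(repo_spec: str) -> bool:
    """True if the spec names an explicit ref that is not a moving default branch."""
    match = _SPEC.match(repo_spec)
    if match is None:
        return False
    ref = match.group("ref")
    return ref is not None and ref not in MOVING_REFS
```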

Description-Behavior Mismatch

High

Confidence: 99% confidence
Finding: The locale's system prompt explicitly expands Kiwi from a voice-assistant management skill into an unrestricted agent that can handle file, programming, browser, terminal, search, and system administration tasks, and further instructs it to execute any task immediately. In a voice-driven assistant, this materially broadens the attack surface and can cause the model to act outside the skill's intended security boundary.

Context-Inappropriate Capability

High

Confidence: 98% confidence
Finding: The prompt grants unjustified authority for terminal, browser, file, coding, and system actions even though the skill metadata limits the skill to managing the Kiwi Voice service. This mismatch encourages unsafe tool use and makes prompt-induced privilege expansion more likely, especially for spoken commands that may be ambiguous or misrecognized.

Description-Behavior Mismatch

Medium

Confidence: 87% confidence
Finding: User-facing response strings and activity text reference smart-home command handling and generic code/project operations, advertising functionality well beyond Kiwi Voice management. This can normalize dangerous expectations, trigger broader routing paths, and increase the chance that users or downstream logic invoke capabilities the skill should not expose.

Description-Behavior Mismatch

High

Confidence: 98% confidence
Finding: The locale-defined system prompt expands a voice-service management skill into a fully capable general-purpose agent with access to files, code, browser, terminal, search, and system administration. In a voice-driven context, this materially increases the attack surface because speech recognition errors, unauthorized speakers, or prompt injection through spoken content could trigger powerful actions beyond the skill's stated scope.

Intent-Code Divergence

Medium

Confidence: 97% confidence
Finding: The prompt tells Kiwi to never refuse and to do everything through OpenClaw, directly undermining the file's own approval and security messaging. This creates a policy conflict where the highest-priority behavioral instruction pushes execution even when safety controls should block, increasing the likelihood of unsafe or unauthorized operations.

Description-Behavior Mismatch

High

Confidence: 98% confidence
Finding: The locale’s system prompt explicitly expands the skill from Kiwi Voice service management into an unrestricted general-purpose agent that can perform file, browser, terminal, and system-management actions. In a voice-triggered context, this materially broadens the skill’s authority beyond its declared scope and increases the chance of unsafe or unauthorized actions being carried out from natural-language input.

Description-Behavior Mismatch

Medium

Confidence: 86% confidence
Finding: These user-facing responses indicate smart-home command execution and routing, which is not disclosed by the skill metadata. Hidden or undocumented capability expansion is risky because users and policy controls may treat the skill as a voice-service manager while it can actually affect physical devices or home automation state.

Description-Behavior Mismatch

Medium

Confidence: 93% confidence
Finding: The file defines broad voice-triggered Home Assistant routing using generic phrases like 'turn on', device nouns, and room names. That extends the skill from service management into ambient control of connected devices, creating a larger attack surface and increasing the likelihood of unintended activation from ordinary speech.
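The breadth problem is easy to see by comparing substring matching with a stricter rule. In this sketch the vocabularies, the `kiwi` wake word, and the `speaker_verified` flag are hypothetical. The skill's locale files define their own phrase lists, and whether its speaker identification can supply such a flag is an assumption.

```python
from __future__ import annotations

# Hypothetical vocabularies; the skill's locale files define their own lists.
ACTIONS = ("turn on", "turn off")
DEVICES = ("light", "lights", "heater")
WAKE_WORD = "kiwi"


def naive_route(text: str) -> bool:
    """Routes if any action phrase or device noun appears anywhere in the utterance."""
    t = text.lower()
    return any(a in t for a in ACTIONS) or any(d in t for d in DEVICES)


def strict_route(text: str, speaker_verified: bool) -> bool:
    """Requires a verified speaker, a leading wake word, an action, and a device."""
    words = text.lower().split()
    if not speaker_verified or not words or words[0].strip(",") != WAKE_WORD:
        return False
    rest = " ".join(words[1:])
    return any(rest.startswith(a) for a in ACTIONS) and any(d in rest.split() for d in DEVICES)
```

Ordinary speech such as "the lights were off when I came home" triggers the naive matcher but not the strict one.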

Context-Inappropriate Capability

High

Confidence: 99% confidence
Finding: The prompt instructs the assistant to 'never refuse to execute' and says it 'can do anything,' directly encouraging unrestricted compliance. In a voice assistant, this weakens safety boundaries and can override refusal behavior for sensitive, destructive, or privacy-invasive requests.

VirusTotal

56/56 vendors flagged this skill as clean.
