Kesha Voice Kit

Security checks across malware telemetry and agentic risk

Overview

This skill is a coherent local voice transcription and speech tool, with disclosed privacy implications when transcripts are echoed into OpenClaw chat.

Install this only if you want a local voice toolkit and are comfortable with a global Bun package plus downloaded model assets. If voice notes may contain sensitive speech, consider changing OpenClaw's transcript echo setting before use, because the documented setup copies recognized speech into chat history/context.

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (2)

Vague Triggers

Medium

Confidence: 86%
Finding: The trigger keyword list includes broad, common terms such as "say," "privacy," and generic audio/file references, which can cause the skill to be invoked outside its intended context. In an agent setting, unintended invocation can route unrelated user content or files into transcription/TTS workflows, increasing the chance of privacy exposure, incorrect actions, or unnecessary command execution.
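The effect of a broad keyword can be illustrated with a minimal sketch. It assumes simple whole-word matching, which may differ from how the agent actually routes requests; the function name `matches_trigger` and the sample message are hypothetical, and only the keywords "say" and "privacy" come from the finding.

```python
import re

# "say" and "privacy" are the keywords named in the finding.
# The matching logic below is an assumption for illustration,
# not the skill's or the agent's actual routing code.
BROAD_TRIGGERS = ["say", "privacy"]


def matches_trigger(message: str, triggers: list[str]) -> list[str]:
    """Return the triggers that appear as whole words in the message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return [t for t in triggers if t in words]


# A message that has nothing to do with voice transcription or TTS.
unrelated = "What does the privacy policy say about refunds?"
print(matches_trigger(unrelated, BROAD_TRIGGERS))
```

Under this assumption, the unrelated message matches both keywords, which is the kind of unintended invocation the finding describes.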

Ssd 3

Medium

Confidence: 95%
Finding: The documented OpenClaw configuration enables `echoTranscript: true`, which reflects transcribed audio content back into chat by default. Because transcripts may contain sensitive user speech, secrets, or third-party audio content, this creates a direct confidentiality risk and can also surface prompt-injection content extracted from audio into the conversational context.
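One way to turn the setting off without assuming where it sits in the configuration is sketched below. Only the key name `echoTranscript` comes from the finding; the helper `disable_echo` and the sample config shape are hypothetical, so check the real OpenClaw configuration layout before applying anything like this.

```python
import json


def disable_echo(config):
    """Return a copy of config with every `echoTranscript` key set to False."""
    if isinstance(config, dict):
        return {
            key: False if key == "echoTranscript" else disable_echo(value)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [disable_echo(item) for item in config]
    return config


# Hypothetical config shape, for illustration only.
sample = {"audio": {"enabled": True, "echoTranscript": True}}
print(json.dumps(disable_echo(sample)))
```

The recursive walk leaves every other setting untouched and does not modify the input, so the result can be reviewed before it is written back.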

VirusTotal

64 of 64 vendors reported this skill as clean.
