Security audit

senseaudio-let-claw-talkv1

Security checks across malware telemetry and agentic risk

Overview

This skill appears to be a genuine voice-assistant skill, but it is flagged for review because it can start a long-running microphone process that sends speech to cloud services and keeps local state, with no clear first-run consent step or retention policy.

Install only if you are comfortable with a long-running microphone assistant that can send speech and reply text to SenseAudio cloud APIs. Review the wake phrase, disable the skill when not in use, keep the WeSpeaker service bound to 127.0.0.1, protect the SenseAudio credential file, and periodically clear workspace logs, preferences, generated reply audio, and any voice profiles you no longer need.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (8)

Missing User Warnings

Medium

Confidence: 97%
Finding: This skill is built around continuous microphone listening and remote ASR/TTS processing, but the description lacks an explicit privacy warning that speech may be continuously captured and sent to SenseAudio services. In this context, omission is security-relevant because users may enable an always-on assistant without realizing the privacy implications for bystanders, ambient speech, and retained logs/state.

Vague Triggers

Medium

Confidence: 93%
Finding: The default prompt maps a wide set of natural-language phrases directly to launching a long-running detached assistant process, and explicitly tells the agent to execute it rather than confirm intent. That creates a real risk of unintended invocation from ambiguous user wording or paraphrases, especially because the action opens a new terminal and starts continuous listening behavior on the local machine.
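One way to reduce unintended invocation is to require an explicit yes before the launcher runs. The following is a minimal sketch and not the skill's actual code: `confirm_and_launch`, the `launch` callback, and the prompt wording are all hypothetical names chosen for illustration.

```python
from typing import Callable

# Hypothetical set of replies treated as explicit consent.
AFFIRMATIVE = {"y", "yes"}


def confirm_and_launch(
    launch: Callable[[], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Run `launch` only after the user explicitly agrees.

    Returns True if the launcher ran, False if the user declined.
    """
    answer = ask(
        "Start the voice assistant? It opens a new terminal and keeps "
        "listening on the microphone until stopped. [y/N] "
    )
    # Anything other than an explicit yes is treated as a refusal.
    if answer.strip().lower() not in AFFIRMATIVE:
        return False
    launch()
    return True
```

Defaulting to "no" means an ambiguous or empty reply never starts the detached process.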

Natural-Language Policy Violations

Medium

Confidence: 84%
Finding: The skill hard-codes Chinese wake and sleep phrases as defaults without first confirming the user's language preference or consent to voice-trigger behavior. In an always-listening assistant context, fixed unsolicited wake words can cause accidental activation, user confusion, and privacy-sensitive recording behavior if the environment or user did not intend those triggers.

Missing User Warnings

Medium

Confidence: 89%
Finding: The function uploads raw audio content to a third-party ASR endpoint, but the code shown provides no user-facing notice, consent flow, or data-handling disclosure. In a continuous-listening desktop assistant context, this increases privacy risk because users may not realize microphone captures are leaving the local device and being processed externally.
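A first-run consent gate could sit in front of the upload call. The sketch below assumes a JSON consent record stored in a file; the file location, the `cloud_audio` key, and the function names are hypothetical and do not come from the skill.

```python
import json
from pathlib import Path
from typing import Callable


def has_consent(consent_file: Path) -> bool:
    """Return True only if a stored record explicitly grants consent."""
    try:
        return json.loads(consent_file.read_text()).get("cloud_audio") is True
    except (OSError, ValueError, AttributeError):
        # Missing, unreadable, or malformed record: treat as no consent.
        return False


def ensure_consent(consent_file: Path, ask: Callable[[str], str] = input) -> bool:
    """Ask once before audio leaves the device and remember a 'yes'."""
    if has_consent(consent_file):
        return True
    answer = ask(
        "Microphone audio will be uploaded to a third-party speech "
        "service for transcription. Allow this? [y/N] "
    )
    granted = answer.strip().lower() in {"y", "yes"}
    if granted:
        consent_file.parent.mkdir(parents=True, exist_ok=True)
        consent_file.write_text(json.dumps({"cloud_audio": True}))
    return granted
```

A caller would check `ensure_consent(...)` before any upload and skip the request when it returns False. A refusal is not persisted, so the user is asked again on the next attempt.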

Missing User Warnings

Medium

Confidence: 92%
Finding: The /clear_profile endpoint deletes persisted voice-profile files based solely on an HTTP request and performs no authentication, authorization, CSRF-style origin checks, or confirmation. Because the service exposes state-changing endpoints over HTTP, any local process—and potentially remote hosts if bound to a non-loopback interface—can erase stored profiles, causing loss of enrolled biometric data and denial of service for speaker verification.
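A common mitigation for unauthenticated state-changing endpoints is a shared secret that the handler verifies before acting. The sketch below is framework-neutral because the report does not name the HTTP framework in use; the `X-Service-Token` header name and both function names are assumptions made for illustration.

```python
import hmac
import secrets
from typing import Mapping


def new_service_token() -> str:
    """Generate a random token to share between the skill and its local service."""
    return secrets.token_urlsafe(32)


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check the caller-supplied token in constant time.

    `headers` is any mapping of request headers; the header name is a
    hypothetical choice, not something the audited service defines.
    """
    supplied = headers.get("X-Service-Token", "")
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

A handler for a destructive route such as /clear_profile would reject the request when `is_authorized` returns False, before touching any stored profile files.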

Missing User Warnings

Medium

Confidence: 96%
Finding: The /shutdown endpoint lets any caller terminate the background service without authentication or authorization checks. In this skill context the service is an always-on local voice component, so unauthorized shutdown directly disrupts availability and could be triggered by another local process or by remote access if the host is configured unsafely.
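The remote-access part of this risk depends on the address the service binds to. A startup check along these lines could refuse non-loopback addresses; this is a sketch under the assumption that the bind host is available as a string, and `is_loopback_host` is a hypothetical helper name.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True only for literal loopback IP addresses.

    Hostnames are rejected rather than resolved, so a misconfigured
    name cannot silently expose the service on another interface.
    """
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address (for example a hostname or empty string).
        return False
```

With this check, `127.0.0.1` and `::1` pass, while `0.0.0.0` and LAN addresses are refused. It limits exposure to local processes only, so it complements endpoint authentication and does not replace it.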

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill writes captured microphone audio to WAV files and generated speech to MP3 files under the workspace state directory, but there is no clear user-facing disclosure or retention control around this persistence. For a continuously listening assistant, silent disk persistence materially increases privacy risk because sensitive speech may remain recoverable after the interaction ends.
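A basic retention control could delete audio files older than a configured age. The sketch below assumes the state directory path is known to the caller; `purge_old_audio` and its parameters are hypothetical, and only the `.wav` and `.mp3` extensions mentioned in the finding are targeted.

```python
import time
from pathlib import Path
from typing import List, Optional

# File types named in the finding: captured audio and generated speech.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def purge_old_audio(
    state_dir: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete audio files under `state_dir` older than `max_age_seconds`.

    Returns the list of removed paths. Non-audio files are left untouched.
    """
    now = time.time() if now is None else now
    removed: List[Path] = []
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(state_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Calling this at startup and shutdown with a short window, for example one hour, would keep recorded speech from accumulating on disk between sessions.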

Missing User Warnings

Medium

Confidence: 94%
Finding: The transcribe and TTS paths send user audio/text to external SenseAudio services using locally stored API credentials, but the code does not present a clear consent prompt or privacy disclosure before transmitting that data. In the context of a continuous voice assistant, this is especially sensitive because spoken content may include private or unintended information.

VirusTotal

66/66 vendors flagged this skill as clean.


Static analysis

No suspicious patterns detected.