Voice Listener

Security checks across malware telemetry and agentic risk

Overview

This is a real voice dictation skill, but it uploads microphone audio to Baidu while running and can paste recognized speech into whichever app is focused.

Install only if you are comfortable with spoken audio being sent to Baidu while the listener is running, including before the wake word is detected. Use it only with the intended input field focused, avoid terminals/admin consoles/password-adjacent fields, protect the Baidu API key file, and stop the process with Ctrl+C when finished.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (12)

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill uses global clipboard writes and simulated Ctrl+V keystrokes to inject recognized text into whatever application currently has focus. This creates a real integrity and safety risk because microphone-captured or spoofed speech can cause unintended commands, messages, or data entry in privileged or sensitive contexts.
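
One way to reduce this risk is to check the focused window before pasting. The following is a minimal sketch, not the skill's actual code: the names `is_sensitive_window` and `paste_allowed`, the denylist entries, and the assumption that the caller supplies the focused window title are all hypothetical and chosen for illustration.

```python
# Hypothetical guard: decide whether dictated text may be pasted into the
# currently focused window. Obtaining the window title is platform-specific
# and is left to the caller.

# Illustrative denylist (an assumption); real entries would need tuning.
SENSITIVE_KEYWORDS = ("terminal", "powershell", "cmd.exe", "password", "sudo", "admin")


def is_sensitive_window(title: str) -> bool:
    """Return True if the title suggests a terminal, admin, or credential context."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def paste_allowed(title: str) -> bool:
    """Allow pasting only when a title is known and it is not sensitive."""
    # An empty or unknown title is treated as unsafe (fail closed).
    return bool(title.strip()) and not is_sensitive_window(title)
```

A keyword match on window titles is a coarse heuristic; it can miss sensitive fields inside an otherwise ordinary window, so it complements rather than replaces keeping the intended input field focused.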

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The skill continuously captures microphone audio and contacts Baidu cloud endpoints, both to retrieve an access token and to submit the audio for speech recognition. This is a genuine privacy and data-exposure issue because spoken content may include sensitive information, and the code performs ongoing collection and transmission rather than local-only processing.

Missing User Warnings

Medium

Confidence: 88%
Finding: The README advertises continuous listening and automatic insertion at the cursor, but does not clearly warn users about privacy exposure, accidental capture of nearby speech, or unintended text entry into sensitive fields. In a voice-input tool, this omission can lead to confidential data being recorded and inserted into chats, documents, terminals, or password-adjacent contexts without the user appreciating the risk.

Missing User Warnings

Medium

Confidence: 93%
Finding: The README advertises continuous microphone listening and use of Baidu speech recognition, but it does not clearly warn users that captured speech may be sent to a third-party cloud service for processing. In a voice-listening skill, omission of an explicit privacy notice can cause users to unknowingly expose sensitive spoken data, especially because the feature is always-on after activation.

Missing User Warnings

Medium

Confidence: 95%
Finding: The documentation says that after activation, all speech is automatically input, but it does not clearly emphasize that recognized text will be injected into whatever application or field currently has focus. In context, this can lead to accidental command execution, disclosure into chats/forms, or unintended actions if the active window is sensitive.

Vague Triggers

Medium

Confidence: 93%
Finding: The invocation phrase "启动语音监听" ("start voice listening") is broad and action-oriented, making accidental triggering plausible during normal conversation or scripted prompts. Because this skill enables microphone capture and downstream text injection, unintended activation has meaningful privacy and integrity consequences.

Missing User Warnings

High

Confidence: 97%
Finding: The skill prominently advertises continuous listening but does not clearly warn that speech is captured continuously and sent to Baidu cloud APIs. This creates a significant privacy risk because users may expose sensitive spoken content to a third party without informed consent.

Missing User Warnings

Medium

Confidence: 96%
Finding: Automatically inserting recognized speech at the current cursor position can cause unintended commands, messages, or edits in whatever application is focused. Without a prominent warning, users may activate this skill while a sensitive terminal, browser, chat, or admin console is focused, leading to integrity and possible confidentiality issues.

Missing User Warnings

Medium

Confidence: 93%
Finding: The README explicitly states that after the wake word is detected, all recognized speech is automatically entered into the active text field. Without prominent warnings about focus-dependent text injection, unintended speech, background conversations, or misrecognition could be inserted into chats, forms, search bars, or other applications, causing privacy leakage or accidental actions.

Missing User Warnings

Medium

Confidence: 87%
Finding: The guide documents a continuously listening voice skill and activation workflow, but it does not clearly warn users about the privacy and consent risks of ongoing microphone capture. In a voice-listening context, omission of these warnings can lead to users enabling persistent audio monitoring without understanding that nearby speech may be collected, processed, or transmitted to third-party speech APIs.

Vague Triggers

Medium

Confidence: 89%
Finding: The wake word "小龙虾" ("crayfish") and stop word "停止" ("stop") are ordinary spoken phrases and are not scoped by speaker verification, push-to-talk, or other confirmation controls. In a continuously listening skill that also advertises automatic insertion at the cursor position, accidental activation could cause unintended recording, command capture, or text injection during normal conversation or audio playback.
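
The effect of unscoped trigger words can be illustrated with a small state machine. This sketch is an assumption about how such gating might behave, not the skill's implementation: the class `DictationGate` and its `require_confirmation` option are hypothetical, and only the wake word and stop word are taken from the finding.

```python
from dataclasses import dataclass, field

WAKE_WORD = "小龙虾"  # wake word quoted in the finding
STOP_WORD = "停止"    # stop word quoted in the finding


@dataclass
class DictationGate:
    """Hypothetical gate that decides which recognized utterances get typed."""
    require_confirmation: bool = False
    active: bool = False
    pending: bool = False  # wake word heard, waiting for confirm()
    typed: list = field(default_factory=list)

    def confirm(self) -> None:
        """Explicit user action (for example a key press) that completes activation."""
        if self.pending:
            self.active, self.pending = True, False

    def hear(self, transcript: str) -> None:
        """Process one recognized utterance."""
        if not self.active:
            if WAKE_WORD in transcript:
                if self.require_confirmation:
                    self.pending = True
                else:
                    self.active = True
            return
        if STOP_WORD in transcript:
            self.active = False
            return
        self.typed.append(transcript)
```

With the default settings, an incidental mention of the wake word in conversation activates the gate and every later utterance is typed. With `require_confirmation=True`, nothing is typed until `confirm()` is called, which is the kind of confirmation control the finding notes is absent.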

Missing User Warnings

Medium

Confidence: 93%
Finding: The code transmits recorded speech to a third-party API without an explicit privacy warning or informed consent flow about external processing. Even if this is intended functionality, the absence of disclosure materially increases privacy risk and can mislead users about where their audio data goes.
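
An informed consent flow could be as small as a one-time prompt before the first upload. The sketch below is illustrative only: `confirm_upload_consent`, the notice wording, and the injected `ask` callable are assumptions, and the skill's real upload code is not shown here.

```python
from typing import Callable

# Hypothetical notice text; the wording is an assumption for illustration.
NOTICE = (
    "Microphone audio will be sent to Baidu speech recognition while the "
    "listener runs. Type 'yes' to continue: "
)


def confirm_upload_consent(ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly answers 'yes' to the notice."""
    answer = ask(NOTICE)
    return answer.strip().lower() == "yes"
```

Passing `ask` as a parameter keeps the prompt testable; a caller would start recording only when the function returns `True`.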

VirusTotal

65/65 vendors flagged this skill as clean.
