Voice Chat Skill

Security checks across malware telemetry and agentic risk

Overview

This voice-chat skill mostly does what it claims, but it needs review because spoken audio can be sent to cloud speech services while parts of the documentation imply local handling.

Install only if you are comfortable granting microphone access and potentially sending speech or transcribed text to Google, ElevenLabs, or other configured providers. Prefer local/offline modes for sensitive conversations, avoid copying the shell=True playback sample, and treat console transcript output as potentially sensitive.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Output Handling: Unvalidated Output Injection, Cross-Context Output, Unbounded Output
Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (11)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill demonstrates capabilities including network access, shell execution, file writes, and environment-variable use, but does not declare permissions or clearly constrain them. This reduces transparency and weakens review/runtime policy enforcement, making it easier for risky behavior to be introduced or overlooked.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 89%
Finding: The documented behavior does not fully match the code examples: the default STT example uses Google's online recognition service, external APIs are included, and subprocess/audio-launch behavior goes beyond a simple local voice chat description. Behavior-description mismatch is security-relevant because users may grant trust or permissions based on incomplete expectations.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The security notes claim that voice data is processed locally, but the default implementation sends audio to Google's online speech recognition service. This is a direct privacy and trust issue because users may expose microphone data under a false assumption of local-only handling.

Missing User Warnings

Medium

Confidence: 95%
Finding: The documentation explicitly promotes cloud-based speech recognition and AI API integration but does not warn that user audio or transcribed text may be sent to third-party services. For a voice-chat skill, that omission is significant because microphone input is highly sensitive and users may reasonably assume processing is local unless told otherwise.

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill sends captured microphone audio to Google's remote speech-recognition service without an explicit consent flow, privacy notice, or clear indication that spoken content leaves the local machine. Because the application continuously captures user speech in an interactive loop, sensitive spoken data could be transmitted off-device unexpectedly, creating a genuine privacy and data-exposure risk.

Missing User Warnings

Medium

Confidence: 94%
Finding: In Google recognition mode, captured microphone audio is sent to Google's speech recognition service, but the program does not present a clear, prior privacy notice or obtain explicit user consent before transmission. Because this skill handles live voice input, it may inadvertently send sensitive spoken data to a third party, which is especially relevant in a voice-chat context.
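
One way to address the missing-warning findings is to gate any cloud recognizer behind an explicit notice and opt-in. The following is a minimal sketch, not the skill's actual code: the backend names, `choose_backend`, and `ask_consent` are hypothetical, and the offline fallback is assumed to exist.

```python
from typing import Callable

# Hypothetical backend names; the report does not list the skill's real mode names.
CLOUD_BACKENDS = {"google"}


def choose_backend(requested: str, consent_given: bool, fallback: str = "offline") -> str:
    """Return the backend to use, downgrading to the offline fallback without consent."""
    if requested in CLOUD_BACKENDS and not consent_given:
        return fallback
    return requested


def ask_consent(prompt_fn: Callable[[str], str] = input) -> bool:
    """Show a privacy notice and return True only on an explicit 'yes'."""
    notice = (
        "Cloud recognition sends your recorded audio to a third-party service. "
        "Type 'yes' to allow this, or anything else to stay offline: "
    )
    return prompt_fn(notice).strip().lower() == "yes"
```

A caller would run `ask_consent()` once before the capture loop starts and pass the result to `choose_backend`, so no audio leaves the machine unless the user has agreed beforehand.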

Ssd 3

Medium

Confidence: 91%
Finding: The conversation loop stores all user utterances and AI responses in memory and prints the full history in plain text at the end of the session. In a voice assistant, users may speak passwords, personal details, or other sensitive information, so echoing the transcript back to the console increases the risk of shoulder-surfing, terminal logging exposure, and accidental disclosure.
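
A possible mitigation, sketched under assumptions: keep only a bounded history and hide the text unless the user asks for it. The `Transcript` class and its parameters are illustrative names, not part of the scanned skill.

```python
from collections import deque
from typing import List


class Transcript:
    """Keeps a bounded in-memory history and reveals text only on request."""

    def __init__(self, max_turns: int = 20):
        # Older turns are dropped automatically once max_turns is reached.
        self._turns = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        self._turns.append((speaker, text))

    def summary(self, show_text: bool = False) -> List[str]:
        """Return one line per turn; text is hidden unless show_text is True."""
        lines = []
        for speaker, text in self._turns:
            shown = text if show_text else f"[{len(text)} characters hidden]"
            lines.append(f"{speaker}: {shown}")
        return lines
```

Printing `summary()` at the end of a session shows that turns happened without echoing what was said. The full text would need a deliberate `show_text=True`.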

External Transmission

Medium

Category: Data Exfiltration
Content:
    }
    }
    response = requests.post(url, json=data, headers=headers)
    if response.status_code == 200:
        with open("output.mp3", "wb") as f:
            f.write(response.content)
Confidence: 90%
Finding: requests.post(url, json=

External Transmission

Medium

Category: Data Exfiltration
Content:
    """Use ElevenLabs TTS"""
    import requests
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {
        "xi-api-key": api_key or os.environ.get("ELEVENLABS_API_KEY"),
        "Content-Type": "application/json"
Confidence: 88%
Finding: https://api.elevenlabs.io/
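
The two External Transmission findings concern text leaving the machine for a cloud TTS endpoint. The sketch below shows one way to make that call more deliberate: it refuses to run without a key, percent-encodes the voice ID, sets a timeout, and raises on HTTP errors. The function names are hypothetical. The `{"text": ...}` payload is an assumption, since the report elides the original `data` dict.

```python
import os
from typing import Dict, Optional, Tuple
from urllib.parse import quote


def build_tts_request(voice_id: str, api_key: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for the TTS call, refusing to proceed without a key."""
    key = api_key or os.environ.get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set; refusing to call the cloud TTS API")
    # Percent-encode the voice ID so it cannot alter the URL path.
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{quote(voice_id, safe='')}"
    headers = {"xi-api-key": key, "Content-Type": "application/json"}
    return url, headers


def synthesize(text: str, voice_id: str, out_path: str = "output.mp3") -> None:
    """Send text to the cloud TTS service and save the returned audio (network call)."""
    import requests  # third-party; imported lazily so build_tts_request has no dependencies

    url, headers = build_tts_request(voice_id)
    # Assumed payload shape; the original request body is not shown in the report.
    response = requests.post(url, json={"text": text}, headers=headers, timeout=30)
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)
```

Keeping request construction separate from the network call makes it easier to review exactly what is sent and to test the key handling without contacting the service.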

Unvalidated Output Injection

High

Category: Output Handling
Content:
        if result.returncode == 0:
            print(f"✅ Voice file generated: {output_file}")
            # Play audio
            subprocess.run(["start", output_file], shell=True)
        else:
            print(f"❌ TTS failed: {result.stderr}")
    finally:
Confidence: 98%
Finding: subprocess.run(["start", output

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
        if result.returncode == 0:
            print(f"✅ Voice file generated: {output_file}")
            # Play audio
            subprocess.run(["start", output_file], shell=True)
        else:
            print(f"❌ TTS failed: {result.stderr}")
    finally:
Confidence: 97%
Finding: subprocess.run(["start", output_file], shell=True
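
The flagged playback call passes a file name through the shell, so shell metacharacters in `output_file` could be interpreted as commands. A hedged alternative is sketched below: it validates the path first and opens the file without a shell. The helper names and the allowed-suffix list are assumptions, and the `open` and `xdg-open` launchers are assumed to be installed on macOS and Linux respectively.

```python
import os
import subprocess
import sys
from pathlib import Path

# Assumption: the skill only ever produces these audio formats.
ALLOWED_SUFFIXES = {".mp3", ".wav"}


def validate_audio_path(output_file: str) -> Path:
    """Resolve the path and reject anything that is not an existing audio file."""
    path = Path(output_file).resolve()
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected audio file type: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(str(path))
    return path


def play_audio(output_file: str) -> None:
    """Open the file with the platform's default player, never via a shell."""
    path = validate_audio_path(output_file)
    if sys.platform == "win32":
        os.startfile(str(path))  # Windows-only API; no cmd.exe parsing involved
    elif sys.platform == "darwin":
        subprocess.run(["open", str(path)], check=False)
    else:
        subprocess.run(["xdg-open", str(path)], check=False)
```

Because the argument list is handed to the launcher directly, a name such as `clip.mp3 & calc.exe` is rejected by the suffix check and never reaches a command interpreter.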

VirusTotal

66 of 66 vendors reported this skill as clean.
