Voice Reply

Security checks across malware telemetry and agentic risk

Overview

This is a real TTS voice-reply skill, but it needs review because it can automatically send replies and voice samples to Noiz, store files and an API key locally, install a dependency at runtime, and expose broader voice-cloning/timeline features than the description makes clear.

Review before installing. Use this only if you are comfortable with assistant reply text, optional reference audio, and possibly subtitle text being sent to Noiz; generated audio being saved and played locally; previous generated audio being deleted; and a Noiz API key being stored in your home directory. Install dependencies yourself from trusted sources and avoid reference-audio URLs or save_voice unless you intentionally want remote voice-cloning behavior.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (11)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    def ensure_noiz_ready() -> None:
        if importlib.util.find_spec("requests") is None:
            print("[noiz] Installing requests...", file=sys.stderr)
            subprocess.check_call(["uv", "pip", "install", "requests"])

    def detect_text_lang(text: str) -> str:
Confidence: 79%
Finding: subprocess.check_call(["uv", "pip", "install", "requests"])

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    def ensure_noiz_ready() -> None:
        if importlib.util.find_spec("requests") is None:
            print("[noiz] Installing requests...", file=sys.stderr)
            subprocess.check_call(["uv", "pip", "install", "requests"])

    def detect_text_lang(text: str) -> str:
Confidence: 86%
Finding: subprocess.check_call(["uv", "pip", "install", "requests"])

Lp3

Medium

Category: MCP Least Privilege
Confidence: 88%
Finding: The skill documentation describes capabilities that require network access, file read/write, shell execution, and environment/config handling, yet no permissions are declared. This creates a transparency and consent problem: users or the host agent may invoke a skill with broader operational reach than expected, increasing the chance of unintended file deletion, local command execution, or exfiltration of reply content to external services.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 94%
Finding: The documented purpose is a simple auto-save voice reply skill, but the underlying behavior reportedly includes additional features such as guest API usage, local TTS backends, subtitle/timeline rendering, reference-audio download/cutting, voice cloning, arbitrary output paths, direct playback, and API key persistence. This mismatch materially weakens informed consent and reviewability because operators may authorize a benign-seeming TTS helper that actually has broader data handling and media-processing capabilities than disclosed.

Context-Inappropriate Capability

Medium

Confidence: 85%
Finding: Installing Python packages at runtime is broader than the stated skill purpose and materially increases risk in an agent environment. It introduces remote code execution potential through dependency confusion, package compromise, or unexpected installer behavior, especially because it happens automatically when a module is missing.
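One way to avoid the automatic install described in this finding is to fail fast and leave installation to the user. The following is a minimal sketch, not the skill's actual code; `require_module` and `MissingDependencyError` are hypothetical names introduced here for illustration.

```python
import importlib.util


class MissingDependencyError(RuntimeError):
    """Raised when a required package is absent (hypothetical name)."""


def require_module(name: str) -> None:
    """Check that a package is importable; never install it at runtime."""
    if importlib.util.find_spec(name) is None:
        # Stop here and tell the user what to do instead of running an installer.
        raise MissingDependencyError(
            f"Required package '{name}' is not installed. "
            f"Install it yourself from a trusted source, "
            f"for example: uv pip install {name}"
        )
```

Calling `require_module("requests")` at startup would then surface a clear error rather than silently invoking an installer from inside the agent environment.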

Vague Triggers

Medium

Confidence: 84%
Finding: The trigger condition 'when the user speaks' is overly broad and lacks clear activation boundaries, which can cause the skill to send or transform every response automatically. In context, this is risky because replies may contain sensitive data that would then be spoken aloud, saved locally, or transmitted to a third-party TTS service without a deliberate per-use decision.

Missing User Warnings

Medium

Confidence: 89%
Finding: The skill advertises automatic deletion of the previous desktop audio file but does not clearly warn about the destructive effect or scope. Even if limited to the skill's own output, silent deletion can cause loss of user data or confusion, especially when files are stored in a common location like the desktop.
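The scope concern in this finding could be reduced by deleting only regular files that sit inside a folder the skill owns. This is a minimal sketch under that assumption; `delete_previous_output` and the dedicated `output_dir` are hypothetical and do not describe how the skill currently behaves.

```python
from pathlib import Path


def delete_previous_output(previous: Path, output_dir: Path) -> bool:
    """Delete `previous` only if it is a regular file inside `output_dir`.

    Returns True if a file was removed, False otherwise.
    """
    root = output_dir.resolve()
    target = previous.resolve()
    # Refuse anything outside the skill's own folder.
    # Path.is_relative_to requires Python 3.9 or newer.
    if target == root or not target.is_relative_to(root):
        return False
    # Refuse directories and paths that no longer exist.
    if not target.is_file():
        return False
    target.unlink()
    return True
```

Resolving both paths first means a symlink or `..` segment cannot steer the deletion to a file elsewhere on the desktop.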

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation does not clearly disclose that reply content may be sent to Noiz AI, a third-party service, for speech synthesis. In this context, that omission is significant because assistant replies can contain personal, confidential, or regulated information, and automatic transmission to an external provider creates privacy, compliance, and retention risks.

Missing User Warnings

Medium

Confidence: 94%
Finding: The metadata explicitly says every reply is automatically converted to audio and saved to the desktop, but it does not mention any consent prompt, opt-in setting, or warning about local file creation. Automatic persistence of generated content on a user's desktop can expose sensitive conversation data, surprise users, and create privacy and data-handling risks on shared or monitored systems.
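An opt-in setting of the kind this finding says is missing might look like the sketch below. The environment variable name `VOICE_REPLY_ENABLED` and the function `voice_reply_enabled` are assumptions made for illustration; the skill does not document any such setting.

```python
import os
from typing import Mapping, Optional


def voice_reply_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the user has explicitly opted in.

    The feature stays off by default: an unset or unrecognized value
    means no audio is generated or saved.
    """
    source = os.environ if env is None else env
    value = source.get("VOICE_REPLY_ENABLED", "")  # hypothetical variable name
    return value.strip().lower() in {"1", "true", "yes"}
```

Defaulting to off means a fresh install creates no files until the user makes a deliberate choice.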

Missing User Warnings

Medium

Confidence: 91%
Finding: When --reference-audio is provided, the script uploads a local audio file to a remote Noiz API for voice cloning without any explicit consent prompt or strong disclosure beyond the argument name. In a voice-reply skill context, this is more dangerous because users may expect local audio generation and may not realize biometric voice data is being transmitted to and possibly retained by a third party.
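The explicit consent prompt this finding refers to could be as small as the following sketch. `confirm_upload` is a hypothetical helper, not part of the skill; the `ask` parameter exists only so the prompt can be replaced in tests or non-interactive hosts.

```python
from typing import Callable


def confirm_upload(
    description: str,
    destination: str,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True only when the user explicitly answers 'yes'."""
    answer = ask(
        f"Send {description} to {destination}? Type 'yes' to continue: "
    )
    # Anything other than a literal 'yes' (ignoring case and spaces) is a refusal.
    return answer.strip().lower() == "yes"
```

A caller would check `confirm_upload("your reference audio (voice sample)", "the Noiz API")` before any upload and skip the remote request when it returns False.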

Missing User Warnings

Medium

Confidence: 91%
Finding: When the noiz backend is used, the script sends subtitle text to the cloud API and may also upload reference audio, including dynamically sliced audio from a local track, but it provides no explicit user-facing warning or consent gate. In a voice-reply skill, this context increases sensitivity because subtitles and voice samples can contain personal data, confidential speech, or biometric identifiers, making silent exfiltration to a third party a meaningful privacy/security issue.

VirusTotal

66/66 vendors reported this skill as clean.
