local-voice-reply

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local voice-reply skill, but it stores voice/audio data locally and exposes an unauthenticated local API that users should keep private.

Install only if you are comfortable running a local unauthenticated TTS server, installing the listed third-party packages, and storing uploaded voice samples and generated audio on disk. Keep the server bound to 127.0.0.1, upload only voices you have permission to use, avoid setting the output directory to broad sensitive locations, periodically delete old voice/cache/output files, and verify Feishu or Discord destinations before sending generated audio.
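
As a rough illustration of the "keep the server bound to 127.0.0.1" advice, the check below tests whether a configured host string is a loopback address before the server is started. The function name `is_loopback_host` is hypothetical and not part of the skill; it is a sketch using only the Python standard library.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True only if host refers to the local loopback interface.

    Hypothetical helper: the skill's real configuration handling is not
    shown in the report, so this only illustrates the check itself.
    """
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address (for example, a DNS name): treat as unsafe.
        return False
```

A launcher script could refuse to start when this returns False, since a host such as 0.0.0.0 would expose the unauthenticated API to other machines on the network.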

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (6)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill advertises and depends on capabilities including environment access, file reads/writes, and shell execution, but does not declare permissions. That creates a transparency and policy-enforcement gap: an agent or reviewer may activate a skill with more power than expected, increasing the chance of unsafe file operations, command execution, or data exposure. In this context the risk is elevated because the skill explicitly runs a local server, invokes Python/uvicorn/ffmpeg workflows, persists uploaded files, and writes output to user-accessible paths.

Context-Inappropriate Capability

Low

Confidence: 90%
Finding: The /health endpoint exposes internal state including registered voice names, embedding cache keys, phrase cache keys, output directory, and benchmark details. In a voice-reply service, this information can aid reconnaissance, leak tenant or user-related identifiers, and reveal operational internals that make later attacks easier.
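
One way to reduce this exposure is to return only a small allowlisted summary from the health check. The sketch below assumes the internal state is available as a dictionary; the field names (`status`, `version`, `voices`) and the function `public_health` are hypothetical, since the report does not show the skill's actual response schema.

```python
# Hypothetical allowlist: only these keys are copied into the public response.
SAFE_HEALTH_FIELDS = ("status", "version")


def public_health(internal_state: dict) -> dict:
    """Build a health summary that omits voice names, cache keys, and paths."""
    summary = {
        key: internal_state[key]
        for key in SAFE_HEALTH_FIELDS
        if key in internal_state
    }
    # Report how many voices are registered without revealing their names.
    summary["voice_count"] = len(internal_state.get("voices", []))
    return summary
```

An allowlist is used rather than a blocklist so that any diagnostic field added later stays private unless it is explicitly exposed.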

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The /output/cleanup endpoint exposes file-deletion functionality driven by a client-supplied path, which is dangerous unless engine.cleanup_output strictly constrains deletion to a safe directory. In this skill context, a network-accessible voice API does not inherently need arbitrary caller-triggered deletion, so this expands attack surface and could enable destructive filesystem operations if path validation is weak.
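
The kind of constraint this finding asks for can be sketched as follows. This is not the implementation of engine.cleanup_output, which the report does not show; `resolve_inside` is a hypothetical helper that resolves a caller-supplied path and rejects anything that falls outside a fixed base directory.

```python
from pathlib import Path


def resolve_inside(base_dir: str, requested: str) -> Path:
    """Resolve requested relative to base_dir, or raise ValueError if it escapes.

    Hypothetical helper for illustration. Resolving both paths first means
    that ".." segments and symlinks are taken into account in the comparison.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute path discards base, so that case fails below too.
    candidate = (base / requested).resolve()
    if candidate == base or base not in candidate.parents:
        raise ValueError(f"refusing path outside output directory: {requested!r}")
    return candidate
```

A cleanup handler would call this before deleting anything and treat the ValueError as a rejected request. Requests that resolve to the base directory itself are also refused, so a caller cannot remove the whole output directory in one call.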

Vague Triggers

Medium

Confidence: 82%
Finding: The activation condition includes 'any user request for voice/audio reply,' which is broad enough to match many normal conversations and can cause the skill to run unexpectedly. Because this skill can start or rely on a local server, write media files, use registered voice assets, and trigger shell-backed workflows, over-activation increases the chance of unintended file generation, misuse of cloned voices, or execution of more privileged operations than the user explicitly requested.

Missing User Warnings

Medium

Confidence: 88%
Finding: The health response discloses registered voices and cache keys without any visible consent, warning, or access control. Even if intended for debugging, these fields may expose user-associated voice artifacts or internal identifiers, which is a privacy and information disclosure issue in a service that processes voice data.

Missing User Warnings

Medium

Confidence: 86%
Finding: A destructive cleanup action is exposed via API without any visible warning, confirmation, or access restriction, and it operates on a caller-supplied path. In context, this makes unintended or malicious deletion more dangerous because the skill is a voice generation service, not an administrative file-management interface.

VirusTotal

60/60 vendors flagged this skill as clean.
