Security audit

fish-speech-tts

Security checks across malware telemetry and agentic risk

Overview

This skill is a straightforward Fish Speech text-to-speech and voice-cloning helper with no signs of malicious behavior, but it handles sensitive voice audio and writes local profile and output files.

Install only if you intend to use voice cloning/TTS and have permission to use the reference voices. Keep the API on localhost when possible, avoid untrusted --api-base endpoints, review where outputs and voice_profiles are stored, and delete retained reference audio when no longer needed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (4)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill documentation clearly indicates capabilities to read files, write outputs and voice-profile data, and contact a local HTTP API, yet the skill metadata declares no permissions. This creates a transparency and consent problem: users and policy enforcement layers cannot accurately assess or gate the skill's access before use.

Missing User Warnings

Medium

Confidence: 90%
Finding: This skill is centered on voice cloning and storing reusable voice profiles, but the documentation does not warn about obtaining consent from the speaker, handling biometric-like voice data, or the privacy implications of registration and retention. In this context, omission is security-relevant because cloned voices and stored reference audio can enable impersonation, unauthorized reuse, or long-term retention of sensitive personal data.
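
One way to limit long-term retention of reference audio is a periodic cleanup pass. The sketch below is illustrative only: the function name `purge_stale_audio`, the list of audio suffixes, and the assumption that reference audio lives as ordinary files under a profile directory are all hypothetical and not taken from the skill. It defaults to a dry run so nothing is deleted until the operator has reviewed the list.

```python
import time
from pathlib import Path

# Hypothetical suffix list; adjust to whatever formats are actually stored.
AUDIO_SUFFIXES = (".wav", ".mp3", ".flac")


def purge_stale_audio(profile_dir, max_age_days, dry_run=True):
    """List (and optionally delete) audio files older than max_age_days.

    Age is judged by file modification time. With dry_run=True the files
    are only reported, never removed.
    """
    cutoff = time.time() - max_age_days * 86400
    # Collect first, delete afterwards, so the directory is not modified
    # while it is being traversed.
    stale = sorted(
        path
        for path in Path(profile_dir).rglob("*")
        if path.is_file()
        and path.suffix.lower() in AUDIO_SUFFIXES
        and path.stat().st_mtime < cutoff
    )
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling `purge_stale_audio("voice_profiles", max_age_days=30)` would only report candidates; passing `dry_run=False` removes them. Non-audio files in the same directory are left untouched.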

Missing User Warnings

Low

Confidence: 83%
Finding: The examples describe batch generation, output writing, and storage of reference audio/profile data without warning that these operations modify the local filesystem and may overwrite existing files or retain user-provided content. While not an exploit by itself, this is a real safety issue because it can lead to unintended data loss, silent persistence of sensitive audio, and poor operator awareness.
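
The overwrite risk described in this finding can be reduced by opening output files in exclusive-creation mode, which fails if the target already exists. The following is a minimal sketch, not the skill's actual code; `write_output_no_overwrite` is a hypothetical helper name.

```python
from pathlib import Path


def write_output_no_overwrite(path, data: bytes) -> Path:
    """Write data to path, refusing to replace an existing file.

    Mode "xb" (exclusive creation, binary) raises FileExistsError if the
    file is already present, so an earlier output is never silently lost.
    """
    target = Path(path)
    # Create missing parent directories; this does not touch existing files.
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "xb") as fh:
        fh.write(data)
    return target
```

A batch job using a helper like this would surface a `FileExistsError` for each name collision, leaving the operator to decide whether to rename, skip, or explicitly delete the earlier output.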

Missing User Warnings

Medium

Confidence: 87%
Finding: The function uploads a local reference audio file and associated text to the configured Fish Speech API without any explicit consent prompt, privacy warning, or trust boundary check. Because voice samples are biometric and potentially sensitive, silent transmission can expose personal data to an external or reconfigured endpoint, especially since `api_base` is configurable and not restricted to localhost.
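
A trust boundary check of the kind this finding calls for could look like the sketch below. It is an assumption-laden illustration rather than the skill's implementation: `is_loopback_api_base` is a hypothetical name, and the only detail taken from the report is that the endpoint is supplied as `api_base`. The check deliberately avoids DNS resolution, so any hostname other than the literal `localhost` is treated as non-local.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_api_base(api_base: str) -> bool:
    """Return True only for http(s) URLs whose host is a loopback address."""
    parsed = urlparse(api_base)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname  # lowercased; IPv6 brackets already stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Accepts literals such as 127.0.0.1 or ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: treat as remote instead of resolving it.
        return False
```

A caller could run this before any upload and, when it returns `False`, require an explicit opt-in from the user before sending reference audio to that endpoint.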

VirusTotal

65/65 vendors flagged this skill as clean.
