Security audit

fish-speech-tts

Security checks across malware telemetry and agentic risk

Overview

This is a clearly scoped Fish Speech text-to-speech and voice-cloning helper, but it handles sensitive voice audio and writes profile and output files to local storage.

Install only if you intend to use voice cloning/TTS and have permission to use the reference voices. Keep the API on localhost when possible, avoid untrusted `--api-base` endpoints, review where outputs and `voice_profiles` are stored, and delete retained reference audio when no longer needed.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (4)

Lp3

Medium
Category
MCP Least Privilege
Confidence
91% confidence
Finding
The skill documentation clearly indicates capabilities to read files, write outputs and voice-profile data, and contact a local HTTP API, yet the skill metadata declares no permissions. This creates a transparency and consent problem: users and policy enforcement layers cannot accurately assess or gate the skill's access before use.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
This skill is centered on voice cloning and storing reusable voice profiles, but the documentation does not warn about obtaining consent from the speaker, handling biometric-like voice data, or the privacy implications of registration and retention. In this context, omission is security-relevant because cloned voices and stored reference audio can enable impersonation, unauthorized reuse, or long-term retention of sensitive personal data.

Missing User Warnings

Low
Confidence
83% confidence
Finding
The examples describe batch generation, output writing, and storage of reference audio/profile data without warning that these operations modify the local filesystem and may overwrite existing files or retain user-provided content. While not an exploit by itself, this is a real safety issue because it can lead to unintended data loss, silent persistence of sensitive audio, and poor operator awareness.
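One way to address the overwrite risk described in this finding is to create output files in exclusive mode, so that an existing file is never silently replaced. The sketch below is illustrative only: `write_output_exclusive` is a hypothetical helper, not a function from the skill, and it assumes the generated audio is already available as bytes.

```python
from pathlib import Path


def write_output_exclusive(path: Path, data: bytes) -> None:
    """Write data to path, failing instead of overwriting an existing file.

    Hypothetical helper for illustration; not part of the audited skill.
    """
    # Create the parent directory if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "xb" is exclusive creation: open() raises FileExistsError
    # if the target already exists, so nothing is clobbered silently.
    with open(path, "xb") as fh:
        fh.write(data)
```

A caller could catch `FileExistsError` and either ask the operator for confirmation or pick a new file name, which makes the filesystem change explicit instead of implicit.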

Missing User Warnings

Medium
Confidence
87% confidence
Finding
The function uploads a local reference audio file and associated text to the configured Fish Speech API without any explicit consent prompt, privacy warning, or trust boundary check. Because voice samples are biometric and potentially sensitive, silent transmission can expose personal data to an external or reconfigured endpoint, especially since `api_base` is configurable and not restricted to localhost.
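The trust boundary check this finding calls for could look roughly like the following sketch, which accepts an `api_base` value only when it points at a loopback host. The function name `is_loopback_api_base` is hypothetical and not taken from the skill; the sketch also assumes that DNS names other than `localhost` should be treated as non-local without resolving them.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_api_base(api_base: str) -> bool:
    """Return True only if api_base is an http(s) URL on a loopback host.

    Hypothetical helper for illustration; not part of the audited skill.
    """
    parsed = urlparse(api_base)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname  # lowercased, brackets stripped for IPv6
    if host is None:
        return False
    if host == "localhost":
        # Assumption: the name "localhost" is trusted to resolve locally.
        return True
    try:
        # Covers 127.0.0.0/8 and ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat as non-local.
        return False
```

A caller might run this check before uploading reference audio and require explicit confirmation whenever it returns `False`, so that voice samples are not sent to a remote or reconfigured endpoint without the user noticing.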

VirusTotal

65/65 vendors flagged this skill as clean.
