Fish Audio S2 Pro TTS

Pass. Audited by ClawScan on May 17, 2026.

Overview

This documentation-only TTS/voice-cloning skill is coherent, but users should verify external software sources, avoid exposing its server publicly, and manage stored voice data carefully.

This skill appears benign as documentation for running Fish Audio S2 Pro. Before installing, verify the external package/container/model sources, run setup in an isolated environment, keep the API bound to localhost unless you add access controls, and only upload voice samples with proper consent.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

Installing or running the referenced software means trusting external package, container, and model sources.

Why it was flagged

The setup pulls executable packages, container images, and model files from external registries; this is expected for this TTS model, but those external artifacts are outside the reviewed skill contents and are not pinned here.

Skill content

pip install fish-speech ... docker pull fishaudio/fish-speech ... hf download fishaudio/s2-pro --local-dir checkpoints/s2-pro

Recommendation

Use an isolated environment, prefer official Fish Audio sources, pin package, image, and model versions where possible, and review the external package and container (or confirm you trust their publisher) before running them.
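The model files can be pinned the same way you would pin the pip package (`pip install fish-speech==<version>`) or pull the container by digest (`fishaudio/fish-speech@sha256:<digest>`). The sketch below records SHA-256 digests of the checkpoint files once you have verified them, then re-checks those digests before each run. The manifest format and the function names `sha256_of` and `verify_checkpoints` are hypothetical. The skill does not list the files in `checkpoints/s2-pro` or publish digests for them.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading in chunks so large checkpoints are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoints(root: Path, pinned: dict[str, str]) -> list[str]:
    """Compare files under `root` against a pinned {relative_path: sha256_hex} manifest.

    Returns a list of problems; an empty list means every pinned file exists and matches.
    The manifest layout is a hypothetical convention, not something fish-speech provides.
    """
    problems = []
    for rel_path, expected in sorted(pinned.items()):
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(path) != expected.lower():
            problems.append(f"digest mismatch: {rel_path}")
    return problems
```

For example, you could store the digests in a JSON file next to the checkpoints and refuse to start the server if `verify_checkpoints(Path("checkpoints/s2-pro"), manifest)` returns any problems.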

Low

#ASI02: Tool Misuse and Exploitation

What this means

If exposed on a network, other users may be able to invoke the TTS service or consume local compute resources.

Why it was flagged

Binding to 0.0.0.0 listens on every IPv4 interface, so running this command can expose the TTS API beyond the local machine. The docs do not show authentication or firewall controls.

Skill content

python tools/api_server.py --llama-checkpoint-path checkpoints/s2-pro --decoder-checkpoint-path checkpoints/s2-pro/codec.pth --listen 0.0.0.0:8080

Recommendation

Bind to localhost unless remote access is required, and add firewall rules, authentication, or a trusted reverse proxy before exposing the server.
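The sketch below is a minimal pre-flight check. It assumes a hypothetical wrapper script that inspects the `--listen` value before launching `tools/api_server.py`, and that the value uses the `host:port` form from the command above. The check only classifies the address string. It does not add authentication or firewall rules.

```python
import ipaddress


def split_listen(listen: str) -> tuple[str, int]:
    """Split a 'host:port' value such as '0.0.0.0:8080' or '[::1]:8080'."""
    host, sep, port = listen.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {listen!r}")
    return host.strip("[]"), int(port)


def is_loopback_listen(listen: str) -> bool:
    """True only if the address accepts connections from the local machine alone."""
    host, _port = split_listen(listen)
    if host.lower() == "localhost":
        return True
    try:
        # 0.0.0.0 and :: are "unspecified" (all interfaces), not loopback.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames (or an empty host) may reach routable interfaces; treat them as exposed.
        return False
```

A wrapper could refuse to start when `is_loopback_listen(...)` returns false unless the operator passes an explicit override, which keeps remote exposure a deliberate choice.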

Low

#ASI06: Memory and Context Poisoning

What this means

Uploaded voice profiles may remain on disk across sessions and could be reused for future synthesis if not deleted.

Why it was flagged

Voice samples or derived speaker profiles are sensitive personal data and are documented as being stored persistently for later reuse.

Skill content

audio_sample=@voice.wav ... After upload: "voice": "my_voice". Persisted to `~/.cache/vllm-omni/speakers/*.safetensors`.

Recommendation

Upload only voices you are authorized to use, keep the cache directory protected, and delete stored voice profiles when they are no longer needed.
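The following is a minimal housekeeping sketch for the profile directory named in the skill content. It assumes each voice is stored as `<voice>.safetensors` (for example, `my_voice.safetensors`). The documented `*.safetensors` pattern suggests this layout but does not confirm it. The helper names are hypothetical, and the permission changes apply only to POSIX systems. Stop the server before deleting profiles in case it keeps them loaded in memory.

```python
import stat
from pathlib import Path

# Location taken from the skill content; adjust if your install stores profiles elsewhere.
DEFAULT_SPEAKER_DIR = Path.home() / ".cache" / "vllm-omni" / "speakers"


def list_profiles(speaker_dir: Path = DEFAULT_SPEAKER_DIR) -> list[str]:
    """Return stored profile names, assuming one '<name>.safetensors' file per voice."""
    if not speaker_dir.is_dir():
        return []
    return sorted(p.stem for p in speaker_dir.glob("*.safetensors"))


def restrict_permissions(speaker_dir: Path = DEFAULT_SPEAKER_DIR) -> None:
    """Make the directory owner-only (0700) and each profile owner read/write (0600)."""
    speaker_dir.chmod(stat.S_IRWXU)
    for p in speaker_dir.glob("*.safetensors"):
        p.chmod(stat.S_IRUSR | stat.S_IWUSR)


def delete_profile(name: str, speaker_dir: Path = DEFAULT_SPEAKER_DIR) -> bool:
    """Delete '<name>.safetensors'; return False if no such profile exists."""
    if Path(name).name != name:
        # Reject names like '../x' so deletion cannot escape the speaker directory.
        raise ValueError(f"invalid profile name: {name!r}")
    try:
        (speaker_dir / f"{name}.safetensors").unlink()
    except FileNotFoundError:
        return False
    return True
```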