Fish Audio Speech

Fish Audio speech provider for OpenClaw with high-quality TTS, voice cloning, configurable voices, and voice-note-friendly output for Telegram and WhatsApp.

Audits

Pass

ClawScan: Pass. Agentic behavior and permission review.
Static analysis: Pass. Pattern checks against bundled files.
VirusTotal: Pass. Multi-engine malware detections and file reputation.

Install

openclaw plugins install clawhub:@conan-scott/openclaw-fish-audio

Fish Audio Speech — OpenClaw Plugin

Fish Audio TTS plugin for OpenClaw, with high-quality voice cloning, Telegram/WhatsApp voice replies, and access to 1M+ voices via Fish Audio's voice library. Supports S2-Pro and S1 models.

Features

Voice cloning: use any Fish Audio voice (your own clones or community voices)
S2-Pro & S1 models: the latest Fish Audio model (S2-Pro) and the previous-generation S1
Format-aware output: opus for voice notes (Telegram, WhatsApp), mp3 otherwise
Inline directives: control voice, speed, model, latency, and sampling per message
Bundled agent skill: teaches agents to write Fish-friendly voice text and expressive markers
Voice listing: browse your cloned voices and popular community voices via /voice list

Installation

openclaw plugins install @conan-scott/openclaw-fish-audio

Then restart OpenClaw.

Getting an API Key

Sign up at fish.audio
Go to Account → API Keys → Create API Key
Create a revocable key with the minimum access you need
Copy the key for configuration below

Configuration

Prefer setting the API key as an environment variable or secret:

FISH_AUDIO_API_KEY=your-fish-audio-api-key

Then add the provider configuration to your openclaw.json:

{
  messages: {
    tts: {
      provider: "fish-audio",
      providers: {
        "fish-audio": {
          voiceId: "reference-id-of-your-voice",
          model: "s2-pro",       // s2-pro (default) | s1
          latency: "normal",     // normal (default) | balanced | low
          // speed: 1.0,         // 0.5–2.0 (optional)
          // temperature: 0.7,   // 0–1 (optional)
          // topP: 0.8,          // 0–1 (optional)
        },
      },
    },
  },
}

You can also set apiKey directly under messages.tts.providers.fish-audio, but secret-backed configuration is safer for shared systems and published examples.

Only set baseUrl for a Fish Audio-compatible endpoint you trust. The plugin sends the Fish Audio API key to that endpoint; custom URLs must use HTTPS except for localhost development.
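The baseUrl rule above can be checked before any key leaves your machine. The following is a minimal sketch of that rule, not the plugin's own validation code; the function name is_allowed_base_url and the set of hostnames treated as local are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hostnames treated as local development targets (an assumption for this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_allowed_base_url(base_url: str) -> bool:
    """Return True if base_url is HTTPS, or plain HTTP pointing at a local host."""
    parsed = urlparse(base_url)
    host = parsed.hostname  # lowercased, IPv6 brackets stripped; None if absent
    if not host:
        return False
    if parsed.scheme == "https":
        return True
    return parsed.scheme == "http" and host in LOCAL_HOSTS
```

A check like this only enforces the transport rule. Whether the endpoint itself deserves your API key is still a judgment call.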

Finding a Voice

Use the /voice list command in OpenClaw to browse available voices. The plugin shows:

Your cloned/trained voices (all pages, via self=true)
Popular community voices (top-ranked by score) as a fallback for new users

You can also browse voices at fish.audio and copy the voice ID from the URL.

Use cloned, trained, or community voices only when you have the rights, consent, and authorization to use that voice.

Inline Directives

All directive keys are provider-prefixed to avoid collisions with other speech providers. Both the full fishaudio_* keys and the shorter fish_* aliases work.

[[tts:fishaudio_voice=<ref_id>]]         Switch voice
[[tts:fishaudio_speed=1.2]]              Prosody speed (0.5–2.0)
[[tts:fishaudio_model=s1]]               Model override (s2-pro or s1)
[[tts:fishaudio_latency=low]]            Latency mode (normal, balanced, or low)
[[tts:fishaudio_temperature=0.7]]        Sampling temperature (0–1)
[[tts:fishaudio_top_p=0.8]]              Top-p sampling (0–1)

Short aliases: fish_voice, fish_speed, fish_model, fish_latency, fish_temperature, fish_top_p.
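To make the directive syntax concrete, here is a rough sketch of how such directives could be extracted from a message. It is not the plugin's actual parser: the regular expression, the parse_directives name, and the choice to raise on out-of-range values (rather than ignore them) are all assumptions. The ranges and allowed values are the ones documented above.

```python
import re

# Matches [[tts:key=value]]; the exact grammar OpenClaw accepts is an assumption.
DIRECTIVE_RE = re.compile(r"\[\[tts:([a-z_]+)=([^\]\s]+)\]\]")

# Numeric settings and their documented ranges.
RANGES = {"speed": (0.5, 2.0), "temperature": (0.0, 1.0), "top_p": (0.0, 1.0)}
# Enumerated settings and their documented values.
CHOICES = {"model": {"s2-pro", "s1"}, "latency": {"normal", "balanced", "low"}}


def parse_directives(text: str) -> tuple[dict, str]:
    """Return (settings, text with recognized Fish Audio directives removed)."""
    settings: dict = {}

    def handle(match: re.Match) -> str:
        key, value = match.group(1), match.group(2)
        # Accept both the fishaudio_ prefix and the shorter fish_ alias.
        for prefix in ("fishaudio_", "fish_"):
            if key.startswith(prefix):
                name = key[len(prefix):]
                break
        else:
            return match.group(0)  # another provider's directive: leave untouched
        if name == "voice":
            settings[name] = value
        elif name in RANGES:
            low, high = RANGES[name]
            number = float(value)
            if not low <= number <= high:
                raise ValueError(f"{key}={value} is outside {low}-{high}")
            settings[name] = number
        elif name in CHOICES:
            if value not in CHOICES[name]:
                raise ValueError(f"{key}={value} is not one of {sorted(CHOICES[name])}")
            settings[name] = value
        else:
            return match.group(0)  # unknown key: leave untouched
        return ""

    cleaned = DIRECTIVE_RE.sub(handle, text)
    # Collapse the whitespace left behind by removed directives.
    return settings, " ".join(cleaned.split())
```

Called on "[[tts:fish_speed=1.2]] Hello [[tts:fishaudio_model=s1]] there", this sketch returns the settings speed 1.2 and model s1 along with the spoken text "Hello there". Directives with any other prefix pass through unchanged, which mirrors the reason the keys are provider-prefixed in the first place.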

Expressive Markers

Fish Audio understands natural expressive markers in the text itself, such as (laughs) or (sighs). OpenClaw does not parse or transform these markers; the plugin passes text verbatim to Fish Audio's /v1/tts API. Round-bracket markers are confirmed working. Square-bracket marker syntax is unverified.

For agent-authored voice messages, avoid Markdown stage directions such as *laughs*; some TTS paths may read the asterisks literally. This package includes a fish-audio-tts AgentSkill so OpenClaw agents can learn the preferred plain-text style automatically.
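If text arrives with Markdown-style stage directions anyway, a small pre-processing step can rewrite them into the round-bracket form before the text reaches the provider. This is only a sketch of one possible approach and is not part of the plugin or the bundled skill; the function name and the pattern are assumptions. The pattern is a heuristic: it cannot tell a stage direction from single-asterisk emphasis, so *really* would also become (really).

```python
import re

# A short lowercase phrase wrapped in single asterisks, such as *laughs*.
# Double-asterisk bold (**text**) is deliberately not matched.
STAGE_DIRECTION_RE = re.compile(r"(?<!\*)\*([a-z][a-z ]{0,30})\*(?!\*)")


def to_round_bracket_markers(text: str) -> str:
    """Rewrite *laughs* as (laughs); leave everything else unchanged."""
    return STAGE_DIRECTION_RE.sub(r"(\1)", text)
```

For example, "Well *laughs* that went fine." becomes "Well (laughs) that went fine.", while "**bold** text" is left alone.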

Models

Model	Description
`s2-pro`	Latest high-quality model (default)
`s1`	Previous generation, lighter weight

Latency Modes

Mode	Description
`normal`	Best quality, higher latency (default)
`balanced`	Balance between quality and speed
`low`	Fastest response, may reduce quality

Troubleshooting

No voice configured: Set voiceId in config. Fish Audio has no universal default voice.
Empty voice list: New users with no cloned voices will see popular community voices as a starting point.
API key missing: Set either apiKey in config or the FISH_AUDIO_API_KEY environment variable.
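The two configuration problems above can be caught with a quick check before starting OpenClaw. The sketch below assumes you have already loaded the messages.tts.providers.fish-audio object into a Python dict; the preflight function is hypothetical and does not reflect how the plugin reports these errors internally.

```python
import os


def preflight(provider_config: dict, env=None) -> list[str]:
    """Return human-readable problems; an empty list means the basics are set."""
    env = os.environ if env is None else env
    problems = []
    if not provider_config.get("voiceId"):
        problems.append("No voice configured: set voiceId in config.")
    # Either source of the key is enough here; which one takes precedence
    # when both are set is not covered by this sketch.
    if not (provider_config.get("apiKey") or env.get("FISH_AUDIO_API_KEY")):
        problems.append("API key missing: set apiKey or FISH_AUDIO_API_KEY.")
    return problems
```

Passing env explicitly, as in preflight({"voiceId": "abc"}, env={}), makes the check easy to exercise without touching the real environment.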

License

MIT