GPT-SoVITS TTS

High-quality Chinese TTS using GPT-SoVITS v2 Pro+ — convert text to natural-sounding speech with voice cloning support.

Install

openclaw skills install gptsovits-tts

A production-ready text-to-speech skill that connects to a local GPT-SoVITS v2 Pro+ API server. Converts Chinese text to natural-sounding speech with a cloned reference voice. Designed for voice response automation, content narration, and AI voice applications.

Features

  • Clean TTS pipeline: Text → GPT-SoVITS API → WAV → MP3 (128kbps, 44100Hz, mono)
  • Voice cloning: Uses a pre-recorded reference audio for consistent voice output
  • Configurable: API URL, timeout, TTS parameters (speed, top_k, top_p, temperature, seed)
  • No GPU required: Inference runs entirely on the CPU (approx. 5-10 s per sentence, depending on hardware)
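The WAV → MP3 step of the pipeline above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `buildFfmpegArgs` and `wavToMp3` are hypothetical names, and the sketch assumes the local ffmpeg build includes the `libmp3lame` encoder.

```javascript
const { execFile } = require('node:child_process');

// Build ffmpeg arguments for the WAV -> MP3 step (128 kbps, 44100 Hz, mono).
// Hypothetical helper; the flags themselves are standard ffmpeg options.
function buildFfmpegArgs(wavPath, mp3Path) {
  return [
    '-y',                      // overwrite the output file if it already exists
    '-i', wavPath,             // input WAV as returned by the API
    '-ac', '1',                // one audio channel (mono)
    '-ar', '44100',            // 44100 Hz sample rate
    '-codec:a', 'libmp3lame',  // assumption: ffmpeg was built with libmp3lame
    '-b:a', '128k',            // 128 kbps audio bitrate
    mp3Path,
  ];
}

// Hypothetical helper: run ffmpeg and resolve with the MP3 path on success.
function wavToMp3(wavPath, mp3Path) {
  return new Promise((resolve, reject) => {
    execFile('ffmpeg', buildFfmpegArgs(wavPath, mp3Path), (err) =>
      (err ? reject(err) : resolve(mp3Path)));
  });
}
```

Passing the arguments as an array to `execFile` avoids shell quoting problems when paths contain spaces.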

Requirements

  • GPT-SoVITS v2 Pro+ API running at http://127.0.0.1:9880 (or set GPT_SOVITS_API_URL)
  • ffmpeg installed and in PATH (for WAV→MP3 conversion)
  • Node.js packages: axios

Model files needed (on the API server side)

| Component | File | Size |
| --- | --- | --- |
| s1 | s1v3.ckpt | 148MB |
| s2 | s2Gv2ProPlus.pth | 191MB |
| BERT | chinese-roberta-wwm-ext-large | 621MB |
| CNHuBERT | chinese-hubert-base | 180MB |
| Speaker Verification | pretrained_eres2netv2w24s4ep4.ckpt | 103MB |
| Reference Audio | ref_audio.wav | ~10-30s clean recording |

Quick Start

1. Start GPT-SoVITS API

cd /path/to/GPT-SoVITS-CPUFast
conda activate GPTSoVits
python api_v2.py -a 127.0.0.1 -p 9880

2. Set reference audio

Place a clean .wav file (10-30 seconds of the target voice) at:

voice-clone/ref_audio.wav
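To confirm that a reference recording falls in the 10-30 second range before using it, its duration can be estimated from the WAV header. The sketch below assumes an uncompressed RIFF/WAVE file with a `fmt ` chunk that precedes the `data` chunk. `wavDurationSeconds` and `isUsableReference` are hypothetical helpers, not part of the skill.

```javascript
// Walk the RIFF chunks to find "fmt " and "data", then derive the duration.
// Assumes a plain RIFF/WAVE file (not RF64) with "fmt " before "data".
function wavDurationSeconds(buf) {
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let byteRate = null;
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      // Byte rate sits 8 bytes into the fmt payload (format, channels, sample rate come first).
      byteRate = buf.readUInt32LE(offset + 8 + 8);
    } else if (id === 'data') {
      if (!byteRate) throw new Error('fmt chunk missing before data');
      return size / byteRate; // data bytes divided by bytes per second
    }
    offset += 8 + size + (size % 2); // chunk payloads are padded to an even length
  }
  throw new Error('data chunk not found');
}

// Hypothetical check for the 10-30 second range recommended above.
function isUsableReference(buf) {
  const seconds = wavDurationSeconds(buf);
  return seconds >= 10 && seconds <= 30;
}
```

A typical call would be `isUsableReference(fs.readFileSync('voice-clone/ref_audio.wav'))`. The check covers length only and says nothing about how clean the recording is.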

3. Use the skill

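const tts = require('./skills/voice-clone');

// speak() returns a Promise; CommonJS has no top-level await, so wrap the call in an async function.
(async () => {
  // Input text means: "Hello, welcome to GPT-SoVITS speech synthesis."
  const mp3 = await tts.speak("你好,欢迎使用GPT-SoVITS语音合成。", "output.mp3");
  console.log(mp3); // "output.mp3"
})();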

API

speak(text, outputPath, opts?)

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Chinese text to synthesize |
| outputPath | string | required | Output .mp3 file path |
| opts.topK | number | 15 | Top-K sampling |
| opts.topP | number | 0.7 | Top-P sampling |
| opts.temperature | number | 0.5 | Sampling temperature |
| opts.speed | number | 1.0 | Speed factor |
| opts.seed | number | -1 | Random seed (-1 = random) |

Returns: Promise<string> that resolves to the path of the generated MP3 file.
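One way the `opts` values could be translated into a request body is sketched below. The skill's internal mapping is not documented here, so this is an assumption: the snake_case field names (`top_k`, `speed_factor`, `prompt_lang`, and so on) follow the JSON body that the upstream GPT-SoVITS `api_v2.py` `/tts` endpoint is understood to accept, and `buildTtsPayload` is a hypothetical helper.

```javascript
// Defaults taken from the parameter table above.
const DEFAULTS = { topK: 15, topP: 0.7, temperature: 0.5, speed: 1.0, seed: -1 };

// Hypothetical helper: map speak() options to an api_v2.py-style /tts JSON body.
// The field names are assumptions based on the upstream API, not confirmed by this skill.
function buildTtsPayload(text, refAudioPath, opts = {}) {
  return {
    text,
    text_lang: 'zh',                 // the skill targets Chinese text
    ref_audio_path: refAudioPath,    // reference voice recording
    prompt_text: '',                 // assumption: no transcript of the reference audio
    prompt_lang: 'zh',               // assumption: the reference audio is Chinese
    top_k: opts.topK ?? DEFAULTS.topK,
    top_p: opts.topP ?? DEFAULTS.topP,
    temperature: opts.temperature ?? DEFAULTS.temperature,
    speed_factor: opts.speed ?? DEFAULTS.speed,
    seed: opts.seed ?? DEFAULTS.seed,
    media_type: 'wav',               // WAV is converted to MP3 afterwards
  };
}
```

Using `??` instead of `||` keeps explicit values such as `seed: 0` rather than replacing them with the default.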

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| GPT_SOVITS_API_URL | http://127.0.0.1:9880 | GPT-SoVITS API base URL |
| GPT_SOVITS_API_TIMEOUT | 300000 | API request timeout (ms) |
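As an illustration of how these two variables might be read with their documented defaults, consider the sketch below. `resolveConfig` is a hypothetical helper, and the fallback on invalid timeout values is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: read the two environment variables with their documented defaults.
function resolveConfig(env = process.env) {
  // Strip trailing slashes so endpoint paths can be appended safely.
  const apiUrl = (env.GPT_SOVITS_API_URL || 'http://127.0.0.1:9880').replace(/\/+$/, '');
  const parsed = Number.parseInt(env.GPT_SOVITS_API_TIMEOUT ?? '', 10);
  // Assumption: fall back to 300000 ms when the value is missing or not a positive integer.
  const timeoutMs = Number.isFinite(parsed) && parsed > 0 ? parsed : 300000;
  return { apiUrl, timeoutMs };
}
```

For example, `GPT_SOVITS_API_TIMEOUT=60000` would give a one-minute timeout, while an unset or malformed value keeps the five-minute default.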

Integration

This skill is designed to be called from automation workflows:

  • Voice reply for messaging bots (WeChat, Telegram, etc.)
  • Content narration for video/audio production
  • Voice response for IVR systems
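For workflows like these, a small wrapper can give each reply its own output file and run requests one at a time, which may help when CPU inference takes several seconds per sentence. This is a sketch under assumptions: `makeVoiceReplier` is a hypothetical helper, and `speak` is any function with the `speak(text, outputPath)` signature described above.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Hypothetical wrapper: serialize speak() calls and derive a file name from the text.
function makeVoiceReplier(speak, outDir) {
  let queue = Promise.resolve();
  return function reply(text) {
    // A hash of the text gives a stable, filesystem-safe file name.
    const name = crypto.createHash('sha1').update(text).digest('hex').slice(0, 12) + '.mp3';
    const target = path.join(outDir, name);
    // Start this job only after the previous one has settled.
    const job = queue.then(() => speak(text, target));
    queue = job.catch(() => {}); // a failed job must not block later ones
    return job;
  };
}
```

A bot handler could then call `const reply = makeVoiceReplier(tts.speak, 'replies')` once and `await reply(message)` per incoming message.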

License

MIT