chichi-speech (local text-to-speech service with Qwen3-TTS model)

v1.0.2

A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. Optimized for reusing a specific voice prompt to avoid re-computation.

Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name, description, SKILL.md and the Python sources all implement a FastAPI-based Qwen3 TTS service with voice-clone prompt reuse. The declared dependencies (qwen-tts, torch, fastapi, uvicorn, soundfile) match the code's behavior. Minor inconsistencies: the pyproject version (0.1.1) differs from the registry version (1.0.2), and the package imports qwen_tts while pyproject lists qwen-tts. The hyphen/underscore difference is the common Python convention of a distribution name differing from its import name, so these look like packaging details and do not indicate additional functionality beyond TTS.
Instruction Scope
SKILL.md instructs pip install -e . and running the CLI to start the service. The server code will download model weights (via model.from_pretrained) and fetch a hardcoded reference audio URL (an Alibaba OSS URL) to precompute the voice prompt. The service initializes the model at startup and exposes a POST /synthesize endpoint that streams WAV audio back. These actions are consistent with TTS but do involve network activity (model and reference audio downloads) and preloading large binaries.
Install Mechanism
There is no platform install spec; installation is via pip install -e . per SKILL.md. That pulls heavy native dependencies (torch, numba) plus qwen-tts, which can download large model artifacts at runtime. The code contains no obfuscated or suspicious third-party download URLs beyond the public OSS reference audio and the normal model download mechanism (from_pretrained). Installation is therefore higher-friction and requires substantial disk/CPU/GPU resources, but it is not intrinsically malicious.
Credentials
The skill requests no credentials or environment variables. The code reads PORT if present (a reasonable override). No secrets or unrelated environment variables are required or accessed.
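As a rough illustration of the PORT override described above, the following sketch resolves a listening port from an optional CLI value, then the PORT environment variable, then the documented default of 9090. The function name `resolve_port` and the precedence order are assumptions made for this example; the actual code may order or validate these differently.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 9090  # default port stated in the Usage section


def resolve_port(cli_port: Optional[int] = None,
                 env: Mapping[str, str] = os.environ) -> int:
    """Pick the port: explicit CLI value, then PORT, then the default.

    The precedence order is an assumption for illustration only.
    """
    if cli_port is not None:
        port = cli_port
    elif env.get("PORT"):
        port = int(env["PORT"])
    else:
        port = DEFAULT_PORT
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

Passing the environment as a parameter keeps the function easy to check without touching the real process environment.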
Persistence & Privilege
The CLI binds the FastAPI app to 0.0.0.0 (all network interfaces) by default, which can unintentionally expose the service to the network; the SKILL.md example passes --host 127.0.0.1, but the code's own default is 0.0.0.0. The skill does not request always:true and does not modify other skills or global agent config, but you should run it with appropriate host/network restrictions and firewall rules.
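One way to guard against accidental exposure is a small pre-flight check in whatever wrapper script launches the service. The sketch below uses only the standard library; the helper name `is_loopback_only` is hypothetical and not part of chichi-speech.

```python
import ipaddress


def is_loopback_only(host: str) -> bool:
    """Return True if binding to `host` keeps the server local to this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): not provably local.
        return False
```

A launcher could refuse to start, or print a warning, when `is_loopback_only(host)` is false and no firewall rule is known to be in place.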
Assessment
This code implements the stated local TTS service, but before installing or running it, consider:
1) Network activity: the service downloads model weights and fetches a default reference audio from a public OSS URL; if you need fully offline behavior, provide a local reference audio and ensure the model cache is available locally.
2) Exposure: the server's default host is 0.0.0.0 (all interfaces); run with --host 127.0.0.1 or firewall it if you want local-only access.
3) Resource usage: the dependencies (torch, numba, qwen-tts) and model weights are large; ensure you have enough disk, RAM, and hardware (GPU/MPS) capacity.
4) Source trust: the skill's source is unknown and the package metadata shows a minor version/name mismatch; review the code yourself if you need to trust it fully.
5) Sanity-check arguments: the CLI accepts ref-audio/ref-text overrides; prefer local files to avoid unintended remote fetches.
If you want higher assurance, request a signed upstream/package source or a reproducible release (GitHub release or known registry), or run the service in an isolated environment (container/VM) behind a firewall.
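To act on the advice about preferring a local reference file, the audio can be fetched once and reused from disk. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes (as the assessment implies but the source does not show) that --ref-audio accepts a local path.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Callable

# Reference audio URL taken from the Usage example.
REF_AUDIO_URL = ("https://qianwen-res.oss-cn-beijing.aliyuncs.com/"
                 "Qwen3-TTS-Repo/clone_2.wav")


def _download(url: str, dest: Path) -> None:
    """Fetch `url` into `dest` via a temp file so partial downloads are not kept."""
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=60) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)


def local_reference_audio(dest: Path, url: str = REF_AUDIO_URL,
                          download: Callable[[str, Path], None] = _download) -> Path:
    """Return a local copy of the reference audio, downloading only if absent."""
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        download(url, dest)
    return dest
```

The returned path could then be passed to --ref-audio so that later starts do not need to reach the remote URL, provided the flag accepts local files.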


latest: vk976y9kn8x757mhpckgh449m5980m0m0
1.9k downloads
1 star
3 versions
Updated 1mo ago
MIT-0

Chichi Speech Service

This skill provides a FastAPI-based REST service for Qwen3 TTS, specifically configured for reusing a high-quality reference audio prompt for efficient and consistent voice cloning. This service is packaged as an installable CLI.

Installation

Prerequisites: Python >= 3.10.

pip install -e .

Usage

1. Start the Service

The service runs on port 9090 by default.

# Start the server (runs in foreground, use & for background or a separate terminal)
# Optional: Update to your own reference audio and text for voice cloning
chichi-speech --port 9090 --host 127.0.0.1 --ref-audio "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav" --ref-text "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."

2. Verify Service is Running

Check that the API docs page responds:

curl http://localhost:9090/docs
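Because the model is loaded at startup, the docs page may not answer immediately. A script that launches the service could poll until it does. The following is a sketch under that assumption; `docs_reachable` and `wait_until_ready` are hypothetical helper names, and the timeout values are arbitrary.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def docs_reachable(url: str = "http://localhost:9090/docs") -> bool:
    """Return True if the docs page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool] = docs_reachable,
                     timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Call `probe` until it succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` before the first synthesis request avoids connection errors while the weights are still loading.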

3. Generate Speech

Use cURL:

curl -X POST "http://localhost:9090/synthesize" \
     -H "Content-Type: application/json" \
     -d '{
           "text": "Nice to meet you",
           "language": "English"
         }' \
     --create-dirs --output output/nice_to_meet.wav
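The same request can be sent from Python with only the standard library. This sketch mirrors the cURL call above: the URL, the JSON fields `text` and `language`, and the WAV response come from this document, while the function names and the timeout are assumptions for illustration.

```python
import json
import urllib.request
from pathlib import Path


def build_synthesize_request(text: str, language: str = "English",
                             base_url: str = "http://localhost:9090"
                             ) -> urllib.request.Request:
    """Build the POST /synthesize request with a JSON body."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: Path, language: str = "English",
                       base_url: str = "http://localhost:9090") -> Path:
    """Send the request and write the returned audio bytes to `out_path`."""
    req = build_synthesize_request(text, language, base_url)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(req, timeout=300) as resp, open(out_path, "wb") as out:
        # Read in chunks, since the service streams the WAV data back.
        while chunk := resp.read(64 * 1024):
            out.write(chunk)
    return out_path
```

With the service running, `synthesize_to_file("Nice to meet you", Path("output/nice_to_meet.wav"))` would produce the same file as the cURL example.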

Functionality

  • Endpoint: POST /synthesize
  • Default Port: 9090
  • Voice Cloning: Uses a pre-computed voice prompt from reference files to ensure the cloned voice is consistent and generation is fast.
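The idea behind the pre-computed voice prompt can be illustrated generically: do the expensive reference-audio analysis once, then hand the same result to every request. The sketch below does not use the qwen-tts API; `PromptCache` and `fake_extract_prompt` are hypothetical stand-ins that only show the caching pattern.

```python
from typing import Callable, Generic, Optional, TypeVar

P = TypeVar("P")


class PromptCache(Generic[P]):
    """Compute an expensive value once and return the same object afterwards."""

    def __init__(self, compute: Callable[[], P]) -> None:
        self._compute = compute
        self._value: Optional[P] = None
        self._ready = False

    def get(self) -> P:
        if not self._ready:
            self._value = self._compute()
            self._ready = True
        return self._value


def fake_extract_prompt() -> dict:
    """Hypothetical stand-in for analysing the reference audio and text."""
    fake_extract_prompt.calls += 1
    return {"speaker": "reference"}


fake_extract_prompt.calls = 0  # counts how often the expensive step ran
```

In a real service the cache would typically be filled during startup so that the first request does not pay the cost. This sketch is not thread-safe; concurrent first calls would need a lock.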

Requirements

  • Python 3.10+
  • qwen-tts (Qwen3 model library)
  • Access to a reference audio file for voice cloning.
    • By default, it uses public sample audio from Qwen3.
    • To clone a different voice, provide your own reference audio and its matching transcript using the --ref-audio and --ref-text flags.
