chichi-speech (local text-to-speech service with Qwen3-TTS model)
v1.0.2
A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. Optimized for reusing a specific voice prompt to avoid re-computation.
by @hudeven
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign
medium confidence
Purpose & Capability
The name, description, SKILL.md, and Python sources are all consistent with a FastAPI-based Qwen3 TTS service with voice-clone prompt reuse. The declared dependencies (qwen-tts, torch, fastapi, uvicorn, soundfile) match the code's behavior. Minor inconsistency: the pyproject version (0.1.1) differs from the registry version (1.0.2), and the package imports qwen_tts while pyproject lists qwen-tts. These are likely packaging/name mismatches and do not indicate additional functionality beyond TTS.
Instruction Scope
SKILL.md instructs pip install -e . and running the CLI to start the service. The server code will download model weights (via model.from_pretrained) and fetch a hardcoded reference audio URL (an Alibaba OSS URL) to precompute the voice prompt. The service initializes the model at startup and exposes a POST /synthesize endpoint that streams WAV audio back. These actions are consistent with TTS but do involve network activity (model and reference audio downloads) and preloading large binaries.
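As a rough illustration of how a client might call the endpoint described above, the sketch below builds a POST request for /synthesize and checks that a response body starts with a WAV header. The base URL, the port 9090, and the JSON field name `text` are assumptions made for illustration; the actual request schema is defined in the skill's source and should be checked there.

```python
import json
import urllib.request

# Assumed address; the real host and port depend on how the service is started.
BASE_URL = "http://127.0.0.1:9090"


def build_synthesize_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /synthesize endpoint."""
    # "text" is a hypothetical field name, not confirmed by the listing.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_wav(data: bytes) -> bool:
    """Return True if the bytes begin with a RIFF/WAVE container header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def synthesize_to_file(text: str, out_path: str, base_url: str = BASE_URL) -> None:
    """Send the request and save the returned audio, refusing non-WAV bodies."""
    with urllib.request.urlopen(build_synthesize_request(text, base_url)) as resp:
        audio = resp.read()  # reads the streamed body until the server closes it
    if not looks_like_wav(audio):
        raise ValueError("response does not look like WAV audio")
    with open(out_path, "wb") as fh:
        fh.write(audio)
```

Only `synthesize_to_file` touches the network; the two helpers can be exercised without a running server.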
Install Mechanism
There is no platform install spec; installation is via pip install -e . per SKILL.md. That will pull heavy native dependencies (torch, numba) and qwen-tts, which can download large model artifacts at runtime. The code contains no obfuscated or suspicious third-party download URLs besides the public OSS reference audio and the normal model download mechanism (from_pretrained). This is higher friction and requires substantial disk/CPU/GPU resources, but it is not intrinsically malicious.
Credentials
The skill requests no credentials or environment variables. The code reads PORT if present (a reasonable override). No secrets or unrelated environment variables are required or accessed.
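A PORT override of this kind is commonly handled as in the minimal sketch below. The helper name `resolve_port` and the default of 9090 are hypothetical and not taken from the skill's source; the sketch only shows the pattern of reading one optional variable and validating it.

```python
import os
from typing import Mapping

DEFAULT_PORT = 9090  # assumed default; check the skill's CLI for the real value


def resolve_port(env: Mapping[str, str] = os.environ, default: int = DEFAULT_PORT) -> int:
    """Return the port from PORT if it is set and valid, otherwise the default."""
    raw = env.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

Passing the environment as a mapping keeps the function easy to test and makes it clear that no other variable is consulted.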
Persistence & Privilege
The CLI binds the FastAPI app to 0.0.0.0 (all network interfaces) by default, which can unintentionally expose the service to other machines on the network; the SKILL.md example shows 127.0.0.1, but the code defaults to 0.0.0.0. The skill does not request always:true and does not modify other skills or global agent config, but you should run it with appropriate host/network restrictions and firewall rules.
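One way to guard against an accidental all-interfaces bind is to validate the host before starting the server. The sketch below is an illustration using the standard library only; `is_loopback_host` and `check_bind_host` are hypothetical helpers, not part of the skill.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True only for addresses that are reachable from this machine alone."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames are treated as not loopback


def check_bind_host(host: str, allow_remote: bool = False) -> str:
    """Return the host unchanged, or raise if it would expose the service."""
    if not allow_remote and not is_loopback_host(host):
        raise ValueError(
            f"refusing to bind to {host!r}; pass allow_remote=True to expose the service"
        )
    return host
```

With this check in place, 127.0.0.1 and ::1 pass, while 0.0.0.0 is rejected unless remote access is requested explicitly.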
Assessment
This code implements the stated local TTS service, but before installing or running it, consider:
1) Network activity: the service downloads model weights and fetches a default reference audio from a public OSS URL. For fully offline behavior, provide a local reference audio and ensure the model cache is available locally.
2) Exposure: the server's default host is 0.0.0.0 (all network interfaces). Run with --host 127.0.0.1 or firewall it if you want local-only access.
3) Resource usage: the dependencies (torch, numba, qwen-tts) and model weights are large; ensure you have enough disk, RAM, and hardware (GPU/MPS) capacity.
4) Source trust: the skill's source is unknown and the package metadata shows a minor version/name mismatch; review the code yourself if you need to trust it fully.
5) Sanity-check arguments: the CLI accepts ref-audio/ref-text overrides; prefer local files to avoid unintended remote fetches.
If you want higher assurance, request a signed upstream/package source or a reproducible release (GitHub release or known registry), or run in an isolated environment (container/VM) behind a firewall.
Like a lobster shell, security has layers: review code before you run it.
latest: vk976y9kn8x757mhpckgh449m5980m0m0
