chichi-speech (local text-to-speech service with Qwen3-TTS model)

Pass. Audited by VirusTotal on May 12, 2026.

Overview

Type: OpenClaw Skill
Name: chichi-speech
Version: 1.0.2

The skill bundle provides a FastAPI-based text-to-speech service using the Qwen3-TTS model. All files align with the stated purpose, including loading a pre-trained model and a reference audio file from legitimate public URLs (qianwen-res.oss-cn-beijing.aliyuncs.com). The `SKILL.md` instructions are clear and contain no prompt injection attempts. The server defaults to listening on `0.0.0.0` in `src/chichi_speech/server.py` and accepts arbitrary `--ref-audio` URLs. The first is common practice for web services and the second is a core feature for voice cloning; neither indicates intentional malicious behavior such as data exfiltration or unauthorized execution.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If the service is exposed, other devices that can reach the host over the network may be able to request speech generation and consume local compute.

Why it was flagged

The API endpoint is unauthenticated and the code binds to all interfaces by default, so starting the CLI without an explicit `--host 127.0.0.1` can expose the TTS service beyond the local machine.

Skill content
parser.add_argument("--host", type=str, default="0.0.0.0", help="Service host (default: 0.0.0.0)")
...
@app.post("/synthesize")
Recommendation

Run it with `--host 127.0.0.1` unless network access is intentional, and use firewalling or authentication if exposing it beyond localhost.
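
The recommendation above could be backed by a small startup check. The following is a minimal sketch, not the skill's actual code: `is_loopback`, `build_parser`, and `exposure_warning` are hypothetical helper names, and the loopback default is an assumption that differs from the `0.0.0.0` default quoted in the skill content.

```python
import argparse
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """Return True for 'localhost' or a loopback IP literal such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); treat it as non-loopback.
        return False


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical launcher parser whose --host defaults to loopback."""
    parser = argparse.ArgumentParser(description="Illustrative TTS launcher")
    parser.add_argument(
        "--host",
        type=str,
        default="127.0.0.1",  # assumption: safer default than the reviewed 0.0.0.0
        help="Service host (default: 127.0.0.1)",
    )
    return parser


def exposure_warning(host: str) -> Optional[str]:
    """Return a warning string when the host is not a loopback address."""
    if is_loopback(host):
        return None
    return (
        f"--host {host} may expose the unauthenticated "
        "/synthesize endpoint to the network"
    )
```

A launcher could call `exposure_warning(build_parser().parse_args().host)` before starting the server and print the result when it is not `None`. This only warns; it does not replace firewalling or authentication.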

What this means

Installing the skill may fetch current package versions from package repositories, so behavior can vary over time and depends on the trustworthiness of those packages.

Why it was flagged

The install relies on multiple external Python packages, mostly without pinned versions. This is normal for a Python ML service, but it leaves exact dependency versions and provenance to the installer environment.

Skill content
dependencies = [
    "fastapi",
    "uvicorn",
    "requests",
    "torch",
    "soundfile",
    "pydantic",
    "qwen-tts",
    "numba>=0.59.0",
]
Recommendation

Install in a virtual environment and consider pinning or reviewing dependency versions, especially `qwen-tts`, `torch`, and related ML packages.
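
To see which entries would benefit from pinning, the dependency list quoted above can be sorted by how tightly each entry is constrained. This is a rough sketch using plain string checks rather than a full requirement parser; `classify` and `group_by_pinning` are hypothetical helper names, and direct-URL requirements are not handled.

```python
from typing import Dict, List

# Dependency list as quoted in the reviewed skill content.
DEPENDENCIES = [
    "fastapi",
    "uvicorn",
    "requests",
    "torch",
    "soundfile",
    "pydantic",
    "qwen-tts",
    "numba>=0.59.0",
]


def classify(requirement: str) -> str:
    """Label a requirement string as 'pinned', 'constrained', or 'unconstrained'."""
    # Drop any environment marker (text after ';') so its operators are ignored.
    spec = requirement.split(";", 1)[0]
    if "==" in spec:
        return "pinned"
    if any(op in spec for op in (">=", "<=", "~=", "!=", ">", "<")):
        return "constrained"
    return "unconstrained"


def group_by_pinning(requirements: List[str]) -> Dict[str, List[str]]:
    """Group requirement strings by the label returned from classify()."""
    groups: Dict[str, List[str]] = {
        "pinned": [],
        "constrained": [],
        "unconstrained": [],
    }
    for req in requirements:
        groups[classify(req)].append(req)
    return groups
```

Calling `group_by_pinning(DEPENDENCIES)` places `numba>=0.59.0` under `constrained` and the other seven entries under `unconstrained`, which matches the review's note that most versions are left to the installer environment.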

What this means

The first run may download or load external model assets, which can be large and whose contents are outside this artifact review.

Why it was flagged

The service loads a pretrained model from an external model identifier at startup. This is purpose-aligned for TTS, but it is an external artifact that is not included in the reviewed files.

Skill content
model = Qwen3TTSModel.from_pretrained(
        "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
Recommendation

Use trusted model sources, verify model/package provenance where possible, and run in an isolated environment if concerned.
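
One way to verify provenance is to compare the SHA-256 digest of a downloaded file against a digest obtained from a source you trust. The sketch below uses only the Python standard library; the helper names are hypothetical, and where the expected digest comes from is an assumption outside this review.

```python
import hashlib
import hmac
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of the file at the given path."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


def digest_matches(actual_hex: str, expected_hex: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace in the expected value."""
    return hmac.compare_digest(actual_hex.lower(), expected_hex.strip().lower())
```

A caller would compute `sha256_of_file(...)` for each downloaded weight file and refuse to load the model if `digest_matches` returns `False`. This checks integrity against a known value only; it says nothing about whether the published artifact itself is trustworthy.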