Local TTS

v1.0.0

Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voices.

MIT-0
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
The name and description (local Qwen3-TTS via mlx_audio or qwen-tts) match the included scripts and docs. The scripts call the expected libraries (mlx_audio on macOS; qwen-tts/torch on Linux/Windows) and expose the parameters described in SKILL.md. One minor metadata mismatch: the registry listed no homepage, but package.json includes a GitHub homepage. This is a non-security inconsistency in the metadata.
Instruction Scope
SKILL.md and the scripts only instruct running local TTS generation, referencing local files (ref_audio) and standard parameters. They do not attempt to read unrelated system files or exfiltrate data. The instructions do rely on a one-time model download from model hosting (Hugging Face-style identifiers), which is documented in the README and references; that initial network activity is expected, but users who require strict air-gapped operation should account for it.
Install Mechanism
There is no install spec in the registry (the skill is instruction-only), so the platform auto-downloads nothing. The code relies on pip-installable packages (mlx-audio, qwen-tts, torch, flash-attn, ffmpeg), which are reasonable for this purpose. Model weights are loaded via from_pretrained calls that fetch artifacts from model hosts; this is expected, but it can involve large downloads and possibly gated models that require credentials.
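For strict air-gapped operation, Hugging Face tooling can be forced to resolve models from a local cache only, so the from_pretrained calls never reach the network. A minimal sketch, assuming the skill's scripts go through standard Hugging Face libraries; the env var names are the documented Hugging Face ones, and the cache path is illustrative:

```python
import os


def offline_env(cache_dir: str) -> dict:
    """Env vars that make Hugging Face libraries refuse network fetches.

    Export these before running the TTS scripts so that from_pretrained()
    resolves only against snapshots already present under cache_dir
    (pre-downloaded on a connected machine and copied over).
    """
    return {
        "HF_HOME": cache_dir,         # cache root holding the snapshots
        "HF_HUB_OFFLINE": "1",        # huggingface_hub: no network calls
        "TRANSFORMERS_OFFLINE": "1",  # transformers: local files only
    }


if __name__ == "__main__":
    # Merge into the current environment before spawning the TTS script.
    env = {**os.environ, **offline_env("/models/hf-cache")}
    print(env["HF_HUB_OFFLINE"])
```

With these set, a missing snapshot fails fast with a local-lookup error instead of silently downloading.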
Credentials
The skill requests no environment variables, credentials, or config paths, consistent with a local-offline TTS tool. Caveat: some Hugging Face-hosted models are gated and require a HUGGINGFACE_TOKEN or equivalent at download time; the skill does not declare such env vars, so users who hit a gated model should supply a token manually. Disk, memory, and GPU resource requirements (large model files, VRAM) are documented in the references.
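Since the skill declares no credential env vars, a small wrapper that checks the conventional Hugging Face token variables before a gated download makes the requirement explicit. A sketch; the variable names are the common Hugging Face conventions, not something the skill defines:

```python
import os


def resolve_hf_token():
    """Return a Hugging Face token from common env vars, or None.

    The skill itself declares no credentials; set one of these manually
    if a gated checkpoint rejects an anonymous download.
    """
    for var in ("HF_TOKEN", "HUGGINGFACE_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = os.environ.get(var)
        if token:
            return token
    return None
```

Calling this before the first model fetch lets you fail with a clear message ("set HF_TOKEN for gated model X") rather than a mid-download 401.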
Persistence & Privilege
The skill does not request always:true or any elevated or persistent privileges. It is a user-invocable wrapper that runs local Python code and invokes installed libraries as subprocesses; this is proportional to its function.
Assessment
This skill appears to do what it says: local, offline TTS wrappers for macOS (mlx_audio) and Linux/Windows (qwen-tts). Before installing or running it:
- Expect large one-time downloads of model weights (from Hugging Face-style model IDs) and significant disk/GPU usage for the 1.7B models; the docs note smaller 0.6B alternatives.
- If you require strict air-gapped operation, pre-download and verify models and dependencies; the scripts call from_pretrained, which normally performs network fetches.
- Some model checkpoints may be gated and require a Hugging Face token (not declared by the skill); provide such credentials yourself if needed, and verify the trustworthiness of the model source.
- The registry metadata has minor mismatches (a homepage present in package.json but listed as none in the registry), and the tests reference a VERSION file not present in the manifest. These are build/metadata inconsistencies, not direct security red flags, but you may want to confirm the repository and author (package.json points to https://github.com/irachex/local-tts).
- The dependencies to install (mlx-audio, qwen-tts, torch, ffmpeg, optional flash-attn) are normal for TTS, but be prepared for native builds (flash-attn) and sizeable installs.

If you want to be extra cautious, review the upstream GitHub repo and the actual model sources before running the first model download.
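One way to "pre-download and verify" is to record SHA-256 digests of the snapshot files on a trusted, connected machine and re-check them on the air-gapped host before first use. A minimal sketch; the file names and expected digests are yours to supply:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB model shards fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def verify_snapshot(root: Path, expected: dict) -> list:
    """Return the names of files under `root` whose digest differs from `expected`.

    An empty list means every recorded file matches its trusted checksum.
    """
    return [
        name
        for name, digest in expected.items()
        if sha256_of(root / name) != digest
    ]
```

Run it against the copied model directory and refuse to launch the TTS scripts if the returned list is non-empty.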

Like a lobster shell, security has layers — review code before you run it.

latest · vk97ba71xgcefnq15eb5whcfxdx82pb9m

License

MIT-0
Free to use, modify, and redistribute. No attribution required.
