{"skill":{"slug":"local-tts","displayName":"Local TTS","summary":"Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voic...","description":"---\nname: local-tts\ndescription: Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voice cloning and voice design. Use for local, secure, high-quality multilingual speech synthesis.\nlicense: MIT\n---\n\n# Local TTS with Qwen3-TTS\n\n**Privacy-First | Offline | High-Quality | Natural Real Voices**\n\nLocal text-to-speech synthesis using Qwen3-TTS models. Your text never leaves your machine.\n\n## Why Local TTS?\n\nUnlike cloud TTS (Google, AWS, Azure), **local-tts** ensures:\n- **Zero data transmission** - 100% on-device processing\n- **Works offline** - No network required\n- **No API keys** - No external dependencies\n- **GDPR/HIPAA friendly** - Simplified compliance\n\nSee [privacy & security details](references/privacy_security.md).\n\n## Platform Overview\n\n| Platform | Backend | Installation | Best For |\n|----------|---------|--------------|----------|\n| macOS (Apple Silicon) | `mlx_audio` | `pip install mlx-audio` | M1/M2/M3/M4 Macs |\n| Linux/Windows | `qwen-tts` | `pip install qwen-tts` | CUDA GPUs |\n\n## Quick Start\n\n### macOS\n\n```bash\npip install mlx-audio\nbrew install ffmpeg\n\n# Natural female voice\npython -m mlx_audio.tts.generate \\\n    --text \"Hello world\" \\\n    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \\\n    --voice Chelsie\n```\n\n### Linux/Windows\n\n```bash\npip install qwen-tts\n\n# With optimizations (FlashAttention, bfloat16, auto-device)\npython scripts/tts_linux.py \"Hello world\" --female\n```\n\n## Key Concepts\n\n### `--voice` vs `--instruct` (Important)\n\n| Model | `--voice` | `--instruct` | Notes |\n|-------|-----------|--------------|-------|\n| 
**CustomVoice** | Select preset voice | Add style/emotion | **Can use together** - voice + style control |\n| **VoiceDesign** | N/A | Create voice from description | `--instruct` only |\n| **Base** | N/A | N/A | For voice cloning with `--ref_audio` |\n\n**CustomVoice with style control:**\n```bash\npython -m mlx_audio.tts.generate \\\n    --text \"Hello there!\" \\\n    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \\\n    --voice Serena \\\n    --instruct \"excited and enthusiastic\"\n```\n\n### 9 Preset Voices (Open Source CustomVoice)\n\n| Voice | Gender | Language | Character |\n|-------|--------|----------|-----------|\n| Chelsie | Female | English (American) | Gentle, empathetic |\n| Serena | Female | English | Warm, gentle |\n| Ono Anna | Female | Japanese | Playful |\n| Sohee | Female | Korean | Warm |\n| Aiden | Male | English (American) | Sunny |\n| Dylan | Male | English | Natural |\n| Eric | Male | English | Real |\n| Ryan | Male | English | Natural |\n| Uncle Fu | Male | Chinese | Youthful Beijing |\n\n**Defaults:** Female=`Serena`, Male=`Aiden`\n\n## Usage Examples\n\n### CustomVoice (Preset Voices)\n\n```bash\n# Natural female\npython -m mlx_audio.tts.generate \\\n    --text \"Your text\" --voice Serena --lang_code en \\\n    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit\n\n# Real male\npython -m mlx_audio.tts.generate \\\n    --text \"Your text\" --voice Aiden --lang_code en \\\n    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit\n```\n\n### VoiceDesign (Text-Based)\n\n```bash\npython -m mlx_audio.tts.generate \\\n    --text \"Hello\" \\\n    --model mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit \\\n    --instruct \"A warm female voice, professional and clear\"\n```\n\n### Long Text Generation\n\nFor long text, increase `--max_tokens` and enable `--join_audio` (macOS/MLX only):\n\n```bash\npython -m mlx_audio.tts.generate \\\n    --text \"Your very long text here...\" \\\n    --model 
mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \\\n    --voice Serena \\\n    --max_tokens 4096 \\\n    --join_audio \\\n    --output long_audio.wav\n```\n\n### Voice Cloning\n\n```bash\npython -m mlx_audio.tts.generate \\\n    --text \"Cloned voice speaking\" \\\n    --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \\\n    --ref_audio sample.wav --ref_text \"Sample transcript\"\n```\n\n## Parameters\n\n| Parameter | Description | Values |\n|-----------|-------------|--------|\n| `--text` | Text to speak | Required |\n| `--model` | Model ID | See table below |\n| `--voice` | Preset voice (CustomVoice) | Chelsie, Serena, Aiden, Ryan... |\n| `--instruct` | Voice description (VoiceDesign) or style/emotion (CustomVoice) | e.g., \"excited\", \"calm\", \"professional\" |\n| `--speed` | Speaking rate | 0.5-2.0 (default: 1.0) |\n| `--pitch` | Voice pitch | 0.5-2.0 (default: 1.0) |\n| `--lang_code` | Language | en, cn, ja, ko, de, fr... |\n| `--ref_audio` | Reference for cloning | File path |\n| `--output` | Output file | Path (auto-generated if omitted) |\n| `--max_tokens` | Max generation tokens | Integer (default: 2048) - Increase for long text |\n| `--join_audio` | Merge audio segments | `true` (default) or `false` - Recommended for long text |\n\n## Models\n\n| Model | Size | Purpose |\n|-------|------|---------|\n| `Qwen3-TTS-12Hz-1.7B-CustomVoice` | 1.7B | 9 preset voices + style control |\n| `Qwen3-TTS-12Hz-1.7B-VoiceDesign` | 1.7B | Text-based voice creation |\n| `Qwen3-TTS-12Hz-1.7B-Base` | 1.7B | Voice cloning |\n| `Qwen3-TTS-12Hz-0.6B-*` | 0.6B | Lightweight versions |\n\nmacOS: Add `mlx-community/` prefix (e.g., `mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit`)\n\n## Scripts\n\n- `scripts/tts_macos.py` - macOS wrapper\n- `scripts/tts_linux.py` - Linux/Windows wrapper with optimizations\n\n## Optimizations (Linux/Windows)\n\n`tts_linux.py` automatically enables:\n- **FlashAttention** - Faster, less memory\n- **bfloat16** - Half the memory of float32 with the same dynamic range\n- **Auto device** 
- CUDA → CPU fallback\n- **Mixed precision** - Speed + quality\n\n## Troubleshooting\n\n| Issue | Solution |\n|-------|----------|\n| macOS: Model not found | Use `mlx-community/` prefix |\n| macOS: Audio format | `brew install ffmpeg` |\n| Linux: CUDA OOM | Use `0.6B` models |\n| Linux: Slow | Check CUDA: `torch.cuda.is_available()` |\n\n## References\n\n- [macOS Details](references/macos_mlx_audio.md)\n- [Linux/Windows Details](references/linux_windows_transformers.md)\n- [Privacy & Security](references/privacy_security.md)\n\n## Version\n\n**1.0.0** - See [VERSION](VERSION) and [package.json](package.json)\n","topics":["Privacy","Text-to-Speech","Audio"],"tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":692,"installsAllTime":26,"installsCurrent":4,"stars":0,"versions":1},"createdAt":1773256127521,"updatedAt":1778491839794},"latestVersion":{"version":"1.0.0","createdAt":1773256127521,"changelog":"Initial release: Local text-to-speech with Qwen3-TTS, supporting macOS (mlx_audio) and Linux/Windows (qwen-tts) with FlashAttention, bfloat16 optimizations. 9 natural preset voices, voice cloning, and voice design.","license":"MIT-0"},"metadata":null,"owner":{"handle":"irachex","userId":"s17abmp7bt8vcx3mszdk2dsvws885shc","displayName":"irachex","image":"https://avatars.githubusercontent.com/u/596262?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1780089843885}}