Local TTS

v1.0.0

Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voic...

License: MIT-0

Install

openclaw skills install local-tts

Local TTS with Qwen3-TTS

Privacy-First | Offline | High-Quality | Natural Real Voices

Local text-to-speech synthesis using Qwen3-TTS models. Your text never leaves your machine.

Why Local TTS?

Unlike cloud TTS (Google, AWS, Azure), local-tts ensures:

  • Zero data transmission - 100% on-device processing
  • Works offline - No network required
  • No API keys - No external dependencies
  • GDPR/HIPAA friendly - Simplified compliance


Platform Overview

| Platform | Backend | Installation | Best For |
|---|---|---|---|
| macOS (Apple Silicon) | mlx_audio | pip install mlx-audio | M1/M2/M3/M4 Macs |
| Linux/Windows | qwen-tts | pip install qwen-tts | CUDA GPUs |
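
The platform table can be expressed as a small lookup. This is a minimal sketch, not part of the skill: `pick_backend` is a hypothetical helper, and it assumes a native arm64 Python on Apple Silicon (a Python running under Rosetta reports `x86_64` instead).

```python
import platform
from typing import Optional


def pick_backend(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the pip package listed in the platform table (hypothetical helper)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Apple Silicon Macs report Darwin/arm64 when Python runs natively.
    if system == "Darwin" and machine == "arm64":
        return "mlx-audio"
    if system in ("Linux", "Windows"):
        return "qwen-tts"
    # Intel Macs and other systems are not covered by the table.
    raise RuntimeError(f"No backend listed for {system}/{machine}")
```

Calling `pick_backend()` with no arguments inspects the current machine; passing explicit values makes the function easy to check without the target hardware.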

Quick Start

macOS

pip install mlx-audio
brew install ffmpeg

# Natural female voice
python -m mlx_audio.tts.generate \
    --text "Hello world" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Chelsie

Linux/Windows

pip install qwen-tts

# With optimizations (FlashAttention, bfloat16, auto-device)
python scripts/tts_linux.py "Hello world" --female

Key Concepts

--voice vs --instruct (Important)

| Model | --voice | --instruct | Notes |
|---|---|---|---|
| CustomVoice | Select preset voice | Add style/emotion | Can use together: voice + style control |
| VoiceDesign | N/A | Create voice from description | --instruct only |
| Base | N/A | N/A | For voice cloning with --ref_audio |

CustomVoice with style control:

python -m mlx_audio.tts.generate \
    --text "Hello there!" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Serena \
    --instruct "excited and enthusiastic"
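
The flag rules in the table can be enforced before a command is launched. The sketch below is an illustration under assumptions: `build_command` and `ALLOWED_FLAGS` are hypothetical names, the model family is guessed from the model ID string, and only flags shown in this document are used.

```python
import shlex
from typing import List, Optional

# Flags each model family accepts, per the table above.
ALLOWED_FLAGS = {
    "CustomVoice": {"voice", "instruct"},
    "VoiceDesign": {"instruct"},
    "Base": {"ref_audio", "ref_text"},
}


def build_command(model: str, text: str,
                  voice: Optional[str] = None,
                  instruct: Optional[str] = None,
                  ref_audio: Optional[str] = None,
                  ref_text: Optional[str] = None) -> List[str]:
    """Build an argv list for mlx_audio.tts.generate, rejecting invalid flag mixes."""
    # Assumption: the family name appears verbatim in the model ID.
    family = next((name for name in ALLOWED_FLAGS if name in model), None)
    if family is None:
        raise ValueError(f"Cannot tell the model family from {model!r}")
    given = {"voice": voice, "instruct": instruct,
             "ref_audio": ref_audio, "ref_text": ref_text}
    for flag, value in given.items():
        if value is not None and flag not in ALLOWED_FLAGS[family]:
            raise ValueError(f"--{flag} is not supported by {family} models")
    cmd = ["python", "-m", "mlx_audio.tts.generate", "--text", text, "--model", model]
    for flag, value in given.items():
        if value is not None:
            cmd += [f"--{flag}", value]
    return cmd


cmd = build_command("mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit",
                    "Hello there!", voice="Serena",
                    instruct="excited and enthusiastic")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting issues.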

9 Preset Voices (Open Source CustomVoice)

| Voice | Gender | Language | Character |
|---|---|---|---|
| Chelsie | Female | English (American) | Gentle, empathetic |
| Serena | Female | English | Warm, gentle |
| Ono Anna | Female | Japanese | Playful |
| Sohee | Female | Korean | Warm |
| Aiden | Male | English (American) | Sunny |
| Dylan | Male | English | Natural |
| Eric | Male | English | Real |
| Ryan | Male | English | Natural |
| Uncle Fu | Male | Chinese | Youthful Beijing |

Defaults: Female=Serena, Male=Aiden
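
A wrapper could resolve a voice from either a gender or an explicit name, falling back to the defaults above. This is a hedged sketch: `resolve_voice` is a hypothetical helper, and the names are copied from the table as printed, so the exact spellings the CLI accepts (for example for multi-word names) are not confirmed here.

```python
from typing import Optional

# Voice names grouped by gender, copied from the preset table.
PRESET_VOICES = {
    "female": ["Chelsie", "Serena", "Ono Anna", "Sohee"],
    "male": ["Aiden", "Dylan", "Eric", "Ryan", "Uncle Fu"],
}
DEFAULT_VOICE = {"female": "Serena", "male": "Aiden"}


def resolve_voice(name: Optional[str] = None, gender: str = "female") -> str:
    """Return an explicit preset name, or the documented default for a gender."""
    if name is not None:
        known = [v for voices in PRESET_VOICES.values() for v in voices]
        if name not in known:
            raise ValueError(f"Unknown preset voice: {name!r}")
        return name
    if gender not in DEFAULT_VOICE:
        raise ValueError(f"Unknown gender: {gender!r}")
    return DEFAULT_VOICE[gender]
```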

Usage Examples

CustomVoice (Preset Voices)

# Natural female
python -m mlx_audio.tts.generate \
    --text "Your text" --voice Serena --lang_code en \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

# Real male
python -m mlx_audio.tts.generate \
    --text "Your text" --voice Aiden --lang_code en \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

VoiceDesign (Text-Based)

python -m mlx_audio.tts.generate \
    --text "Hello" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit \
    --instruct "A warm female voice, professional and clear"

Long Text Generation

For long text, increase --max_tokens and enable --join_audio (macOS/MLX only):

python -m mlx_audio.tts.generate \
    --text "Your very long text here..." \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Serena \
    --max_tokens 4096 \
    --join_audio \
    --output long_audio.wav
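
Since --join_audio is described as macOS/MLX only, one alternative on other backends is to split the text yourself and synthesize each piece separately. The sketch below shows that substitute technique (greedy sentence packing). It is not what the bundled scripts are known to do, and `split_text` and the 400-character default are hypothetical.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 400) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed as a separate --text value, with the resulting audio files concatenated afterwards by a tool of your choice.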

Voice Cloning

python -m mlx_audio.tts.generate \
    --text "Cloned voice speaking" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
    --ref_audio sample.wav --ref_text "Sample transcript"

Parameters

| Parameter | Description | Values |
|---|---|---|
| --text | Text to speak | Required |
| --model | Model ID | See table below |
| --voice | Preset voice (CustomVoice) | Chelsie, Serena, Aiden, Ryan... |
| --instruct | Voice description (VoiceDesign) or style/emotion (CustomVoice) | e.g., "excited", "calm", "professional" |
| --speed | Speaking rate | 0.5-2.0 (default: 1.0) |
| --pitch | Voice pitch | 0.5-2.0 (default: 1.0) |
| --lang_code | Language | en, cn, ja, ko, de, fr... |
| --ref_audio | Reference for cloning | File path |
| --output | Output file | Path (auto-generated if omitted) |
| --max_tokens | Max generation tokens | Integer (default: 2048). Increase for long text |
| --join_audio | Merge audio segments | true (default) or false. Recommended for long text |
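
The documented ranges and defaults can be checked up front in a wrapper script. The following is a minimal sketch using the standard-library argparse module. It mirrors a subset of the table for a hypothetical wrapper and does not claim to reproduce the actual mlx_audio or tts_linux.py parser.

```python
import argparse


def ranged_float(low: float, high: float):
    """Return an argparse type function that accepts floats within [low, high]."""
    def parse(raw: str) -> float:
        value = float(raw)
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"{value} is outside {low}-{high}")
        return value
    return parse


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical TTS wrapper")
    parser.add_argument("--text", required=True, help="Text to speak")
    parser.add_argument("--speed", type=ranged_float(0.5, 2.0), default=1.0)
    parser.add_argument("--pitch", type=ranged_float(0.5, 2.0), default=1.0)
    parser.add_argument("--max_tokens", type=int, default=2048)
    return parser
```

With this parser, an out-of-range value such as `--speed 3` stops the program with a usage error instead of reaching the model.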

Models

| Model | Size | Purpose |
|---|---|---|
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 1.7B | 9 preset voices + style control |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1.7B | Text-based voice creation |
| Qwen3-TTS-12Hz-1.7B-Base | 1.7B | Voice cloning |
| Qwen3-TTS-12Hz-0.6B-* | 0.6B | Lightweight versions |

macOS: Add mlx-community/ prefix (e.g., mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit)
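
The naming rule can be captured in a small helper. This is a sketch under assumptions: `mlx_model_id` is hypothetical, and the `-8bit` suffix is inferred from the examples in this document, so other quantization suffixes may or may not be published.

```python
def mlx_model_id(name: str, quant: str = "8bit") -> str:
    """Turn a model name from the table into the ID pattern used in the macOS examples."""
    prefix = "mlx-community/"
    # Strip an existing prefix so the function is safe to call twice.
    base = name[len(prefix):] if name.startswith(prefix) else name
    if not base.endswith(f"-{quant}"):
        base = f"{base}-{quant}"
    return prefix + base
```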

Scripts

  • scripts/tts_macos.py - macOS wrapper
  • scripts/tts_linux.py - Linux/Windows wrapper with optimizations

Optimizations (Linux/Windows)

tts_linux.py automatically enables:

  • FlashAttention - Faster, less memory
  • bfloat16 - Half the memory of float32 with the same exponent range
  • Auto device - CUDA → CPU fallback
  • Mixed precision - Speed + quality
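
The selection logic behind these optimizations might look like the sketch below. It is an assumption about how such a script could work, not a copy of tts_linux.py: `choose_runtime` and `probe` are hypothetical names, and `"flash_attention_2"` is the value Hugging Face Transformers uses for its `attn_implementation` argument, which the qwen-tts backend may or may not accept in this form.

```python
from typing import Dict


def choose_runtime(cuda_available: bool, bf16_supported: bool) -> Dict[str, str]:
    """Pick device, dtype, and attention settings with a CUDA -> CPU fallback."""
    if not cuda_available:
        # CPU fallback: full precision and the default attention kernel.
        return {"device": "cpu", "dtype": "float32", "attention": "default"}
    # Prefer bfloat16 when the GPU supports it, otherwise fall back to float16.
    dtype = "bfloat16" if bf16_supported else "float16"
    return {"device": "cuda", "dtype": dtype, "attention": "flash_attention_2"}


def probe() -> Dict[str, str]:
    """Query PyTorch for the current machine; treat a missing torch as CPU-only."""
    try:
        import torch
    except ImportError:
        return choose_runtime(False, False)
    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    return choose_runtime(cuda, bf16)
```

Keeping the decision in a pure function makes the fallback order easy to verify on a machine without a GPU. FlashAttention additionally needs the separate flash-attn package and a supported GPU, which this sketch does not check.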

Troubleshooting

| Issue | Solution |
|---|---|
| macOS: Model not found | Use mlx-community/ prefix |
| macOS: Audio format | brew install ffmpeg |
| Linux: CUDA OOM | Use 0.6B models |
| Linux: Slow | Check CUDA: torch.cuda.is_available() |
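
The two macOS rows can be checked before running a generation. This is a minimal sketch with hypothetical helpers (`diagnose`, `diagnose_here`). It only detects whether an `ffmpeg` executable is on PATH and whether the model ID carries the prefix, and makes no claim about other failure causes.

```python
import platform
import shutil
from typing import List


def diagnose(model: str, system: str, has_ffmpeg: bool) -> List[str]:
    """Return hints from the troubleshooting table that apply to this setup."""
    hints: List[str] = []
    if system == "Darwin":
        if not model.startswith("mlx-community/"):
            hints.append("Model not found? Use the mlx-community/ prefix.")
        if not has_ffmpeg:
            hints.append("Audio format errors? Run: brew install ffmpeg")
    return hints


def diagnose_here(model: str) -> List[str]:
    """Run the same checks against the current machine."""
    return diagnose(model, platform.system(), shutil.which("ffmpeg") is not None)
```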

Version

1.0.0 - See VERSION and package.json
