Pocket TTS
v1.0.1
Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
Name, SKILL.md, and code all describe a local Pocket TTS integration that loads a Kyutai model via the Hugging Face Hub and exposes CLI and Python usage. The requested files and commands are proportionate to a TTS skill.
Instruction Scope
SKILL.md repeatedly claims 'fully local / offline', but its own notes and the code imply an initial model download (the skill uses hf:// URIs and requires accepting a Hugging Face license). test.sh prints a hardcoded path (/home/clawdbot/...) which is environment-specific but not overtly dangerous. The CLI uses os.system('pocket-tts serve') to start a server; otherwise the instructions do not read unrelated secrets or contact unexpected endpoints beyond the Hugging Face/GitHub links.
Install Mechanism
There is no registry install spec; included install.sh installs standard Python packages (torch, scipy, huggingface_hub) via pip. No downloads from untrusted URLs or archive extraction are present.
Credentials
The skill declares no required env vars or credentials. It does require the user to accept a gated model license on Hugging Face (may require a Hugging Face account), which is consistent with downloading a gated model. No unrelated credentials are requested.
Persistence & Privilege
always is false and the skill does not request persistent platform privileges or modify other skills. It only includes local install/test scripts and a CLI.
Assessment
This skill is coherent with a local TTS tool, but note:
1. Despite marketing language about being 'fully offline', the first run downloads the model from Hugging Face and you must accept a gated model license, which requires internet access and possibly a Hugging Face account.
2. install.sh pip-installs torch and huggingface_hub (large packages); run it in a virtualenv.
3. test.sh references a hardcoded path (/home/clawdbot/...) that is environment-specific; it is not harmful but may fail on your machine.
4. There is a small bug risk in the included cli.py (it calls wavfile.write but never imports the wavfile alias), so expect occasional runtime errors; review and patch before deploying.
If you need strict offline operation, pre-download the model and dependencies and verify the code uses local model files only. Otherwise this skill appears benign.
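The wavfile import bug flagged in point 4 has a standard one-line fix. A minimal sketch of the patch; the file name and silence payload below are illustrative, not taken from the skill's cli.py:

```python
# Hypothetical patch for the cli.py issue: the code calls wavfile.write(...)
# without ever binding the name `wavfile`. Import the alias explicitly:
from scipy.io import wavfile
import numpy as np

# Self-check that the alias works: write one second of silence, read it back.
sample_rate = 24000
audio = np.zeros(sample_rate, dtype=np.int16)
wavfile.write("check.wav", sample_rate, audio)
rate, data = wavfile.read("check.wav")
print(rate, data.shape)  # 24000 (24000,)
```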
Tags: audio, latest, local, offline, text-to-speech, tts
Pocket TTS Skill
Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. Features 8 built-in voices, voice cloning support, and runs entirely on CPU.
Features
- 🎯 Fully local - No API calls, runs completely offline
- 🚀 CPU-only - No GPU required, works on any computer
- ⚡ Fast generation - ~2-6x real-time on CPU
- 🎤 8 built-in voices - alba, marius, javert, jean, fantine, cosette, eponine, azelma
- 🎭 Voice cloning - Clone any voice from a WAV sample
- 🔊 Low latency - ~200ms first audio chunk
- 📚 Simple Python API - Easy integration into any project
Installation
# 1. Accept the model license on Hugging Face
# https://huggingface.co/kyutai/pocket-tts
# 2. Install the package
pip install pocket-tts
# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"
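After installing, a quick sanity check can confirm the package is importable. A sketch, assuming `pip install pocket-tts` provides a module named `pocket_tts` (matching the Python API below); that module name is an inference, not stated in the install docs:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Check whether a module can be imported, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None

# Assumption: the pip package `pocket-tts` installs the module `pocket_tts`.
if is_installed("pocket_tts"):
    print("pocket-tts is ready")
else:
    print("not found; run: pip install pocket-tts")
```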
Usage
CLI
# Basic usage
pocket-tts "Hello, I am your AI assistant"
# With specific voice
pocket-tts "Hello" --voice alba --output hello.wav
# With custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav
# Adjust speed
pocket-tts "Hello" --speed 1.2
# Start local server
pocket-tts --serve
# List available voices
pocket-tts --list-voices
Python API
from pocket_tts import TTSModel
import scipy.io.wavfile
# Load model
tts_model = TTSModel.load_model()
# Get voice state
voice_state = tts_model.get_state_for_audio_prompt(
"hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)
# Generate audio
audio = tts_model.generate_audio(voice_state, "Hello world!")
# Save to WAV
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())
# Check sample rate
print(f"Sample rate: {tts_model.sample_rate} Hz")
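The generated tensor holds float audio, and scipy writes float arrays as 32-bit float WAV, which some players do not handle. A small conversion helper, assuming the model emits samples in [-1, 1] (common for neural TTS, but not stated in these docs):

```python
import numpy as np

def to_int16_pcm(audio: np.ndarray) -> np.ndarray:
    """Convert float samples in [-1, 1] to 16-bit PCM for broad WAV support."""
    clipped = np.clip(audio, -1.0, 1.0)  # guard against out-of-range samples
    return (clipped * 32767).astype(np.int16)

# Stand-in array; in practice pass audio.numpy() from generate_audio().
fake_audio = np.array([0.0, 0.5, -1.0, 2.0])
print(to_int16_pcm(fake_audio))  # values: 0, 16383, -32767, 32767
```

Write the result with scipy.io.wavfile.write("output.wav", tts_model.sample_rate, to_int16_pcm(audio.numpy())).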
Available Voices
| Voice | Description |
|---|---|
| alba | Casual female voice |
| marius | Male voice |
| javert | Clear male voice |
| jean | Natural male voice |
| fantine | Female voice |
| cosette | Female voice |
| eponine | Female voice |
| azelma | Female voice |
Or use --voice-file /path/to/wav.wav for custom voice cloning.
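Before handing a sample to --voice-file, it can help to sanity-check the WAV first. A stdlib-only sketch; the minimum-duration threshold is an assumption for illustration, not a documented requirement of the model:

```python
import wave

def validate_voice_sample(path: str, min_seconds: float = 1.0) -> bool:
    """Check that a file is a readable PCM WAV and long enough for cloning."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / float(wf.getframerate())
    return duration >= min_seconds

# Demo: build a 2-second silent mono sample and validate it.
with wave.open("myvoice_demo.wav", "wb") as wf:
    wf.setnchannels(1)                    # mono
    wf.setsampwidth(2)                    # 16-bit samples
    wf.setframerate(16000)                # 16 kHz
    wf.writeframes(b"\x00\x00" * 32000)   # 2 s of silence

print(validate_voice_sample("myvoice_demo.wav"))  # True
```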
Options
| Option | Description | Default |
|---|---|---|
| text | Text to convert | Required |
| -o, --output | Output WAV file | output.wav |
| -v, --voice | Voice preset | alba |
| -s, --speed | Speech speed (0.5-2.0) | 1.0 |
| --voice-file | Custom WAV for cloning | None |
| --serve | Start HTTP server | False |
| --list-voices | List all voices | False |
Requirements
- Python 3.10-3.14
- PyTorch 2.5+ (CPU version works)
- Works on 2 CPU cores
Notes
- ⚠️ Model is gated - accept license on Hugging Face first
- 🌍 English language only (v1)
- 💾 First run downloads model (~100M parameters)
- 🔊 Audio is returned as 1D torch tensor (PCM data)
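For strict offline use after the first run, huggingface_hub honors the HF_HUB_OFFLINE environment variable. A sketch, assuming pocket-tts resolves its hf:// URIs through huggingface_hub (implied by the dependencies, not confirmed here):

```python
import os

# One-time online step (commented out; needs network and the accepted license):
# from huggingface_hub import snapshot_download
# snapshot_download("kyutai/pocket-tts")

# Afterwards, force strictly offline cache resolution before loading the model.
os.environ["HF_HUB_OFFLINE"] = "1"

def is_offline_mode() -> bool:
    """Report whether huggingface_hub will refuse network access."""
    return os.environ.get("HF_HUB_OFFLINE") == "1"

print(is_offline_mode())  # True
```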