Pocket TTS
v1.0.1
Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
Name, SKILL.md, and code all describe a local Pocket TTS integration that loads a Kyutai model via the Hugging Face Hub and exposes CLI and Python usage. The requested files and commands are proportionate to a TTS skill.
Instruction Scope
SKILL.md repeatedly claims 'fully local / offline', but its own notes and the code imply an initial model download (the skill uses hf:// URIs and requires accepting a Hugging Face license). test.sh prints a hardcoded path (/home/clawdbot/...) which is environment-specific but not overtly dangerous. The CLI uses os.system('pocket-tts serve') to start a server; otherwise the instructions do not read unrelated secrets or contact unexpected endpoints beyond the Hugging Face/GitHub links.
Install Mechanism
There is no registry install spec; included install.sh installs standard Python packages (torch, scipy, huggingface_hub) via pip. No downloads from untrusted URLs or archive extraction are present.
Credentials
The skill declares no required env vars or credentials. It does require the user to accept a gated model license on Hugging Face (may require a Hugging Face account), which is consistent with downloading a gated model. No unrelated credentials are requested.
Persistence & Privilege
always is false and the skill does not request persistent platform privileges or modify other skills. It only includes local install/test scripts and a CLI.
Assessment
This skill is coherent with a local TTS tool, but note:
1. Despite marketing language about being 'fully offline', the first run downloads the model from Hugging Face and you must accept a gated model license, which requires internet access and possibly a Hugging Face account.
2. install.sh pip-installs torch and huggingface_hub (large packages); run it in a virtualenv.
3. test.sh references a hardcoded path (/home/clawdbot/...) that is environment-specific; it is not harmful but may fail on your machine.
4. There is a small bug risk in the included cli.py (it calls wavfile.write but never imports the wavfile alias), so expect occasional runtime errors; review and patch before deploying.
If you need strict offline operation, pre-download the model and dependencies and verify the code uses local model files only. Otherwise this skill appears benign.
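The wavfile import bug flagged in point 4 has a standard one-line fix. A minimal sketch of the patch; the file name and silence payload below are illustrative, not taken from the skill's cli.py:

```python
# Hypothetical patch for the cli.py issue: the code calls wavfile.write(...)
# without ever binding the name `wavfile`. Import the alias explicitly:
from scipy.io import wavfile
import numpy as np

# Self-check that the alias works: write one second of silence, read it back.
sample_rate = 24000
audio = np.zeros(sample_rate, dtype=np.int16)
wavfile.write("check.wav", sample_rate, audio)
rate, data = wavfile.read("check.wav")
print(rate, data.shape)  # 24000 (24000,)
```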
Tags: audio, latest, local, offline, text-to-speech, tts
Pocket TTS Skill
Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. Features 8 built-in voices, voice cloning support, and runs entirely on CPU.
Features
- 🎯 Fully local - No API calls, runs completely offline
- 🚀 CPU-only - No GPU required, works on any computer
- ⚡ Fast generation - ~2-6x real-time on CPU
- 🎤 8 built-in voices - alba, marius, javert, jean, fantine, cosette, eponine, azelma
- 🎭 Voice cloning - Clone any voice from a WAV sample
- 🔊 Low latency - ~200ms first audio chunk
- 📚 Simple Python API - Easy integration into any project
Installation
# 1. Accept the model license on Hugging Face
# https://huggingface.co/kyutai/pocket-tts
# 2. Install the package
pip install pocket-tts
# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"
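After installing, a quick sanity check can confirm the package is importable. A sketch, assuming `pip install pocket-tts` provides a module named `pocket_tts` (matching the Python API below); that module name is an inference, not stated in the install docs:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Check whether a module can be imported, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None

# Assumption: the pip package `pocket-tts` installs the module `pocket_tts`.
if is_installed("pocket_tts"):
    print("pocket-tts is ready")
else:
    print("not found; run: pip install pocket-tts")
```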
Usage
CLI
# Basic usage
pocket-tts "Hello, I am your AI assistant"
# With specific voice
pocket-tts "Hello" --voice alba --output hello.wav
# With custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav
# Adjust speed
pocket-tts "Hello" --speed 1.2
# Start local server
pocket-tts --serve
# List available voices
pocket-tts --list-voices
Python API
from pocket_tts import TTSModel
import scipy.io.wavfile
# Load model
tts_model = TTSModel.load_model()
# Get voice state
voice_state = tts_model.get_state_for_audio_prompt(
"hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)
# Generate audio
audio = tts_model.generate_audio(voice_state, "Hello world!")
# Save to WAV
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())
# Check sample rate
print(f"Sample rate: {tts_model.sample_rate} Hz")
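The generated tensor holds float audio, and scipy writes float arrays as 32-bit float WAV, which some players do not handle. A small conversion helper, assuming the model emits samples in [-1, 1] (common for neural TTS, but not stated in these docs):

```python
import numpy as np

def to_int16_pcm(audio: np.ndarray) -> np.ndarray:
    """Convert float samples in [-1, 1] to 16-bit PCM for broad WAV support."""
    clipped = np.clip(audio, -1.0, 1.0)  # guard against out-of-range samples
    return (clipped * 32767).astype(np.int16)

# Stand-in array; in practice pass audio.numpy() from generate_audio().
fake_audio = np.array([0.0, 0.5, -1.0, 2.0])
print(to_int16_pcm(fake_audio))  # values: 0, 16383, -32767, 32767
```

Write the result with scipy.io.wavfile.write("output.wav", tts_model.sample_rate, to_int16_pcm(audio.numpy())).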
Available Voices
| Voice | Description |
|---|---|
| alba | Casual female voice |
| marius | Male voice |
| javert | Clear male voice |
| jean | Natural male voice |
| fantine | Female voice |
| cosette | Female voice |
| eponine | Female voice |
| azelma | Female voice |
Or use --voice-file /path/to/wav.wav for custom voice cloning.
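Before handing a sample to --voice-file, it can help to sanity-check the WAV first. A stdlib-only sketch; the minimum-duration threshold is an assumption for illustration, not a documented requirement of the model:

```python
import wave

def validate_voice_sample(path: str, min_seconds: float = 1.0) -> bool:
    """Check that a file is a readable PCM WAV and long enough for cloning."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / float(wf.getframerate())
    return duration >= min_seconds

# Demo: build a 2-second silent mono sample and validate it.
with wave.open("myvoice_demo.wav", "wb") as wf:
    wf.setnchannels(1)                    # mono
    wf.setsampwidth(2)                    # 16-bit samples
    wf.setframerate(16000)                # 16 kHz
    wf.writeframes(b"\x00\x00" * 32000)   # 2 s of silence

print(validate_voice_sample("myvoice_demo.wav"))  # True
```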
Options
| Option | Description | Default |
|---|---|---|
| text | Text to convert | Required |
| -o, --output | Output WAV file | output.wav |
| -v, --voice | Voice preset | alba |
| -s, --speed | Speech speed (0.5-2.0) | 1.0 |
| --voice-file | Custom WAV for cloning | None |
| --serve | Start HTTP server | False |
| --list-voices | List all voices | False |
Requirements
- Python 3.10-3.14
- PyTorch 2.5+ (CPU version works)
- Works on 2 CPU cores
Notes
- ⚠️ Model is gated - accept license on Hugging Face first
- 🌍 English language only (v1)
- 💾 First run downloads model (~100M parameters)
- 🔊 Audio is returned as 1D torch tensor (PCM data)
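For strict offline use after the first run, huggingface_hub honors the HF_HUB_OFFLINE environment variable. A sketch, assuming pocket-tts resolves its hf:// URIs through huggingface_hub (implied by the dependencies, not confirmed here):

```python
import os

# One-time online step (commented out; needs network and the accepted license):
# from huggingface_hub import snapshot_download
# snapshot_download("kyutai/pocket-tts")

# Afterwards, force strictly offline cache resolution before loading the model.
os.environ["HF_HUB_OFFLINE"] = "1"

def is_offline_mode() -> bool:
    """Report whether huggingface_hub will refuse network access."""
    return os.environ.get("HF_HUB_OFFLINE") == "1"

print(is_offline_mode())  # True
```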