VibeVoice TTS

Local Spanish TTS using Microsoft VibeVoice. Generate natural voice audio from text, optimized for WhatsApp voice messages.

MIT-0 · Free to use, modify, and redistribute. No attribution required.


by Hoddix (@javier887)


Security Scan

VirusTotal: Suspicious


OpenClaw: Benign (high confidence)

✓

Purpose & Capability

Name/description (local Spanish TTS using Microsoft VibeVoice) match the provided scripts and README: it expects a local VibeVoice repo, Python + torch, and ffmpeg to produce .ogg/.mp3/.wav audio.

ℹ

Instruction Scope

SKILL.md and the scripts instruct the user to clone the official Microsoft VibeVoice repo and create a venv. The runtime Python snippet calls from_pretrained('microsoft/VibeVoice-Realtime-0.5B'), which will attempt to download the model weights from Hugging Face if they are not already present. This network activity and the large download are not explicitly documented in SKILL.md. Otherwise the script stays within the TTS scope and reads only the provided text and local voice .pt files.
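To know in advance whether a run will trigger that download, one option is to look for the model in the local Hugging Face cache. The following is a minimal sketch using only the standard library. It assumes the usual hub cache layout (`models--<org>--<name>/snapshots/`) and the `HF_HUB_CACHE` / `HF_HOME` environment variables, and it ignores `XDG_CACHE_HOME` for simplicity; `hf_cache_dir` and `model_is_cached` are hypothetical helper names, not part of the skill.

```python
import os
from pathlib import Path
from typing import Optional

MODEL_ID = "microsoft/VibeVoice-Realtime-0.5B"  # model ID named in the scan text


def hf_cache_dir() -> Path:
    """Resolve the Hugging Face hub cache directory (assumed default layout)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    default_home = Path.home() / ".cache" / "huggingface"
    return Path(os.environ.get("HF_HOME", str(default_home))).expanduser() / "hub"


def model_is_cached(model_id: str = MODEL_ID, cache_dir: Optional[Path] = None) -> bool:
    """Return True if at least one snapshot directory exists for the model."""
    cache_dir = Path(cache_dir) if cache_dir is not None else hf_cache_dir()
    snapshots = cache_dir / ("models--" + model_id.replace("/", "--")) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

A `True` result only means a snapshot directory exists; it does not prove that every weight file is complete, so a partial download can still trigger network activity.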

✓

Install Mechanism

There is no automated install spec; the manual install steps clone the official GitHub repo and pip-install dependencies. This is a low-risk, expected install pattern (no obscure URLs or archives). Note: pip installing torch/torchaudio can be heavyweight and may pull CUDA-specific packages depending on environment.

✓

Credentials

The skill requests no credentials or special env vars. It uses optional env vars (VIBEVOICE_DIR, VIBEVOICE_VOICE, VIBEVOICE_SPEED) which are appropriate for configuration. No unrelated secrets or system paths are requested.
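As an illustration of how these three optional variables could be consumed, here is a small sketch that reads them with fallbacks. The fallback values (`~/VibeVoice`, `sp-Spk1_man`, `1.15`) are taken from the defaults listed in SKILL.md; whether the actual script applies them in exactly this way is an assumption, and `VibeVoiceConfig` / `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class VibeVoiceConfig:
    vibevoice_dir: Path
    voice: str
    speed: float


def load_config(env: Optional[Mapping[str, str]] = None) -> VibeVoiceConfig:
    """Read the optional VIBEVOICE_* variables, falling back to SKILL.md defaults."""
    env = os.environ if env is None else env
    return VibeVoiceConfig(
        vibevoice_dir=Path(env.get("VIBEVOICE_DIR", "~/VibeVoice")).expanduser(),
        voice=env.get("VIBEVOICE_VOICE", "sp-Spk1_man"),
        # float() raises ValueError on a malformed speed, which surfaces typos early.
        speed=float(env.get("VIBEVOICE_SPEED", "1.15")),
    )
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.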

✓

Persistence & Privilege

Skill does not request always:true and does not modify other skills or system-wide settings. It's instruction-only plus a CLI script that runs locally — no elevated persistence or privilege escalations are apparent.

Assessment

This skill is internally consistent with its stated purpose, but consider the following before installing. (1) The runtime will likely download large model weights from Hugging Face (microsoft/VibeVoice-Realtime-0.5B) unless you already have them locally, so expect heavy network use and large disk usage. (2) Installing torch/torchaudio can be large and may require CUDA tooling that matches your GPU; follow the official install docs for your environment. (3) The skill runs local Python code on your machine, so install only from trusted sources and inspect the VibeVoice repo you clone. (4) No credentials are required, but make sure you have sufficient GPU VRAM, disk space, and bandwidth. If you want to be stricter, clone and verify the upstream microsoft/VibeVoice repository yourself and run the script in an isolated environment (a container or a dedicated VM).


Current version: v1.0.0


latest: vk97ecspsvvrywrj7e44een8fm5816ezy

License

MIT-0

Terms: https://spdx.org/licenses/MIT-0.html

Runtime requirements

🎙️ Clawdis

Bins: ffmpeg, python3

SKILL.md

VibeVoice TTS

Local text-to-speech using Microsoft's VibeVoice model. Generates natural Spanish voice audio, perfect for WhatsApp voice messages.

Quick Start

# Basic usage
{baseDir}/scripts/vv.sh "Hola, esto es una prueba" -o /tmp/audio.ogg

# From file
{baseDir}/scripts/vv.sh -f texto.txt -o /tmp/audio.ogg

# Different voice
{baseDir}/scripts/vv.sh "Texto" -v en-Wayne -o /tmp/audio.ogg

# Adjust speed (0.5-2.0)
{baseDir}/scripts/vv.sh "Texto" -s 1.2 -o /tmp/audio.ogg
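When the script is driven from Python rather than a shell, the same invocations can be assembled programmatically. The sketch below uses only the flags shown above (`-f`, `-v`, `-s`, `-o`) and the documented 0.5-2.0 speed range; `build_vv_command` and `synthesize` are hypothetical helpers, and the caller is assumed to pass the resolved path to `vv.sh`.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_vv_command(
    script: str,
    output: str,
    text: Optional[str] = None,
    text_file: Optional[str] = None,
    voice: Optional[str] = None,
    speed: Optional[float] = None,
) -> List[str]:
    """Build the argument list for vv.sh from the flags shown in Quick Start."""
    if (text is None) == (text_file is None):
        raise ValueError("pass exactly one of text or text_file")
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")

    cmd = [script]
    cmd += [text] if text is not None else ["-f", text_file]
    if voice is not None:
        cmd += ["-v", voice]
    if speed is not None:
        cmd += ["-s", str(speed)]
    cmd += ["-o", output]
    return cmd


def synthesize(script: str, output: str, **kwargs) -> Path:
    """Run vv.sh and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_vv_command(script, output, **kwargs), check=True)
    return Path(output)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems when the text contains quotes or other shell metacharacters.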

Configuration

Setting	Default	Description
Voice	`sp-Spk1_man`	Spanish male voice (slight Mexican accent)
Speed	`1.15`	15% faster than normal
Format	`.ogg`	Opus codec for WhatsApp

Available Voices

Spanish:

sp-Spk1_man - Male, slight Mexican accent (default)

English:

en-Wayne - Male
en-Denise - Female
Other voices are available in ~/VibeVoice/demo/voices/streaming_model/

Output Formats

.ogg - Opus codec (WhatsApp compatible, recommended)
.mp3 - MP3 format
.wav - Uncompressed WAV
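The skill's actual ffmpeg arguments are not shown here, so the following is only a sketch of how an output extension could be mapped to an encoder. The codec choices (`libopus`, `libmp3lame`, `pcm_s16le`) are assumptions based on common ffmpeg usage, and `ffmpeg_convert_command` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed codec per extension; not taken from the skill's own script.
CODEC_ARGS = {
    ".ogg": ["-c:a", "libopus"],     # Opus in an Ogg container
    ".mp3": ["-c:a", "libmp3lame"],  # MP3
    ".wav": ["-c:a", "pcm_s16le"],   # uncompressed 16-bit PCM
}


def ffmpeg_convert_command(src_wav: str, dst: str) -> List[str]:
    """Return an ffmpeg argument list that converts src_wav to the format of dst."""
    suffix = Path(dst).suffix.lower()
    if suffix not in CODEC_ARGS:
        raise ValueError(f"unsupported output format: {suffix or dst}")
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", src_wav, *CODEC_ARGS[suffix], dst]
```

The returned list can be handed to `subprocess.run(..., check=True)`; rejecting unknown extensions up front avoids a less readable ffmpeg error later.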

For WhatsApp

Always use .ogg format with asVoice=true in the message tool:

# Generate
{baseDir}/scripts/vv.sh "Tu mensaje aquí" -o /tmp/mensaje.ogg

# Send via message tool
message action=send channel=whatsapp to="+34XXXXXXXXX" filePath=/tmp/mensaje.ogg asVoice=true

Requirements

GPU: NVIDIA with ~2GB VRAM
VibeVoice: Installed at ~/VibeVoice
ffmpeg: For audio conversion
Python 3.10+: With torch, torchaudio
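These requirements can be verified before the first run with a short preflight check. This is a sketch under the assumptions stated in the list above (VibeVoice at `~/VibeVoice`, ffmpeg on `PATH`, torch with CUDA); `check_requirements` is a hypothetical helper, and it does not measure available VRAM.

```python
import shutil
import sys
from pathlib import Path
from typing import List


def check_requirements(vibevoice_dir: str = "~/VibeVoice") -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not Path(vibevoice_dir).expanduser().is_dir():
        problems.append(f"VibeVoice not found at {vibevoice_dir}")
    try:
        import torch  # imported lazily so the other checks work without torch

        if not torch.cuda.is_available():
            problems.append("no CUDA-capable GPU visible to torch")
    except ImportError:
        problems.append("torch is not installed")
    return problems
```

Printing each entry of the returned list gives a quick summary of what still needs to be installed or configured.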

Performance

RTF (real-time factor): ~0.24x, so generation is roughly four times faster than real time
1 minute of audio ≈ 15 seconds to generate
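The two figures above are related by the usual definition of real-time factor, generation time divided by audio duration. A small sketch of the arithmetic, taking the quoted ~0.24 as the default (actual speed will vary with hardware):

```python
def generation_seconds(audio_seconds: float, rtf: float = 0.24) -> float:
    """Estimated wall-clock time to generate audio_seconds of speech."""
    return audio_seconds * rtf


def audio_seconds_per_wall_second(rtf: float = 0.24) -> float:
    """How many seconds of audio are produced per second of compute."""
    return 1.0 / rtf
```

For 60 seconds of audio this gives about 14.4 seconds, which matches the rounded 15-second figure.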

Notes

The first run loads the model (~10 s); subsequent runs are faster
Audio rule: only send a voice message if the user asks for one or sends audio themselves
Keep text under 1500 characters for best quality
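For longer inputs, one way to respect the length guideline is to split the text at sentence boundaries and synthesize each piece separately. The sketch below is an assumption about how that could be done, not something the skill provides; `split_text` is a hypothetical helper, and it hard-wraps any single sentence longer than the limit.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 1500) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the script in turn; joining the resulting audio files is left to the caller.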
