Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

VibeVoice TTS

Local Spanish TTS using Microsoft VibeVoice. Generate natural voice audio from text, optimized for WhatsApp voice messages.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Hoddix @javier887
Security Scan
VirusTotal: Suspicious
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (local Spanish TTS using Microsoft VibeVoice) match the provided scripts and README: it expects a local VibeVoice repo, Python + torch, and ffmpeg to produce .ogg/.mp3/.wav audio.
Instruction Scope
SKILL.md and the scripts instruct the user to clone the official Microsoft VibeVoice repo and create a venv. The runtime Python snippet calls from_pretrained('microsoft/VibeVoice-Realtime-0.5B'), which will attempt to download model weights from Hugging Face if they are not already present. Neither this network activity nor the size of the download is explicitly documented in SKILL.md. Otherwise the script stays within the TTS scope and only reads the provided text and local voice .pt files.
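To see whether that download is likely to happen, one option is to look for the model in the local Hugging Face cache before the first run. The sketch below is an illustration, not part of the skill: the function names are hypothetical, it relies only on the standard hub cache layout (`models--<org>--<name>` under `HF_HUB_CACHE` or `HF_HOME/hub`), and it ignores other overrides such as `XDG_CACHE_HOME`.

```python
import os
from pathlib import Path

# Model id taken from the scan report above.
REPO_ID = "microsoft/VibeVoice-Realtime-0.5B"


def hf_cache_dir(env=None):
    """Return the hub cache root, honouring HF_HUB_CACHE and HF_HOME."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


def model_cache_path(repo_id=REPO_ID, env=None):
    """Directory where the hub cache would store this model."""
    return hf_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


def is_probably_cached(repo_id=REPO_ID, env=None):
    """Heuristic: a non-empty snapshots/ directory suggests weights are present."""
    snapshots = model_cache_path(repo_id, env) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if __name__ == "__main__":
    print("cache path:", model_cache_path())
    print("already cached:", is_probably_cached())
```

If this reports that nothing is cached, expect the first invocation to fetch the weights over the network.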
Install Mechanism
There is no automated install spec; the manual install steps clone the official GitHub repo and pip-install dependencies. This is a low-risk, expected install pattern (no obscure URLs or archives). Note: pip installing torch/torchaudio can be heavyweight and may pull CUDA-specific packages depending on environment.
Credentials
The skill requests no credentials or special env vars. It uses optional env vars (VIBEVOICE_DIR, VIBEVOICE_VOICE, VIBEVOICE_SPEED) which are appropriate for configuration. No unrelated secrets or system paths are requested.
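As an illustration of how these three optional variables could be resolved, here is a minimal sketch. It is not the skill's actual code (the real logic lives in its shell script): `VVConfig` and `load_config` are hypothetical names, and the defaults and the 0.5 to 2.0 speed range are the ones stated in SKILL.md.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VVConfig:
    vibevoice_dir: str
    voice: str
    speed: float


def load_config(env=None):
    """Read the optional VIBEVOICE_* variables, falling back to SKILL.md defaults."""
    env = os.environ if env is None else env
    speed = float(env.get("VIBEVOICE_SPEED", "1.15"))
    # SKILL.md documents 0.5-2.0 as the supported speed range.
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"VIBEVOICE_SPEED must be between 0.5 and 2.0, got {speed}")
    return VVConfig(
        vibevoice_dir=os.path.expanduser(env.get("VIBEVOICE_DIR", "~/VibeVoice")),
        voice=env.get("VIBEVOICE_VOICE", "sp-Spk1_man"),
        speed=speed,
    )


if __name__ == "__main__":
    print(load_config())
```

None of these values are secrets, which is consistent with the scan's conclusion that no credentials are involved.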
Persistence & Privilege
Skill does not request always:true and does not modify other skills or system-wide settings. It's instruction-only plus a CLI script that runs locally — no elevated persistence or privilege escalations are apparent.
Assessment
This skill is internally consistent with its stated purpose, but before installing consider:

  (1) The runtime will likely download large model weights from Hugging Face (microsoft/VibeVoice-Realtime-0.5B) unless you already have them locally; expect heavy network use and large disk usage.
  (2) Installing torch/torchaudio can be large and may require CUDA/tooling matching your GPU; follow the official install docs for your environment.
  (3) The skill runs local Python code which will execute on your machine; only install from trusted sources and inspect the VibeVoice repo you clone.
  (4) No credentials are required, but ensure you have sufficient GPU/VRAM, disk space, and bandwidth.

If you want to be stricter, clone and verify the upstream microsoft/VibeVoice repository yourself and run the script in an isolated environment (container or dedicated VM).


Current version: v1.0.0
latest: vk97ecspsvvrywrj7e44een8fm5816ezy

Runtime requirements

🎙️ Clawdis
Bins: ffmpeg, python3

SKILL.md

VibeVoice TTS

Local text-to-speech using Microsoft's VibeVoice model. Generates natural Spanish voice audio, perfect for WhatsApp voice messages.

Quick Start

# Basic usage
{baseDir}/scripts/vv.sh "Hola, esto es una prueba" -o /tmp/audio.ogg

# From file
{baseDir}/scripts/vv.sh -f texto.txt -o /tmp/audio.ogg

# Different voice
{baseDir}/scripts/vv.sh "Texto" -v en-Wayne -o /tmp/audio.ogg

# Adjust speed (0.5-2.0)
{baseDir}/scripts/vv.sh "Texto" -s 1.2 -o /tmp/audio.ogg

Configuration

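| Setting | Default     | Description                                |
|---------|-------------|--------------------------------------------|
| Voice   | sp-Spk1_man | Spanish male voice (slight Mexican accent) |
| Speed   | 1.15        | 15% faster than normal                     |
| Format  | .ogg        | Opus codec for WhatsApp                    |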

Available Voices

Spanish:

  • sp-Spk1_man - Male, slight Mexican accent (default)

English:

  • en-Wayne - Male
  • en-Denise - Female
  • Other voices in ~/VibeVoice/demo/voices/streaming_model/

Output Formats

  • .ogg - Opus codec (WhatsApp compatible, recommended)
  • .mp3 - MP3 format
  • .wav - Uncompressed WAV
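The three formats above map naturally onto ffmpeg audio encoders. The sketch below shows one way the conversion command could be assembled; it is an assumption about how such a step might look, not a copy of what vv.sh does. `ffmpeg_command` and `CODEC_ARGS` are hypothetical names, and using the `atempo` filter for the speed setting is likewise an assumption.

```python
from pathlib import Path

# Encoder per output extension (standard ffmpeg encoder names).
CODEC_ARGS = {
    ".ogg": ["-c:a", "libopus"],     # Opus in an Ogg container
    ".mp3": ["-c:a", "libmp3lame"],  # MP3
    ".wav": ["-c:a", "pcm_s16le"],   # uncompressed 16-bit PCM
}


def ffmpeg_command(src_wav, dst, speed=1.0):
    """Build an ffmpeg argument list converting src_wav to dst."""
    suffix = Path(dst).suffix.lower()
    if suffix not in CODEC_ARGS:
        raise ValueError(f"unsupported output format: {suffix!r}")
    cmd = ["ffmpeg", "-y", "-i", str(src_wav)]
    if speed != 1.0:
        # atempo changes playback speed without changing pitch.
        cmd += ["-filter:a", f"atempo={speed}"]
    return cmd + CODEC_ARGS[suffix] + [str(dst)]


if __name__ == "__main__":
    print(" ".join(ffmpeg_command("raw.wav", "/tmp/audio.ogg", speed=1.15)))
```

The list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with paths that contain spaces.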

For WhatsApp

Always use .ogg format with asVoice=true in the message tool:

# Generate
{baseDir}/scripts/vv.sh "Tu mensaje aquí" -o /tmp/mensaje.ogg

# Send via message tool
message action=send channel=whatsapp to="+34XXXXXXXXX" filePath=/tmp/mensaje.ogg asVoice=true

Requirements

  • GPU: NVIDIA with ~2GB VRAM
  • VibeVoice: Installed at ~/VibeVoice
  • ffmpeg: For audio conversion
  • Python 3.10+: With torch, torchaudio
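A quick preflight check can confirm most of these requirements before the first run. The following is a minimal sketch under stated assumptions: `missing_requirements` is a hypothetical helper, the default install path is the `~/VibeVoice` location listed above, and GPU/VRAM is deliberately not checked because that would require importing torch.

```python
import shutil
import sys
from pathlib import Path


def missing_requirements(vibevoice_dir="~/VibeVoice",
                         which=shutil.which,
                         version=sys.version_info):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    for binary in ("ffmpeg", "python3"):
        if which(binary) is None:
            problems.append(f"{binary} not found on PATH")
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if not Path(vibevoice_dir).expanduser().is_dir():
        problems.append(f"VibeVoice checkout not found at {vibevoice_dir}")
    return problems


if __name__ == "__main__":
    issues = missing_requirements()
    print("OK" if not issues else "\n".join(issues))
```

The `which` and `version` parameters exist only so the function can be exercised without depending on the host machine.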

Performance

  • RTF (real-time factor): ~0.24x, i.e. generation takes about a quarter of the audio's duration
  • 1 minute of audio ≈ 15 seconds to generate
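The two figures above are consistent with each other: taking RTF as generation time divided by audio duration, 60 s of audio at 0.24 gives roughly 14.4 s, which rounds to the quoted 15 s. A small sketch of that arithmetic (the function name is illustrative, and 0.24 is the approximate figure from this page, so real timings will vary with hardware):

```python
def generation_seconds(audio_seconds, rtf=0.24):
    """Estimated wall-clock time to synthesize `audio_seconds` of speech."""
    return audio_seconds * rtf


if __name__ == "__main__":
    for duration in (10, 60, 300):
        print(f"{duration:>4} s of audio -> ~{generation_seconds(duration):.1f} s to generate")
```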

Notes

  • First run loads the model (~10s); subsequent runs are faster
  • Audio rule: Only send a voice message if the user asks for one or communicates via audio
  • Keep text under 1500 chars for best quality
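For longer inputs, one way to respect the 1500-character guideline is to split the text at sentence boundaries and synthesize each piece separately. The sketch below is a hypothetical helper, not something the skill ships: `split_text` prefers sentence ends and only hard-cuts a sentence that is itself longer than the limit.

```python
import re


def split_text(text, limit=1500):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Hola. " * 400  # about 2400 characters
    print([len(chunk) for chunk in split_text(sample)])
```

Each chunk could then be passed to vv.sh on its own; joining the resulting audio files is left out of this sketch.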

Files

3 total