Install
openclaw skills install smallest-ai
Ultra-fast text-to-speech and speech-to-text via Smallest AI's Lightning v3.1 and Pulse models. Use when the user wants to generate speech, convert text to v...
Text-to-speech (sub-100ms) via Lightning v3.1 and speech-to-text (64ms TTFT) via Pulse.
Set SMALLEST_API_KEY in your environment:
export SMALLEST_API_KEY="your_key_here"
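If you wrap the scripts in your own Python code, it can help to fail fast with a readable message when the key is missing. A minimal sketch, assuming your wrapper reads the key from the environment; the helper name `require_api_key` is hypothetical and not part of the skill:

```python
import os


def require_api_key(env=None):
    """Return SMALLEST_API_KEY, or raise a clear error if it is unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper testable.
    """
    env = os.environ if env is None else env
    key = env.get("SMALLEST_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            'SMALLEST_API_KEY is not set; run: export SMALLEST_API_KEY="your_key_here"'
        )
    return key
```

Call it once at startup so a missing key surfaces before any request is attempted.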
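Default settings:

| Setting | Default |
|---|---|
| defaultVoiceFemale | sophia (American English) |
| defaultVoiceMale | robert (American English) |
| defaultLanguage | en |
| defaultSpeed | 1.0 |
| defaultSampleRate | 24000 |

Follow these rules to select the voice: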
- Male voice requested: defaultVoiceMale.
- Female voice requested: defaultVoiceFemale.
- No preference: defaultVoiceFemale (sophia by default).
- Hindi or mixed Hindi/English content: advika (female) or vivaan (male).
- Spanish content: camilla (female) or carlos (male).
- anitha (female) or raju (male).

Always pass the configured defaultLanguage, defaultSpeed, and defaultSampleRate as the --lang, --speed, and --rate flags unless the user overrides them.
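The selection rules and default flags can be expressed as a small lookup. This is a sketch, not the skill's own logic: the language-to-voice mapping (Hindi to advika/vivaan, Spanish to camilla/carlos) is inferred from the voice descriptions, and letting a mapped content language take precedence over the generic male/female defaults is an assumption. The anitha/raju pair is omitted because its trigger language is not stated. All function and constant names are hypothetical.

```python
# Default values as listed above (sophia, robert, en, 1.0, 24000).
DEFAULTS = {
    "defaultVoiceFemale": "sophia",
    "defaultVoiceMale": "robert",
    "defaultLanguage": "en",
    "defaultSpeed": 1.0,
    "defaultSampleRate": 24000,
}

# ASSUMPTION: language code -> (female voice, male voice), inferred from the voice table.
LANGUAGE_VOICES = {
    "hi": ("advika", "vivaan"),
    "es": ("camilla", "carlos"),
}


def select_voice(gender=None, lang=None, config=DEFAULTS):
    """Pick a voice id; gender is "male", "female", or None (no preference)."""
    lang = lang or config["defaultLanguage"]
    if lang in LANGUAGE_VOICES:
        female, male = LANGUAGE_VOICES[lang]
        return male if gender == "male" else female
    if gender == "male":
        return config["defaultVoiceMale"]
    # A female request and "no preference" both resolve to the female default.
    return config["defaultVoiceFemale"]


def default_flags(config=DEFAULTS, **overrides):
    """Build --lang/--speed/--rate from the config; user overrides take priority."""
    lang = overrides.get("lang", config["defaultLanguage"])
    speed = overrides.get("speed", config["defaultSpeed"])
    rate = overrides.get("rate", config["defaultSampleRate"])
    return ["--lang", str(lang), "--speed", str(speed), "--rate", str(rate)]
```

For example, `select_voice("male", "es")` gives `carlos`, and `default_flags(speed=1.5)` keeps the default language and rate while changing only the speed.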
Generate speech audio from text using the Lightning v3.1 model.
{baseDir}/scripts/tts.sh "Text to speak" --voice sophia --rate 24000 --speed 1.0 --lang en
Python alternative (requires pip install smallestai, or just requests):
python3 {baseDir}/scripts/tts.py "Text to speak" --voice sophia --speed 1.0 --lang en --out speech.wav
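When calling `tts.sh` from your own Python code, passing the arguments as a list avoids shell-quoting problems with text that contains quotes, apostrophes, or non-Latin scripts. A minimal sketch, assuming `base_dir` stands in for `{baseDir}` and that the script is executable; `build_tts_command` and `run_tts` are hypothetical helper names:

```python
import subprocess
from pathlib import Path


def build_tts_command(base_dir, text, voice="sophia", rate=24000, speed=1.0,
                      lang="en", out=None):
    """Return the argv list for scripts/tts.sh; the text stays one argument."""
    cmd = [
        str(Path(base_dir) / "scripts" / "tts.sh"),
        text,
        "--voice", voice,
        "--rate", str(rate),
        "--speed", str(speed),
        "--lang", lang,
    ]
    if out is not None:
        # Without --out the script chooses its own timestamped file name.
        cmd += ["--out", str(out)]
    return cmd


def run_tts(base_dir, text, **kwargs):
    """Run tts.sh without a shell and return its stdout.

    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    result = subprocess.run(
        build_tts_command(base_dir, text, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Because no shell is involved, `run_tts(base_dir, "It's 5 o'clock")` needs no escaping; the returned stdout contains the `MEDIA:` line the script prints on success.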
| Voice | Gender | Accent | Best For |
|---|---|---|---|
| sophia | Female | American | General use (default female voice) |
| robert | Male | American | Professional, reports (default male voice) |
| advika | Female | Indian | Hindi content, code-switching |
| vivaan | Male | Indian | Bilingual English/Hindi |
| camilla | Female | Mexican/Latin | Spanish content |
| zara | Female | American | Conversational |
| melody | Female | American | Storytelling, greetings |
| arjun | Male | Indian | English/Hindi bilingual |
| stella | Female | American | Expressive, warm |
80+ more voices are available. List them all with:
{baseDir}/scripts/voices.sh
TTS options:

- `--voice <id>`: Voice identifier (default: sophia)
- `--rate <hz>`: Sample rate, one of 8000, 16000, 24000, or 44100 (default: 24000)
- `--speed <n>`: Playback speed, 0.5–2.0 (default: 1.0)
- `--lang <code>`: Language code (default: en); see {baseDir}/references/languages.md
- `--out <path>`: Output file (default: auto-named `media/tts_<timestamp>.wav`)

On success, the scripts print `MEDIA: <filepath>`; OpenClaw sends that file as an audio attachment.
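A wrapper can check `--rate` and `--speed` against the documented ranges before invoking the script, and recover the output path from the `MEDIA:` line. A sketch under two assumptions: the line begins exactly with `MEDIA:`, and other log lines may appear on stdout before it. The function names are hypothetical.

```python
VALID_RATES = {8000, 16000, 24000, 44100}


def validate_tts_options(rate=24000, speed=1.0):
    """Raise ValueError if rate or speed falls outside the documented values."""
    if rate not in VALID_RATES:
        raise ValueError(f"rate must be one of {sorted(VALID_RATES)}, got {rate}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")


def parse_media_path(stdout):
    """Return the path from the first 'MEDIA: <filepath>' line, or None if absent."""
    for line in stdout.splitlines():
        if line.startswith("MEDIA:"):
            return line[len("MEDIA:"):].strip()
    return None
```

Returning `None` instead of raising lets the caller decide whether a missing `MEDIA:` line means failure or just a different output mode.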
TTS supports 30+ languages. Pass --lang with an ISO language code:
{baseDir}/scripts/tts.sh "नमस्ते, कैसे हैं आप?" --voice advika --lang hi
{baseDir}/scripts/tts.sh "Bonjour le monde" --voice sophia --lang fr
{baseDir}/scripts/tts.sh "Hola, buenos días" --voice camilla --lang es
Code-switching (mixing languages) works automatically; no extra flag is needed:
{baseDir}/scripts/tts.sh "Hey, मुझे meeting remind कर दो" --voice advika --lang hi
Transcribe audio files using the Pulse model. Supported formats: WAV, MP3, OGG, and FLAC.
{baseDir}/scripts/stt.sh /path/to/audio.wav
{baseDir}/scripts/stt.sh /path/to/audio.wav --diarize --timestamps --emotions
python3 {baseDir}/scripts/stt.py /path/to/audio.wav --diarize --timestamps --lang en
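The same list-based approach works for transcription, with an up-front check that the file extension is one of the supported formats (WAV, MP3, OGG, FLAC). A minimal sketch, assuming `base_dir` stands in for `{baseDir}` and that `stt.sh` writes its JSON result, and only that, to stdout; `build_stt_command` and `transcribe` are hypothetical names:

```python
import json
import subprocess
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac"}


def build_stt_command(base_dir, audio_path, lang="en", diarize=False,
                      timestamps=False, emotions=False):
    """Return the argv list for scripts/stt.sh with only the requested flags."""
    if Path(audio_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {audio_path}")
    cmd = [str(Path(base_dir) / "scripts" / "stt.sh"), str(audio_path), "--lang", lang]
    # Append each boolean feature flag only when the caller enabled it.
    for enabled, flag in ((diarize, "--diarize"),
                          (timestamps, "--timestamps"),
                          (emotions, "--emotions")):
        if enabled:
            cmd.append(flag)
    return cmd


def transcribe(base_dir, audio_path, **kwargs):
    """Run stt.sh and decode its stdout as JSON (assumes stdout is pure JSON)."""
    result = subprocess.run(
        build_stt_command(base_dir, audio_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Rejecting unsupported extensions locally gives a clearer error than waiting for the API to refuse the file.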
STT options:

- `--lang <code>`: Language (default: en)
- `--diarize`: Identify different speakers
- `--timestamps`: Word-level timing
- `--emotions`: Detect emotional tone

Returns JSON with a transcription field. With --diarize, the output includes a speaker label for each word.
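Per-word speaker labels are easier to read when consecutive words from the same speaker are merged into turns. This sketch rests on an assumed JSON shape: only the `transcription` field is documented here, so the `words` list and its `word`/`speaker` keys are hypothetical and should be checked against real output. `speaker_turns` is a hypothetical name.

```python
from itertools import groupby


def speaker_turns(result):
    """Merge consecutive words with the same speaker label into (speaker, text) turns.

    ASSUMPTION: diarized output carries a "words" list whose items have
    "word" and "speaker" keys; a result without "words" yields no turns.
    """
    words = result.get("words", [])
    turns = []
    # groupby only merges *adjacent* items, which is exactly a speaker turn.
    for speaker, group in groupby(words, key=lambda w: w.get("speaker")):
        turns.append((speaker, " ".join(w["word"] for w in group)))
    return turns
```

A speaker who talks, pauses while someone else speaks, and then resumes produces two separate turns, which matches how a transcript is usually read.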
Trigger this skill when the user:
Required environment variable: SMALLEST_API_KEY