Smallest AI

Ultra-fast text-to-speech and speech-to-text via Smallest AI's Lightning v3.1 and Pulse models. Use when the user wants to generate speech, convert text to voice, read text aloud, create voice notes, transcribe audio to text, or clone a voice. Sub-100ms latency TTS. 64ms TTFT STT. Supports 30+ languages including Hindi and Spanish. Voices include sophia, robert, advika, vivaan, camilla, and 80+ more.

Install

openclaw skills install smallest-ai

Smallest AI — Ultra-Fast Voice Suite

Text-to-speech (sub-100ms) via Lightning v3.1 and speech-to-text (64ms TTFT) via Pulse.

Setup

Get an API key from https://waves.smallest.ai (click "API Key" in the left panel).
Set SMALLEST_API_KEY in your environment:

export SMALLEST_API_KEY="your_key_here"

Defaults

Default female voice: sophia (American English)
Default male voice: robert (American English)
Default language: en
Default speed: 1.0
Default sample rate: 24000

Voice Selection Rules

Follow these rules to select the voice:

If user explicitly names a voice (e.g. "use advika"), use that voice.
If user asks for a male voice, use the configured defaultVoiceMale.
If user asks for a female voice, use the configured defaultVoiceFemale.
If no gender preference, use defaultVoiceFemale (sophia by default).
For Hindi content: use advika (female) or vivaan (male).
For Spanish content: use camilla (female) or carlos (male).
For Tamil content: use anitha (female) or raju (male).

Always pass the configured defaultLanguage, defaultSpeed, and defaultSampleRate as --lang, --speed, and --rate flags unless the user overrides them.
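
The selection rules above can be sketched as a small lookup. This is a minimal sketch, not part of the skill's scripts: the helper name select_voice and the constants are hypothetical, the language codes in the table (in particular "ta" for Tamil) are assumed to match {baseDir}/references/languages.md, and it assumes a language-specific voice takes precedence over the configured default when both rules apply.

```python
from typing import Optional

# Defaults taken from the "Defaults" section.
DEFAULT_VOICE_FEMALE = "sophia"
DEFAULT_VOICE_MALE = "robert"

# Per-language voices named in the rules: (female, male).
# The language codes are assumptions; check references/languages.md.
LANGUAGE_VOICES = {
    "hi": ("advika", "vivaan"),
    "es": ("camilla", "carlos"),
    "ta": ("anitha", "raju"),
}


def select_voice(explicit: Optional[str] = None,
                 gender: Optional[str] = None,
                 lang: str = "en") -> str:
    """Pick a voice id. Hypothetical helper illustrating the rules."""
    # Rule 1: an explicitly named voice always wins.
    if explicit:
        return explicit.lower()
    # Language-specific pair if one is listed, otherwise the configured defaults.
    female, male = LANGUAGE_VOICES.get(
        lang, (DEFAULT_VOICE_FEMALE, DEFAULT_VOICE_MALE)
    )
    # A request for a male voice selects the male voice; anything else
    # (female or no preference) falls back to the female voice.
    if gender == "male":
        return male
    return female
```

For example, select_voice(gender="male") falls back to robert, while select_voice(lang="hi") picks advika.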

Text-to-Speech

Generate speech audio from text using the Lightning v3.1 model.

Shell (preferred — zero dependencies)

{baseDir}/scripts/tts.sh "Text to speak" --voice sophia --rate 24000 --speed 1.0 --lang en

Python (requires `pip install smallestai` or just `requests`)

python3 {baseDir}/scripts/tts.py "Text to speak" --voice sophia --speed 1.0 --lang en --out speech.wav

Voices

Voice	Gender	Accent	Best For
sophia	Female	American	General use (default female voice)
robert	Male	American	Professional, reports (default male voice)
advika	Female	Indian	Hindi content, code-switch
vivaan	Male	Indian	Bilingual English/Hindi
camilla	Female	Mexican/Latin	Spanish content
zara	Female	American	Conversational
melody	Female	American	Storytelling, greetings
arjun	Male	Indian	English/Hindi bilingual
stella	Female	American	Expressive, warm

80+ more voices available. List all with: {baseDir}/scripts/voices.sh

Options

--voice <id>: Voice identifier (default: sophia)
--rate <hz>: Sample rate — 8000 | 16000 | 24000 | 44100 (default: 24000)
--speed <n>: Playback speed 0.5–2.0 (default: 1.0)
--lang <code>: Language code (default: en). See {baseDir}/references/languages.md
--out <path>: Output file (default: auto-named media/tts_<timestamp>.wav)
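
When calling the script from other code, it may help to validate the options before spawning a process. The sketch below builds the argument list for tts.sh from the options listed above. The helper name build_tts_args is hypothetical, and the range checks only mirror the values documented here; the script itself may validate differently.

```python
# Sample rates listed under Options.
ALLOWED_RATES = (8000, 16000, 24000, 44100)


def build_tts_args(text, voice="sophia", rate=24000, speed=1.0,
                   lang="en", out=None):
    """Return the arguments for tts.sh; the caller prepends the script path.

    Hypothetical helper, not part of the skill's scripts.
    """
    if rate not in ALLOWED_RATES:
        raise ValueError(f"rate must be one of {ALLOWED_RATES}, got {rate}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    args = [text, "--voice", voice, "--rate", str(rate),
            "--speed", str(speed), "--lang", lang]
    # Without --out the script auto-names the output file.
    if out is not None:
        args += ["--out", out]
    return args
```

Passing the result as a list to subprocess.run, rather than as one shell string, avoids quoting problems when the text contains quotes or non-ASCII characters.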

Output

Scripts print MEDIA: <filepath> on success. OpenClaw sends this as an audio attachment.
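
A caller that captures the script's standard output can recover the file path from the MEDIA: line. This is a minimal sketch; the helper name parse_media_path is hypothetical, and it assumes the marker appears at the start of its own line, as described above.

```python
from typing import Optional


def parse_media_path(stdout: str) -> Optional[str]:
    """Return the path from the last 'MEDIA: <filepath>' line, or None."""
    path = None
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith("MEDIA:"):
            candidate = line[len("MEDIA:"):].strip()
            # Ignore a marker with no path after it.
            if candidate:
                path = candidate
    return path
```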

Multilingual

Supports 30+ languages. Pass --lang with an ISO language code:

{baseDir}/scripts/tts.sh "नमस्ते, कैसे हैं आप?" --voice advika --lang hi
{baseDir}/scripts/tts.sh "Bonjour le monde" --voice sophia --lang fr
{baseDir}/scripts/tts.sh "Hola, buenos días" --voice camilla --lang es

Code-switching (mixing languages within one utterance) works automatically; no extra flag is needed beyond --lang:

{baseDir}/scripts/tts.sh "Hey, मुझे meeting remind कर दो" --voice advika --lang hi

Speech-to-Text

Transcribe audio files using the Pulse model. Supports WAV, MP3, OGG, and FLAC.

Shell

{baseDir}/scripts/stt.sh /path/to/audio.wav
{baseDir}/scripts/stt.sh /path/to/audio.wav --diarize --timestamps --emotions

Python

python3 {baseDir}/scripts/stt.py /path/to/audio.wav --diarize --timestamps --lang en

Options

--lang <code>: Language (default: en)
--diarize: Identify different speakers
--timestamps: Word-level timing
--emotions: Detect emotional tone

Output

Returns JSON with a transcription field. With --diarize, the output includes a speaker label for each word.
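
Reading the transcript out of that JSON could look like the sketch below. Only the transcription field is documented here, so the helper (hypothetically named extract_transcript) touches nothing else; the layout of the per-word speaker labels is not specified in this document and is left unparsed.

```python
import json


def extract_transcript(raw_json: str) -> str:
    """Return the text of the 'transcription' field from the STT output."""
    data = json.loads(raw_json)
    text = data.get("transcription") if isinstance(data, dict) else None
    if not isinstance(text, str):
        raise ValueError("response has no 'transcription' string field")
    return text.strip()
```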

When to Use

Trigger this skill when the user:

Asks to "say", "speak", "read aloud", or "generate speech/audio"
Wants a "voice message", "voice note", or "audio file"
Asks to "transcribe", "convert speech/audio to text"
Mentions "Smallest AI", "Lightning TTS", or "Pulse STT"
Needs fast or low-latency speech generation
Wants Hindi, Spanish, multilingual, or code-switched voice output
Asks to compare TTS providers or benchmark latency

Error Handling

Missing API key → tell user to set SMALLEST_API_KEY
HTTP 401 → invalid or expired API key
HTTP 429 → rate limited, wait and retry
HTTP 400 → check text length (max ~5000 chars per request). Split long text into chunks.
Empty audio → verify voice_id is valid
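
The error cases above can be sketched as a status lookup plus a retry loop for HTTP 429. This is an illustration under stated assumptions, not the skill's actual implementation: the names require_api_key, explain_status, and call_with_retry are hypothetical, send is any caller-supplied function returning a (status, body) pair, and the exponential backoff schedule is a design choice that this document does not prescribe.

```python
import os
import time

# Hints mirror the error list; the wording is illustrative.
ERROR_HINTS = {
    400: "Bad request: check text length (max ~5000 chars) and split long text into chunks.",
    401: "Invalid or expired API key: check SMALLEST_API_KEY.",
    429: "Rate limited: wait and retry.",
}


def require_api_key(env=os.environ) -> str:
    """Return the API key or raise with a hint for the user."""
    key = env.get("SMALLEST_API_KEY")
    if not key:
        raise RuntimeError("Set SMALLEST_API_KEY in your environment.")
    return key


def explain_status(status: int) -> str:
    return ERROR_HINTS.get(status, f"Unexpected HTTP status {status}.")


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body); retry on 429 with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            if not body:
                raise RuntimeError("Empty audio: verify the voice id is valid.")
            return body
        if status == 429 and attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
            continue
        raise RuntimeError(explain_status(status))
```

Injecting sleep as a parameter keeps the loop testable without real delays.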

Limits

Max text per request: ~5000 characters
For longer text: split into sentences, synthesize each, concatenate with sox or ffmpeg
Free tier: 30 minutes/month of TTS
Basic ($5/mo): 3 hours of TTS + 1 voice clone
