Smallest AI

Ultra-fast text-to-speech and speech-to-text via Smallest AI's Lightning v3.1 and Pulse models. Use when the user wants to generate speech, convert text to v...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, scripts, and documentation all describe TTS/STT via Smallest AI and the only required credential is SMALLEST_API_KEY; required binary (curl) is appropriate for the provided curl-based scripts. No unrelated services, credentials, or binaries are requested.
Instruction Scope
SKILL.md and included scripts instruct the agent to call smallest.ai endpoints, synthesize or transcribe audio, and write local media files. The runtime instructions do not ask the agent to read unrelated system files or other environment variables; all file I/O is local (media/tmp) and aligned with the stated functionality.
Install Mechanism
There is no remote install/download step; this is an instruction+scripts skill with bundled scripts and docs. No arbitrary external archives or shortener URLs are used in install steps, lowering install-time risk.
Credentials
Only SMALLEST_API_KEY is required (declared as primaryEnv). That single credential is appropriate and expected for a third-party TTS/STT provider; no other secrets or config paths are requested.
Persistence & Privilege
The skill is not marked always:true, does not request system-wide privileges, and does not modify other skills' configs. Agent autonomous invocation remains default but is not combined with excessive privileges here.
Assessment
This package appears to be a straightforward Smallest AI TTS/STT integration and only needs your Smallest API key and curl. Before installing: (1) verify the skill's origin (source/homepage is listed as unknown here) and prefer official provider repos if available; (2) be aware that any text and audio you send will be transmitted to smallest.ai (so avoid sending sensitive secrets or private audio you don't want shared with the provider); (3) supply a least-privilege API key (not a broad organizational admin key) and monitor usage/rate limits on the provider console; and (4) if you plan to merge the PLAN.md core changes into your system, review the proposed code edits carefully since they alter core TTS provider lists and env-key resolution.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.1
Tags: latest, multilingual, speech, stt, tts, voice

Runtime requirements

Clawdis
Bins: curl
Env: SMALLEST_API_KEY
Primary env: SMALLEST_API_KEY

SKILL.md

Smallest AI — Ultra-Fast Voice Suite

Text-to-speech (sub-100ms) via Lightning v3.1 and speech-to-text (64ms TTFT) via Pulse.

Setup

  1. Get an API key from https://waves.smallest.ai (click "API Key" in the left panel).
  2. Set SMALLEST_API_KEY in your environment:
export SMALLEST_API_KEY="your_key_here"

Defaults

  • Default female voice: sophia (American English)
  • Default male voice: robert (American English)
  • Default language: en
  • Default speed: 1.0
  • Default sample rate: 24000

Voice Selection Rules

Follow these rules to select the voice:

  1. If user explicitly names a voice (e.g. "use advika"), use that voice.
  2. If user asks for a male voice, use the configured defaultVoiceMale.
  3. If user asks for a female voice, use the configured defaultVoiceFemale.
  4. If no gender preference, use defaultVoiceFemale (sophia by default).
  5. For Hindi content: use advika (female) or vivaan (male).
  6. For Spanish content: use camilla (female) or carlos (male).
  7. For Tamil content: use anitha (female) or raju (male).

Always pass the configured defaultLanguage, defaultSpeed, and defaultSampleRate as --lang, --speed, and --rate flags unless the user overrides them.
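
The rules above can be sketched as a small Python helper. This is a minimal illustration, not the skill's actual implementation: the function names, the config dictionary layout, and the decision to let a language-specific voice take precedence over the gender defaults are all assumptions, since the rules do not state how rules 2-4 interact with rules 5-7. The "ta" code for Tamil is the ISO 639-1 code and is assumed to be accepted by the service.

```python
# Defaults from the "Defaults" section; the dict layout itself is hypothetical.
DEFAULTS = {
    "defaultVoiceFemale": "sophia",
    "defaultVoiceMale": "robert",
    "defaultLanguage": "en",
    "defaultSpeed": 1.0,
    "defaultSampleRate": 24000,
}

# Language-specific voices from rules 5-7.
LANGUAGE_VOICES = {
    "hi": {"female": "advika", "male": "vivaan"},
    "es": {"female": "camilla", "male": "carlos"},
    "ta": {"female": "anitha", "male": "raju"},  # "ta" is an assumed code
}


def select_voice(explicit=None, gender=None, lang=None, config=DEFAULTS):
    """Pick a voice id following the selection rules."""
    if explicit:  # rule 1: an explicitly named voice always wins
        return explicit
    if lang in LANGUAGE_VOICES:  # rules 5-7 (assumed to outrank rules 2-4)
        return LANGUAGE_VOICES[lang][gender or "female"]
    if gender == "male":  # rule 2
        return config["defaultVoiceMale"]
    return config["defaultVoiceFemale"]  # rules 3 and 4


def build_tts_flags(voice, lang=None, speed=None, rate=None, config=DEFAULTS):
    """Build the flag list, falling back to configured defaults when unset."""
    return [
        "--voice", voice,
        "--lang", lang or config["defaultLanguage"],
        "--speed", str(speed if speed is not None else config["defaultSpeed"]),
        "--rate", str(rate if rate is not None else config["defaultSampleRate"]),
    ]
```

For example, select_voice(gender="male", lang="hi") picks vivaan under this precedence, and the resulting flag list can be appended to the tts.sh command line.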

Text-to-Speech

Generate speech audio from text using the Lightning v3.1 model.

Shell (preferred — zero dependencies)

{baseDir}/scripts/tts.sh "Text to speak" --voice sophia --rate 24000 --speed 1.0 --lang en

Python (requires pip install smallestai, or just pip install requests)

python3 {baseDir}/scripts/tts.py "Text to speak" --voice sophia --speed 1.0 --lang en --out speech.wav

Voices

| Voice | Gender | Accent | Best For |
| --- | --- | --- | --- |
| sophia | Female | American | General use (default) |
| robert | Male | American | Professional, reports (default) |
| advika | Female | Indian | Hindi content, code-switch |
| vivaan | Male | Indian | Bilingual English/Hindi |
| camilla | Female | Mexican/Latin | Spanish content |
| zara | Female | American | Conversational |
| melody | Female | American | Storytelling, greetings |
| arjun | Male | Indian | English/Hindi bilingual |
| stella | Female | American | Expressive, warm |

80+ more voices available. List all with: {baseDir}/scripts/voices.sh

Options

  • --voice <id>: Voice identifier (default: sophia)
  • --rate <hz>: Sample rate — 8000 | 16000 | 24000 | 44100 (default: 24000)
  • --speed <n>: Playback speed 0.5–2.0 (default: 1.0)
  • --lang <code>: Language code (default: en). See {baseDir}/references/languages.md
  • --out <path>: Output file (default: auto-named media/tts_<timestamp>.wav)
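
A caller may want to check option values before invoking the script, so that bad input fails locally instead of as an HTTP 400. The sketch below encodes only the ranges listed above; the function name is hypothetical and the bundled scripts may validate differently or not at all.

```python
ALLOWED_RATES = {8000, 16000, 24000, 44100}  # values listed for --rate
MIN_SPEED, MAX_SPEED = 0.5, 2.0              # range listed for --speed


def validate_tts_options(rate=24000, speed=1.0):
    """Return (rate, speed) unchanged, or raise ValueError if out of range."""
    if rate not in ALLOWED_RATES:
        raise ValueError(f"rate must be one of {sorted(ALLOWED_RATES)}, got {rate}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be within {MIN_SPEED}-{MAX_SPEED}, got {speed}")
    return rate, speed
```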

Output

Scripts print MEDIA: <filepath> on success. OpenClaw sends this as an audio attachment.
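
If you wrap the scripts from your own code, the MEDIA line can be picked out of the captured output. This is a minimal sketch that assumes the line has exactly the form "MEDIA: <filepath>" and may be surrounded by other log lines; the helper name is hypothetical.

```python
from typing import Optional

MEDIA_PREFIX = "MEDIA:"


def parse_media_path(stdout: str) -> Optional[str]:
    """Return the path from the first 'MEDIA: <filepath>' line, or None."""
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith(MEDIA_PREFIX):
            path = line[len(MEDIA_PREFIX):].strip()
            return path or None  # treat an empty path as a failure
    return None
```

A return value of None can be treated as a failed synthesis and handled as described under Error Handling.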

Multilingual

Supports 30+ languages. Pass --lang with an ISO language code:

{baseDir}/scripts/tts.sh "नमस्ते, कैसे हैं आप?" --voice advika --lang hi
{baseDir}/scripts/tts.sh "Bonjour le monde" --voice sophia --lang fr
{baseDir}/scripts/tts.sh "Hola, buenos días" --voice camilla --lang es

Code-switching (mixing languages) works automatically — no flag needed:

{baseDir}/scripts/tts.sh "Hey, मुझे meeting remind कर दो" --voice advika --lang hi

Speech-to-Text

Transcribe audio files using the Pulse model. Supports WAV, MP3, OGG, FLAC.

Shell

{baseDir}/scripts/stt.sh /path/to/audio.wav
{baseDir}/scripts/stt.sh /path/to/audio.wav --diarize --timestamps --emotions

Python

python3 {baseDir}/scripts/stt.py /path/to/audio.wav --diarize --timestamps --lang en

Options

  • --lang <code>: Language (default: en)
  • --diarize: Identify different speakers
  • --timestamps: Word-level timing
  • --emotions: Detect emotional tone

Output

Returns JSON with a transcription field. With --diarize, the output also includes a speaker label for each word.
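
The sketch below shows one way to consume that JSON. Only the transcription field is confirmed by this document. The "words", "word", and "speaker" key names are placeholders for the diarized output and are passed as parameters so they can be adjusted to whatever the real response uses.

```python
import json
from itertools import groupby


def read_transcript(raw_json: str) -> str:
    """Return the full text from the documented 'transcription' field."""
    return json.loads(raw_json)["transcription"]


def speaker_turns(raw_json, words_key="words", word_key="word", speaker_key="speaker"):
    """Group consecutive words by speaker label.

    The key names are assumptions, not the confirmed response schema.
    """
    words = json.loads(raw_json).get(words_key, [])
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.get(speaker_key)):
        turns.append((speaker, " ".join(w[word_key] for w in group)))
    return turns
```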

When to Use

Trigger this skill when the user:

  • Asks to "say", "speak", "read aloud", or "generate speech/audio"
  • Wants a "voice message", "voice note", or "audio file"
  • Asks to "transcribe", "convert speech/audio to text"
  • Mentions "Smallest AI", "Lightning TTS", or "Pulse STT"
  • Needs fast or low-latency speech generation
  • Wants Hindi, Spanish, multilingual, or code-switched voice output
  • Asks to compare TTS providers or benchmark latency

Error Handling

  • Missing API key → tell user to set SMALLEST_API_KEY
  • HTTP 401 → invalid or expired API key
  • HTTP 429 → rate limited, wait and retry
  • HTTP 400 → check text length (max ~5000 chars per request). Split long text into chunks.
  • Empty audio → verify voice_id is valid
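
The handling above could be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the exponential backoff schedule is a common choice rather than anything the provider specifies, and the HTTP call is injected as a callable so the sketch does not depend on a particular endpoint or client library.

```python
import os
import time

# User-facing hints for the status codes listed above.
HINTS = {
    400: "Check text length (max ~5000 chars per request) and split long text.",
    401: "Invalid or expired API key.",
    429: "Rate limited; wait and retry.",
}


def require_api_key(env=os.environ):
    """Return SMALLEST_API_KEY, or raise with a message for the user."""
    key = env.get("SMALLEST_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set SMALLEST_API_KEY in your environment.")
    return key


def explain_status(status):
    """Map an HTTP status code to a short hint."""
    return HINTS.get(status, f"Unexpected HTTP status {status}.")


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body), retrying only on HTTP 429.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body  # still rate limited after the final attempt
```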

Limits

  • Max text per request: ~5000 characters
  • For longer text: split into sentences, synthesize each, concatenate with sox or ffmpeg
  • Free tier: 30 minutes/month of TTS
  • Basic ($5/mo): 3 hours of TTS + 1 voice clone

Files

11 total