Install
openclaw skills install sergei-mikhailov-stt
This skill recognizes speech from voice messages sent via any messenger connected to OpenClaw, using various STT providers, including Yandex SpeechKit. The architecture is extensible, so other providers can be added.
Use this skill when:
Example path from OpenClaw:
/home/user_folder/.openclaw/media/inbound/file_1---9a53bac2-0392-41e7-8300-1c08e8eec027.ogg
Credentials are added to ~/.openclaw/openclaw.json or .env manually by the owner.
Do not modify openclaw.json, .env, or config.json without explicit user permission. These files contain credentials and must only be changed by the owner.
Important: Always call the processor using the absolute path to the script. Do not use cd <skill_dir> && python3 scripts/... because cd cannot be allowlisted, so that form triggers an approval prompt on every call.
python3 /path/to/sergei-mikhailov-stt/scripts/stt_processor.py --file "/path/to/audio.ogg"
The script resolves all paths (config, .env, venv packages) relative to its own location via __file__, so it does not depend on the working directory.
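The location-independent lookup described above can be sketched roughly as follows. This is an illustration, not the skill's actual code: the helper name resolve_skill_paths is hypothetical, and the layout (config.json and .env one level above scripts/) is an assumption based on the setup steps in this document.

```python
from pathlib import Path


def resolve_skill_paths(script_file: str) -> dict:
    """Derive skill-relative paths from the script's own location.

    Assumed layout (hypothetical): <skill>/scripts/stt_processor.py,
    with config.json and .env stored directly in <skill>/.
    """
    script_path = Path(script_file).resolve()
    skill_dir = script_path.parent.parent  # scripts/ -> skill root
    return {
        "skill_dir": skill_dir,
        "config": skill_dir / "config.json",
        "env_file": skill_dir / ".env",
    }


# Inside the real script the argument would be __file__.
paths = resolve_skill_paths("/opt/skills/sergei-mikhailov-stt/scripts/stt_processor.py")
print(paths["config"])
```

Because nothing here reads the current working directory, the same paths come out regardless of where the command is launched from, which is why no cd is needed.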
clawhub install sergei-mikhailov-stt
cd ~/.openclaw/workspace/skills/sergei-mikhailov-stt
bash setup.sh
The setup script creates a Python virtual environment, installs dependencies, and copies example configuration files. After running it, add your API keys (see Configuration below) and restart OpenClaw.
On Debian/Ubuntu, you may need to install the venv package first:
sudo apt install python3-venv
To verify that everything is configured correctly, run the diagnostic script:
bash check.sh
It checks Python, FFmpeg, the virtual environment, dependencies, and API keys, and tells you exactly what to fix if something is missing.
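For readers who want to see what such a check involves, here is a minimal Python sketch of a few of the same checks. It does not reproduce check.sh; the function name run_checks and the minimum Python version (3.8) are assumptions made for illustration.

```python
import os
import shutil
import sys


def run_checks(env: dict) -> dict:
    """Return a mapping of check name -> True (ok) / False (missing)."""
    return {
        # Minimum version is an assumption, not taken from the skill.
        "python": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "YANDEX_API_KEY": bool(env.get("YANDEX_API_KEY")),
        "YANDEX_FOLDER_ID": bool(env.get("YANDEX_FOLDER_ID")),
    }


for name, ok in run_checks(dict(os.environ)).items():
    print(f"{'OK     ' if ok else 'MISSING'} {name}")
```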
Add credentials to ~/.openclaw/openclaw.json:
{
  "skills": {
    "entries": {
      "sergei-mikhailov-stt": {
        "env": {
          "YANDEX_API_KEY": "your_api_key_here",
          "YANDEX_FOLDER_ID": "your_folder_id_here"
        }
      }
    }
  }
}
.env file
Edit the .env file created by setup.sh in the skill folder:
YANDEX_API_KEY=your_api_key_here
YANDEX_FOLDER_ID=your_folder_id_here
STT_DEFAULT_PROVIDER=yandex
openclaw gateway stop && openclaw gateway start
The config.json file (also created by setup.sh) lets you tune provider parameters:
{
  "default_provider": "yandex",
  "providers": {
    "yandex": {
      "api_key": "${YANDEX_API_KEY}",
      "folder_id": "${YANDEX_FOLDER_ID}",
      "lang": "ru-RU"
    }
  }
}
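The ${YANDEX_API_KEY} style values above are placeholders that stand for environment variables. How the skill expands them is not shown here; the sketch below is one plausible approach, with expand_placeholders being a hypothetical helper name.

```python
import json
import re

# Matches ${NAME} where NAME is a valid environment variable identifier.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value, env):
    """Recursively replace ${NAME} in strings with env[NAME]."""
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    if isinstance(value, str):
        # Unknown variables are left untouched so the problem stays visible.
        return PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    return value


raw = json.loads(
    '{"providers": {"yandex": {"api_key": "${YANDEX_API_KEY}", "lang": "ru-RU"}}}'
)
config = expand_placeholders(raw, {"YANDEX_API_KEY": "secret"})
print(config["providers"]["yandex"])
```

In real use the second argument would be os.environ; a plain dict is used here so the example runs without any keys set.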
# scripts/providers/new_provider.py
from .base_provider import BaseSTTProvider


class NewProvider(BaseSTTProvider):
    name = "new_provider"

    def recognize(self, audio_file_path: str, language: str = 'ru-RU') -> str:
        # Recognition implementation
        pass

    def validate_config(self, config: dict) -> bool:
        # Configuration validation
        pass

    def get_supported_formats(self) -> list:
        return ['ogg', 'wav', 'mp3']
Add to scripts/stt_processor.py in the _get_provider method:
if provider_name == 'new_provider':
    return NewProvider(provider_config)
Add the new provider section to config.json:
{
  "providers": {
    "new_provider": {
      "api_key": "${NEW_PROVIDER_API_KEY}",
      "model": "latest"
    }
  }
}
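To show how the three steps above fit together, here is a self-contained sketch. The BaseSTTProvider defined below is an assumed shape inferred from the NewProvider example, not the skill's real base class. EchoProvider is a toy stand-in, and the dictionary lookup replaces the if-chain in _get_provider purely for brevity.

```python
from abc import ABC, abstractmethod


class BaseSTTProvider(ABC):
    """Assumed shape of the base class; the real one may differ."""

    name = "base"

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def recognize(self, audio_file_path: str, language: str = "ru-RU") -> str: ...

    @abstractmethod
    def validate_config(self, config: dict) -> bool: ...

    @abstractmethod
    def get_supported_formats(self) -> list: ...


class EchoProvider(BaseSTTProvider):
    """Toy provider: returns the file name instead of calling an API."""

    name = "echo"

    def recognize(self, audio_file_path, language="ru-RU"):
        return f"[{language}] {audio_file_path}"

    def validate_config(self, config):
        return "api_key" in config

    def get_supported_formats(self):
        return ["ogg", "wav", "mp3"]


# Registry keyed by provider name (a swap-in for the if-chain).
PROVIDERS = {cls.name: cls for cls in (EchoProvider,)}


def get_provider(provider_name: str, provider_config: dict) -> BaseSTTProvider:
    if provider_name not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider_name}")
    return PROVIDERS[provider_name](provider_config)


provider = get_provider("echo", {"api_key": "x"})
print(provider.recognize("voice.ogg", "en-US"))
```

A registry keeps the lookup in one place as providers are added, at the cost of departing from the if-based pattern the skill itself uses.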
User: [sends a voice message]
OpenClaw: Recognized text: "Hello, how are you?"
User: Transcribe this English voice message
OpenClaw: Recognized text (en-US): "Hello, how are you today?"
User: Analyze this voice message
OpenClaw: Recognized text: "Meeting tomorrow at 3 PM"
Language: ru-RU
Confidence: 95%
Provider: Yandex SpeechKit
When the skill returns an error, explain it to the user in plain language and suggest a concrete next step. Do not show raw error messages or stack traces.
| Error | Say to the user | Next step |
|---|---|---|
| File too large | "The voice message is too long; the maximum is about 30 seconds for now." | Ask them to send a shorter message |
| Unsupported format | "This audio format is not supported." | Tell them supported formats: OGG, WAV, MP3, M4A, FLAC, AAC |
| API key invalid / HTTP 401 | "There's a problem with the Yandex SpeechKit API key." | Ask owner to check YANDEX_API_KEY in openclaw.json |
| Folder access denied / HTTP 403 | "Access to Yandex SpeechKit is denied." | Ask owner to verify the service account has ai.speechkit.user role |
| Too many requests / HTTP 429 | "Yandex SpeechKit is rate-limiting us right now." | Try again in a few seconds |
| FFmpeg not found | "Audio conversion tool (FFmpeg) is not installed on the server." | Owner needs to run brew install ffmpeg or apt install ffmpeg |
| API request timed out | "Yandex SpeechKit did not respond in time." | Try again; if it repeats, the service may be down |
| Missing YANDEX_API_KEY | "The skill is not configured yet: API keys are missing." | Owner needs to add keys to ~/.openclaw/openclaw.json |
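The table above can be applied mechanically with a small lookup. The sketch below covers a subset of the rows. The substrings used for matching are assumptions, since the exact error text the script emits is not documented here, and the fallback message is likewise invented for illustration.

```python
# (substring to look for, what to say, next step) - a subset of the table rows.
ERROR_GUIDE = [
    ("file too large",
     "The voice message is too long; the maximum is about 30 seconds for now.",
     "Ask them to send a shorter message"),
    ("unsupported format",
     "This audio format is not supported.",
     "Tell them supported formats: OGG, WAV, MP3, M4A, FLAC, AAC"),
    ("401",
     "There's a problem with the Yandex SpeechKit API key.",
     "Ask owner to check YANDEX_API_KEY in openclaw.json"),
    ("429",
     "Yandex SpeechKit is rate-limiting us right now.",
     "Try again in a few seconds"),
    ("timed out",
     "Yandex SpeechKit did not respond in time.",
     "Try again; if it repeats, the service may be down"),
]

# Hypothetical fallback for errors the table does not list.
FALLBACK = ("Speech recognition failed.", "Ask the owner to check the logs")


def explain_error(raw_error: str) -> tuple:
    """Map a raw error string to (user message, next step)."""
    text = raw_error.lower()
    for needle, message, next_step in ERROR_GUIDE:
        if needle in text:
            return message, next_step
    return FALLBACK


message, next_step = explain_error("HTTP 429 Too Many Requests")
print(message)
print(next_step)
```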
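If problems persist, check the following:
~/.openclaw/openclaw.json: the API keys are present
ffmpeg -version: FFmpeg is installed
ai.speechkit.user: the service account has this role
openclaw logs: look for error details
On successful recognition: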
{
  "text": "Recognized text",
  "language": "ru-RU",
  "confidence": 0.95,
  "provider": "yandex",
  "processing_time": 2.5
}
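A result in this shape can be turned into a reply like the ones in the usage examples. The following is a minimal sketch under the assumption that the script prints this JSON to standard output; format_reply is a hypothetical helper, not part of the skill.

```python
import json


def format_reply(result_json: str) -> str:
    """Build a human-readable reply from the JSON result."""
    result = json.loads(result_json)
    lines = [f'Recognized text: "{result["text"]}"']
    if "language" in result:
        lines.append(f"Language: {result['language']}")
    if "confidence" in result:
        # 0.95 -> "95%"
        lines.append(f"Confidence: {result['confidence']:.0%}")
    if "provider" in result:
        lines.append(f"Provider: {result['provider']}")
    return "\n".join(lines)


sample = (
    '{"text": "Meeting tomorrow at 3 PM", "language": "ru-RU", '
    '"confidence": 0.95, "provider": "yandex", "processing_time": 2.5}'
)
print(format_reply(sample))
```

Note that the provider field holds the short identifier (yandex), so mapping it to a display name such as Yandex SpeechKit would need an extra lookup.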