Speech to Text (Yandex SpeechKit)
Review
Audited by ClawScan on May 1, 2026.
Overview
This appears to be a legitimate speech-to-text skill, but it uses your Yandex credentials and sends audio to Yandex for transcription.
Before installing, be sure you are comfortable sending selected voice/audio files to Yandex SpeechKit. Use a least-privilege Yandex API key, store credentials in OpenClaw configuration, keep FFmpeg and Python dependencies current, and run setup/check scripts only from the installed skill directory.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Voice messages or audio files you transcribe may be processed by Yandex Cloud under your Yandex account.
The provider sends the audio bytes to Yandex SpeechKit. This is disclosed and central to the skill, but it is still an external provider data flow involving potentially sensitive voice content.
API_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize" ... response = self.session.post(... data=audio_data, timeout=self.timeout)
Use the skill only for audio you are comfortable sending to Yandex, and review Yandex SpeechKit privacy, billing, and retention terms.
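To make the data flow concrete, the following is a minimal sketch of the kind of request the evidence above implies, not the skill's actual code. It uses only the standard library and builds the request without sending it. The function name `build_recognize_request`, the `folderId` and `lang` query parameters, and the example values are assumptions for illustration; only the endpoint URL, the raw audio body, and the `Api-Key` authorization scheme appear in the reviewed artifacts.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint quoted in the review evidence.
API_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognize_request(audio_data: bytes, api_key: str,
                            folder_id: str, lang: str = "ru-RU") -> Request:
    """Build (but do not send) a POST request carrying raw audio bytes.

    The query parameter names are assumptions, not taken from the skill.
    """
    query = urlencode({"folderId": folder_id, "lang": lang})
    return Request(
        f"{API_URL}?{query}",
        data=audio_data,  # the full audio payload is the request body
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )


# Placeholder values only; nothing is transmitted here.
req = build_recognize_request(b"\x00\x01", "example-key", "example-folder")
print(req.full_url)
print(len(req.data), "bytes of audio would leave the machine")
```

The point of the sketch is that the entire audio file, not a hash or excerpt, is the request body, which is why the recommendation above applies to every file you transcribe.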
The skill needs a real Yandex API key and folder ID, which may allow API usage and billing within the configured Yandex project.
The diagnostic script can read this skill's configured Yandex API key and use it to validate access against Yandex. It does not show the key in output, and this credential use is expected for SpeechKit.
OC_CONFIG="${HOME}/.openclaw/openclaw.json" ... -H "Authorization: Api-Key ${CHECK_API_KEY}"
Use a least-privilege Yandex service-account key, preferably limited to SpeechKit use, and store it through OpenClaw configuration rather than pasting it into chat.
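As an illustration of reading a key from a JSON configuration file without echoing it, here is a hedged sketch. The diagnostic script itself is a shell script; this is not a port of it. The helper names `load_api_key` and `redact`, and the nested key layout used in the demo, are hypothetical: the review does not show how `openclaw.json` is structured.

```python
import json
import tempfile
from pathlib import Path


def load_api_key(config_path: Path, key_path: tuple) -> str:
    """Walk a nested JSON config and return the string at key_path."""
    node = json.loads(config_path.read_text(encoding="utf-8"))
    for part in key_path:
        node = node[part]
    if not isinstance(node, str) or not node:
        raise ValueError("API key missing or not a string")
    return node


def redact(secret: str, visible: int = 4) -> str:
    """Mask a secret for logs, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo against a throwaway file; the layout below is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "openclaw.json"
    cfg.write_text(json.dumps({"skills": {"stt": {"apiKey": "abcdefgh"}}}),
                   encoding="utf-8")
    key = load_api_key(cfg, ("skills", "stt", "apiKey"))
    masked = redact(key)
    print("Loaded key:", masked)
```

Logging only the masked form keeps diagnostics useful for confirming which key is configured while leaving the usable secret out of terminals and chat transcripts.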
Using the skill relies on running FFmpeg locally against the audio files you provide for transcription.
The skill runs FFmpeg as a local subprocess to inspect or convert audio. This is expected for speech-to-text processing and uses argument arrays rather than shell-string execution.
cmd = ['ffmpeg', '-i', input_file, ... '-y', output_file] ... subprocess.run(cmd, capture_output=True, text=True, timeout=300)
Keep FFmpeg updated and only transcribe files from sources you trust or are willing to process locally.
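The difference between an argument array and a shell string can be sketched as follows. This is an illustration under stated assumptions, not the skill's code: the options elided in the evidence above are left out here instead of being guessed, the helper names `build_ffmpeg_cmd` and `run_ffmpeg` are hypothetical, and FFmpeg is never invoked in the demo.

```python
import shlex
import subprocess


def build_ffmpeg_cmd(input_file: str, output_file: str) -> list:
    """Return an argv list; each file name is exactly one argument."""
    # The skill's intermediate options are elided in the review and
    # intentionally omitted here.
    return ["ffmpeg", "-i", input_file, "-y", output_file]


def run_ffmpeg(cmd: list) -> subprocess.CompletedProcess:
    """Run the command without a shell, mirroring the quoted call."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=300)


# A file name containing shell metacharacters stays a single argv element
# because no shell ever parses the list.
hostile = "voice.ogg; rm -rf ~"
cmd = build_ffmpeg_cmd(hostile, "out.wav")
print(cmd[2])
print(shlex.join(cmd))  # shows how the name would need quoting in a shell
```

Because `subprocess.run` receives a list and `shell=True` is not set, the semicolon in the file name is never interpreted as a command separator. This limits shell injection only; it does not protect against bugs in FFmpeg's own decoders, which is why keeping FFmpeg updated still matters.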
A later setup run could install newer dependency versions than the author originally tested.
The Python dependencies are specified with lower-bound ranges rather than exact pinned versions. That is common, but future installs can resolve different package versions.
python-dotenv>=1.0.0 requests>=2.31.0 urllib3>=1.26.0
Review dependencies before setup, and consider pinning versions or using a lockfile in controlled environments.
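A quick way to spot range specifiers is sketched below. It is a deliberately simplified check, not a full requirements parser: it ignores extras, environment markers, and hash options, and the function name `unpinned` is an assumption for illustration. The three requirement lines are the ones quoted in the finding above.

```python
import re

# Dependency specifiers quoted in the finding, one per line.
REQUIREMENTS = """\
python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0
"""

# Treat only "name==exact.version" (no wildcard, no extra clauses) as pinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[^*,\s]+$")


def unpinned(requirements: str) -> list:
    """Return requirement lines that are not exact '==' pins."""
    result = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            result.append(line)
    return result


flagged = unpinned(REQUIREMENTS)
print(flagged)
```

All three lines are flagged because each uses a lower bound. In a controlled environment, replacing them with exact versions you have tested, or generating a lockfile with your usual tooling, makes later installs resolve to the same packages.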
