Speech to Text (Yandex SpeechKit)

Review

Audited by ClawScan on May 1, 2026.

Overview

This appears to be a legitimate speech-to-text skill, but it uses your Yandex credentials and sends audio to Yandex for transcription.

Before installing, be sure you are comfortable sending selected voice/audio files to Yandex SpeechKit. Use a least-privilege Yandex API key, store credentials in OpenClaw configuration, keep FFmpeg and Python dependencies current, and run setup/check scripts only from the installed skill directory.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Voice messages or audio files you transcribe may be processed by Yandex Cloud under your Yandex account.

Why it was flagged

The provider sends the audio bytes to Yandex SpeechKit. This is disclosed and central to the skill, but it is still an external provider data flow involving potentially sensitive voice content.

Skill content

API_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize" ... response = self.session.post(... data=audio_data, timeout=self.timeout)

Recommendation

Use the skill only for audio you are comfortable sending to Yandex, and review Yandex SpeechKit privacy, billing, and retention terms.
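
To make the data flow concrete, here is a minimal sketch of what such a request could look like. It is not the skill's code: the helper `build_stt_request`, its arguments, and the `folderId` and `lang` query parameter names are assumptions for illustration. The sketch only builds the request object and sends nothing.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_stt_request(audio_data: bytes, api_key: str, folder_id: str,
                      lang: str = "ru-RU") -> Request:
    """Build (but do not send) a recognition request.

    Hypothetical helper: the query parameter names are assumptions,
    not copied from the skill's source.
    """
    query = urlencode({"folderId": folder_id, "lang": lang})
    return Request(
        f"{API_URL}?{query}",
        data=audio_data,  # the full audio payload is the request body
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )


# Placeholder values; no network call is made here.
req = build_stt_request(b"\x00\x01", api_key="example-key",
                        folder_id="example-folder")
print(req.full_url)
print(len(req.data), "bytes would be uploaded")
```

The point of the sketch is that the request body is the audio itself, so anything in the recording is exposed to the provider once the request is sent.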

What this means

The skill needs a real Yandex API key and folder ID. Anyone holding that key can make billable API calls against the configured Yandex Cloud folder.

Why it was flagged

The diagnostic script can read this skill's configured Yandex API key and use it to validate access against Yandex. It does not show the key in output, and this credential use is expected for SpeechKit.

Skill content

OC_CONFIG="${HOME}/.openclaw/openclaw.json" ... -H "Authorization: Api-Key ${CHECK_API_KEY}"

Recommendation

Use a least-privilege Yandex service-account key, preferably limited to SpeechKit use, and store it through OpenClaw configuration rather than pasting it into chat.
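
As an illustration of keeping a key out of diagnostic output, the following sketch reports only whether a key is present and scrubs it from any message before printing. The helpers `redact` and `key_status` are hypothetical and are not taken from the skill's diagnostic script, which is a shell script.

```python
from typing import Optional


def redact(message: str, secret: Optional[str]) -> str:
    """Replace every occurrence of the secret in a message with a placeholder."""
    if not secret:
        return message
    return message.replace(secret, "***")


def key_status(api_key: Optional[str]) -> str:
    """Report whether a key is configured without echoing any part of it."""
    return "API key: configured" if api_key else "API key: missing"


api_key = "example-secret-key"  # placeholder value for the demo
log_line = f"request failed: Authorization: Api-Key {api_key}"
print(key_status(api_key))
print(redact(log_line, api_key))
```

Redacting at the point of output is a fallback; it does not replace limiting what the key is allowed to do.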

What this means

Using the skill requires a local FFmpeg installation, which the skill runs against the audio files you provide for transcription.

Why it was flagged

The skill runs FFmpeg as a local subprocess to inspect or convert audio. This is expected for speech-to-text processing and uses argument arrays rather than shell-string execution.

Skill content

cmd = ['ffmpeg', '-i', input_file, ... '-y', output_file] ... subprocess.run(cmd, capture_output=True, text=True, timeout=300)

Recommendation

Keep FFmpeg updated, and transcribe only files from sources you trust, since every file is parsed by FFmpeg on your machine.
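
The difference between an argument array and a shell string can be sketched as follows. The helper `build_ffmpeg_cmd` is hypothetical, and the conversion flags (`-ac 1`, `-ar 16000`) are assumed for illustration; the skill's actual flags are elided in the excerpt above. The sketch builds the command but does not run FFmpeg.

```python
from typing import List


def build_ffmpeg_cmd(input_file: str, output_file: str) -> List[str]:
    """Build an FFmpeg argument list; each path stays one argv element.

    The conversion flags (mono, 16 kHz) are illustrative assumptions,
    not the skill's actual settings.
    """
    return ["ffmpeg", "-i", input_file, "-ac", "1", "-ar", "16000",
            "-y", output_file]


hostile_name = "voice; rm -rf ~.ogg"
cmd = build_ffmpeg_cmd(hostile_name, "out.wav")
print(cmd)

# With an argument list, subprocess hands the name to FFmpeg verbatim:
#   subprocess.run(cmd, capture_output=True, text=True, timeout=300)
# A shell string would instead let the shell interpret "; rm -rf ~":
#   subprocess.run(f"ffmpeg -i {hostile_name} out.wav", shell=True)  # unsafe
```

An argument array prevents shell injection through file names, but it does not protect against bugs in FFmpeg's own decoders, which is why the update advice still applies.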

What this means

A later setup run could install newer dependency versions than the author originally tested.

Why it was flagged

The Python dependencies are specified with lower-bound ranges rather than exact pinned versions. That is common, but future installs can resolve different package versions.

Skill content

python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0

Recommendation

Review dependencies before setup, and consider pinning versions or using a lockfile in controlled environments.
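
One way to spot unpinned entries before running setup is a small check like the one below. It is a sketch under simple assumptions: the function `unpinned` is hypothetical, it treats only plain `name==version` lines as pinned, and it ignores extras, environment markers, and hashes.

```python
import re
from typing import List

# Matches "name==version" and nothing else (no ranges, no wildcards).
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.!+_-]+$")


def unpinned(requirements: str) -> List[str]:
    """Return requirement lines that are not pinned with '=='."""
    result = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            result.append(line)
    return result


current = """\
python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0
"""
print(unpinned(current))                # all three lines are reported
print(unpinned("requests==2.31.0\n"))   # an exact pin is not reported
```

In a controlled environment, a lockfile generated by your packaging tool gives a stronger guarantee than hand-written pins, because it also fixes transitive dependencies.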