Speech to Text (Yandex SpeechKit)

Security checks across malware telemetry and agentic risk

Overview

This is a coherent speech-to-text skill, but users should know that audio sent for transcription is processed by Yandex SpeechKit.

Install only if you are comfortable sending voice messages or audio files selected for transcription to Yandex SpeechKit. Use a least-privilege Yandex service account key, prefer storing it in OpenClaw config rather than chat or logs, keep Python dependencies and FFmpeg current, and treat the configurable temp directory as owner-controlled.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (9)

Description-Behavior Mismatch

Low

Confidence: 70%
Finding: The code allows STT_TEMP_DIR from configuration/environment to control where directories are created, then calls mkdir(parents=True, exist_ok=True) without restricting the path to a safe base directory. If an attacker can influence configuration, this could create or reuse arbitrary filesystem locations, which broadens the skill’s file-system reach beyond temporary audio storage.
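One possible mitigation is to confine the configured directory to a fixed base before creating it. The sketch below is illustrative only and is not the skill's actual code: the helper name `resolve_temp_dir`, the `base` argument, and the fallback subdirectory `stt` are assumptions introduced for this example; only `STT_TEMP_DIR` and the `mkdir(parents=True, exist_ok=True)` call come from the finding.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_temp_dir(configured: Optional[str], base: Path) -> Path:
    """Return a temp directory that is guaranteed to live under `base`.

    `configured` stands in for the STT_TEMP_DIR value; `base` is a
    hypothetical owner-controlled root chosen by the deployer.
    """
    base = base.resolve()
    # Fall back to a fixed subdirectory when nothing is configured.
    candidate = Path(configured) if configured else Path("stt")
    # Relative values are joined onto base; an absolute value replaces it.
    # resolve() collapses ".." segments and follows symlinks.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"STT_TEMP_DIR escapes the allowed base: {resolved}")
    # mode applies to the final directory only; parents use the default mode.
    resolved.mkdir(parents=True, exist_ok=True, mode=0o700)
    return resolved


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    print(resolve_temp_dir("audio", root))   # created under root
    try:
        resolve_temp_dir("/etc", root)       # rejected: outside root
    except ValueError as exc:
        print("rejected:", exc)
```

Resolving the path before the containment check matters: comparing the raw strings would let values such as `audio/../../elsewhere` pass.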

Missing User Warnings

Medium

Confidence: 94%
Finding: The README states that OpenClaw saves voice files locally and the skill transcribes them with Yandex SpeechKit, but it does not clearly disclose that user audio is sent to an external third-party cloud provider. This creates a privacy and consent risk because users may assume processing is local when in reality potentially sensitive voice content leaves the host environment.

Vague Triggers

Low

Confidence: 76%
Finding: The skill documentation says it activates automatically for any voice message from any connected messenger, which is broader than a narrowly scoped, user-confirmed trigger. In a messaging environment this can cause unintended processing of sensitive audio and increases the chance of surprise transcription and third-party disclosure.

Vague Triggers

Medium

Confidence: 84%
Finding: The activation guidance is very broad: it triggers on essentially any voice message, any need to convert speech to text, and generic audio transcription requests. Overbroad routing increases the chance the skill is invoked in unintended contexts, causing unnecessary file handling, external transmission of user audio to a third-party provider, and use of sensitive API-backed functionality without clear user intent.

Unpinned Dependencies

Low

Category: Supply Chain
Content:
    # Core dependencies for Telegram STT Skill
    python-dotenv>=1.0.0
    requests>=2.31.0
    urllib3>=1.26.0
Confidence: 93%
Finding: python-dotenv>=1.0.0

Unpinned Dependencies

Low

Category: Supply Chain
Content:
    # Core dependencies for Telegram STT Skill
    python-dotenv>=1.0.0
    requests>=2.31.0
    urllib3>=1.26.0
Confidence: 97%
Finding: requests>=2.31.0

Unpinned Dependencies

Low

Category: Supply Chain
Content:
    # Core dependencies for Telegram STT Skill
    python-dotenv>=1.0.0
    requests>=2.31.0
    urllib3>=1.26.0
Confidence: 97%
Finding: urllib3>=1.26.0
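To illustrate what the three findings above flag, here is a minimal sketch of a check that reports requirement lines not pinned to an exact version with `==`. It is not SkillSpector's detection logic; the function name `unpinned_requirements` and the simplified pattern are assumptions for this example, and the pattern deliberately ignores environment markers, hashes, URLs, and option lines.

```python
import re

# Simplified pattern: "name==version", with optional extras such as "name[extra]".
# Wildcard versions like "1.*" are treated as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s*]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = """\
# Core dependencies for Telegram STT Skill
python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0
"""
    for entry in unpinned_requirements(sample):
        print("unpinned:", entry)
```

A `>=` specifier accepts any later release, so the resolved version can change between installs; an exact `==` pin (optionally combined with pip's hash-checking mode) makes the install reproducible.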

Known Vulnerable Dependency: requests — 10 advisories: CVE-2014-1830 (Exposure of Sensitive Information to an Unauthorized Actor in Requests); CVE-2024-47081 (Requests vulnerable to .netrc credentials leak via malicious URLs); CVE-2024-35195 (Requests `Session` object does not verify requests after making first request wi) +7 more

High

Category: Supply Chain
Confidence: 92%
Finding: requests

Known Vulnerable Dependency: urllib3 — 10 advisories: CVE-2025-66471 (urllib3 streaming API improperly handles highly compressed data); CVE-2024-37891 (urllib3's Proxy-Authorization request header isn't stripped during cross-origin ); CVE-2026-21441 (Decompression-bomb safeguards bypassed when following HTTP redirects (streaming ) +7 more

High

Category: Supply Chain
Confidence: 94%
Finding: urllib3

VirusTotal

66/66 vendors flagged this skill as clean.
