Speech to Text (Yandex SpeechKit)

Security checks across malware telemetry and agentic risk

Overview

This is a coherent speech-to-text skill, but users should know that audio sent for transcription is processed by Yandex SpeechKit.

Install only if you are comfortable sending voice messages or audio files selected for transcription to Yandex SpeechKit. Use a least-privilege Yandex service account key, prefer storing it in OpenClaw config rather than chat or logs, keep Python dependencies and FFmpeg current, and treat the configurable temp directory as owner-controlled.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (9)

Description-Behavior Mismatch

Low
Confidence
70% confidence
Finding
The code allows STT_TEMP_DIR from configuration/environment to control where directories are created, then calls mkdir(parents=True, exist_ok=True) without restricting the path to a safe base directory. If an attacker can influence configuration, this could create or reuse arbitrary filesystem locations, which broadens the skill’s file-system reach beyond temporary audio storage.
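
One way to narrow this finding is to resolve the configured value against a fixed base directory and reject anything that escapes it. The sketch below is illustrative only: the function name resolve_temp_dir, the default subdirectory name stt_audio, and the choice of the system temp directory as the base are assumptions, not taken from the skill's code.

```python
import os
import tempfile
from pathlib import Path


def resolve_temp_dir(configured, base=None):
    """Return a directory inside `base`, rejecting paths that escape it.

    `configured` stands in for the STT_TEMP_DIR value; the helper name and
    the default subdirectory are hypothetical.
    """
    base_dir = Path(base or tempfile.gettempdir()).resolve()
    # Joining with an absolute path discards base_dir, so absolute values
    # outside the base are caught by the containment check below.
    candidate = (base_dir / (configured or "stt_audio")).resolve()
    if candidate != base_dir and base_dir not in candidate.parents:
        raise ValueError(f"STT_TEMP_DIR must stay inside {base_dir}: {candidate}")
    # mode applies to the final directory only and is still subject to umask.
    candidate.mkdir(parents=True, exist_ok=True, mode=0o700)
    return candidate


# Example call (reads the environment variable named in the finding):
# temp_dir = resolve_temp_dir(os.environ.get("STT_TEMP_DIR"))
```

Because both paths are resolved before comparison, relative segments such as ".." and symlinked components are evaluated against their real location rather than their literal spelling.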

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The README states that OpenClaw saves voice files locally and the skill transcribes them with Yandex SpeechKit, but it does not clearly disclose that user audio is sent to an external third-party cloud provider. This creates a privacy and consent risk because users may assume processing is local when in reality potentially sensitive voice content leaves the host environment.

Vague Triggers

Low
Confidence
76% confidence
Finding
The skill documentation says it activates automatically for any voice message from any connected messenger, which is broader than a narrowly scoped, user-confirmed trigger. In a messaging environment this can cause unintended processing of sensitive audio and increases the chance of surprise transcription and third-party disclosure.

Vague Triggers

Medium
Confidence
84% confidence
Finding
The activation guidance is very broad: it triggers on essentially any voice message, any need to convert speech to text, and generic audio transcription requests. Overbroad routing increases the chance the skill is invoked in unintended contexts, causing unnecessary file handling, external transmission of user audio to a third-party provider, and use of sensitive API-backed functionality without clear user intent.
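
A narrower trigger could require either an explicit command or a prior per-chat opt-in before any audio leaves the host. The following is a minimal sketch under stated assumptions: the VoiceMessage type, the command names, and the opt-in set are all hypothetical and do not reflect how the skill or OpenClaw actually routes messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceMessage:
    """Hypothetical, simplified view of an incoming message."""
    chat_id: str
    text: str = ""          # caption or accompanying command, if any
    has_audio: bool = False


# Hypothetical explicit commands that signal user intent.
TRIGGER_COMMANDS = ("/transcribe", "/stt")


def should_transcribe(msg, opted_in_chats):
    """Transcribe only when audio is present AND intent is explicit."""
    if not msg.has_audio:
        return False
    if msg.chat_id in opted_in_chats:
        return True  # the chat previously agreed to automatic transcription
    words = msg.text.strip().lower().split(maxsplit=1)
    return bool(words) and words[0] in TRIGGER_COMMANDS
```

With this shape, a voice message in a chat that has not opted in is ignored unless it arrives with one of the explicit commands.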

Unpinned Dependencies

Low
Category
Supply Chain
Content
# Core dependencies for Telegram STT Skill
python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0
Confidence
93% confidence
Finding
python-dotenv>=1.0.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
# Core dependencies for Telegram STT Skill
python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0
Confidence
97% confidence
Finding
requests>=2.31.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
# Core dependencies for Telegram STT Skill
python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0
Confidence
97% confidence
Finding
urllib3>=1.26.0
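
The three unpinned-dependency findings all stem from ">=" specifiers, which let an install pull any newer release. A small check like the one below can flag such lines before installation. It is a sketch, not the scanner's logic: the helper name unpinned_requirements is hypothetical, and the regular expression covers only simple "name==version" lines, not every form pip accepts.

```python
import re

# Accepts "name==version", with optional extras and an optional
# environment marker; wildcard versions such as "1.*" do not count as pinned.
PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;,*]+\s*(;.*)?$"
)


def unpinned_requirements(text):
    """Return requirement lines that are not pinned to an exact version."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged


sample = """\
# Core dependencies
python-dotenv>=1.0.0
requests>=2.31.0
urllib3>=1.26.0
"""
print(unpinned_requirements(sample))
```

Exact pins are usually paired with a lock file or hash checking so that upgrades become deliberate, reviewed changes.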

Known Vulnerable Dependency: requests — 10 advisory(ies): CVE-2014-1830 (Exposure of Sensitive Information to an Unauthorized Actor in Requests); CVE-2024-47081 (Requests vulnerable to .netrc credentials leak via malicious URLs); CVE-2024-35195 (Requests `Session` object does not verify requests after making first request wi) +7 more

High
Category
Supply Chain
Confidence
92% confidence
Finding
requests

Known Vulnerable Dependency: urllib3 — 10 advisory(ies): CVE-2025-66471 (urllib3 streaming API improperly handles highly compressed data); CVE-2024-37891 (urllib3's Proxy-Authorization request header isn't stripped during cross-origin ); CVE-2026-21441 (Decompression-bomb safeguards bypassed when following HTTP redirects (streaming ) +7 more

High
Category
Supply Chain
Confidence
94% confidence
Finding
urllib3
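
Because the requests and urllib3 findings apply to an open version range, whether a given install is affected depends on the versions actually resolved. The sketch below lists the installed versions so they can be compared by hand against the advisories named above. The helper name installed_versions is hypothetical, and the code performs no advisory lookup itself.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["requests", "urllib3", "python-dotenv"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```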

VirusTotal

66/66 vendors flagged this skill as clean.
