Yandex Speechkit STT via Telegram Gateway

Security checks across malware telemetry and agentic risk

Overview

The skill's core speech transcription is plausible for its stated purpose, but an included helper can continuously process inbound voice files and forward transcripts to a hard-coded Telegram recipient without clear disclosure.

Install only if you are comfortable sending audio to Yandex SpeechKit and storing Yandex service-account credentials locally. Do not run scripts/voice_processor.py unless you first make the Telegram recipient configurable, restrict or approve monitored files, and decide whether storing transcripts in the workspace is acceptable.
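The first two precautions above could be approached along the following lines. This is a minimal sketch, not the skill's actual code: the environment variable name TELEGRAM_RECIPIENT_ID, both helper names, and the idea of a single allowed directory are assumptions made for illustration.

```python
import os
from pathlib import Path


def load_recipient(env=os.environ):
    """Return the Telegram chat ID from the environment, or fail closed."""
    # TELEGRAM_RECIPIENT_ID is a hypothetical name, not one the skill defines.
    value = env.get("TELEGRAM_RECIPIENT_ID", "").strip()
    if not value:
        raise RuntimeError("No Telegram recipient configured; refusing to forward.")
    return value


def is_allowed_voice_file(path, allowed_dir):
    """Accept only paths that resolve to a location inside allowed_dir."""
    resolved = Path(path).resolve()
    root = Path(allowed_dir).resolve()
    # Resolving first means "../" segments and symlinks cannot escape the root.
    return root in resolved.parents
```

Failing closed when no recipient is set means a missing configuration stops forwarding rather than silently falling back to a built-in destination.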

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (8)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 67%
Finding: The skill documentation describes installation and usage steps that imply network access, shell execution, and file access, but it does not explicitly declare permissions or capabilities. This can mislead reviewers and users about the operational scope of the skill, reducing transparency and making risky behavior easier to hide or overlook.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 89%
Finding: A mismatch between the stated purpose and the detected behavior is a significant security concern because it suggests the skill may process files continuously, retain local state, and transmit recognized text to external destinations without clear user initiation. In a voice-processing skill, such hidden automation increases the risk of unauthorized collection, processing, and exfiltration of sensitive audio-derived content.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill description says it transcribes voice messages, but the implementation also automatically forwards the resulting text to a hardcoded Telegram target. That hidden secondary behavior creates an undisclosed exfiltration path for potentially sensitive user speech, which is especially dangerous in a messaging context where voice notes often contain private information.

Missing User Warnings

Medium

Confidence: 94%
Finding: The documentation encourages transcription via Yandex SpeechKit but does not clearly warn that user voice data is sent to a third-party external service. Voice messages can contain sensitive personal or business information, so omitting this disclosure undermines informed consent and may create privacy, compliance, and data-handling risks.

Missing User Warnings

Medium

Confidence: 96%
Finding: The setup instructions tell users to store a private key directly in config.json without any security warning or safer storage guidance. Storing long-lived private keys in plaintext configuration files increases the chance of accidental leakage through source control, backups, logs, or overly broad filesystem access, which could enable unauthorized API access.
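One possible mitigation is to keep the key in a separate file with owner-only permissions and refuse to load it otherwise. The sketch below assumes a POSIX system; the helper name read_private_key is hypothetical and is not part of the skill.

```python
import os
import stat


def read_private_key(path):
    """Read a key file, refusing files accessible to group or others (POSIX)."""
    mode = os.stat(path).st_mode
    # Any group/other permission bit means the key is more exposed than needed.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path} is accessible to group/others; restrict it with chmod 600."
        )
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

A check like this does not protect against leakage through backups or source control, so the key file would still need to be excluded from both.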

Missing User Warnings

High

Confidence: 96%
Finding: The code sends raw audio content to Yandex SpeechKit for transcription, but the skill metadata does not clearly disclose that user voice data leaves the local system and is processed by a third-party cloud provider. In a Telegram voice-message workflow, this is a significant privacy risk because users may reasonably expect local handling unless remote processing is explicitly explained.

Missing User Warnings

High

Confidence: 98%
Finding: Recognized transcript text is automatically forwarded onward to Telegram without explicit warning or approval. This creates a second-stage disclosure risk beyond cloud transcription, because the transcript may reveal sensitive spoken content and is sent to a destination the user may not know about.
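An explicit approval step before forwarding could look roughly like this. It is a sketch under assumptions: forward_with_approval is a hypothetical helper, and send stands in for whatever function actually posts to Telegram, which is not shown here.

```python
def forward_with_approval(transcript, recipient, send, confirm=input):
    """Call send(recipient, transcript) only after an explicit 'y' answer."""
    # Show a short preview so the user knows what is about to leave the machine.
    preview = transcript if len(transcript) <= 80 else transcript[:77] + "..."
    answer = confirm(f"Forward to {recipient}? Transcript: {preview!r} [y/N] ")
    if answer.strip().lower() != "y":
        return False  # default is to keep the transcript local
    send(recipient, transcript)
    return True
```

Passing confirm as a parameter keeps the gate testable and lets a non-interactive deployment substitute its own approval mechanism instead of a terminal prompt.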

Ssd 3

Medium

Confidence: 97%
Finding: Automatically forwarding recognized voice content into a Telegram chat creates a direct data-leakage channel for user-provided audio-derived text. In this skill context, the danger is amplified because the source material is personal voice messaging, which commonly contains sensitive conversations, names, addresses, or authentication information.

VirusTotal

64/64 vendors flagged this skill as clean.
