Yandex Speechkit STT via Telegram Gateway

Security checks across malware telemetry and agentic risk

Overview

The skill's speech transcription appears legitimate, but an included helper script can continuously process inbound voice files and forward the transcripts to a hard-coded Telegram recipient, and this behavior is not clearly disclosed.

Install only if you are comfortable sending audio to Yandex SpeechKit and storing Yandex service-account credentials locally. Do not run scripts/voice_processor.py unless you first make the Telegram recipient configurable, restrict or approve monitored files, and decide whether storing transcripts in the workspace is acceptable.
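One way to make the Telegram recipient configurable is sketched below in Python. It is a minimal illustration under stated assumptions, not the skill's actual code: the function name resolve_forward_target and the environment variables VOICE_FORWARD_ENABLED and VOICE_FORWARD_CHAT_ID are hypothetical. Forwarding stays off unless the operator both opts in and names a recipient.

```python
import os
from typing import Mapping, Optional


def resolve_forward_target(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the chat ID to forward transcripts to, or None if forwarding is off.

    Hypothetical helper: the environment variable names are assumptions,
    not settings defined by the skill.
    """
    # Forwarding must be switched on explicitly; the default is "off".
    enabled = env.get("VOICE_FORWARD_ENABLED", "").strip().lower() in {"1", "true", "yes"}
    # The recipient comes from configuration, never from a constant in the code.
    chat_id = env.get("VOICE_FORWARD_CHAT_ID", "").strip()
    if not enabled or not chat_id:
        return None
    return chat_id
```

A caller would skip the Telegram send whenever this returns None, so a fresh install forwards nothing until someone configures it deliberately.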

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (8)

Lp3

Medium
Category
MCP Least Privilege
Confidence
67% confidence
Finding
The skill documentation declares installation and usage that imply network access, shell execution, and file access, but it does not explicitly declare permissions or capabilities. This can mislead reviewers and users about the operational scope of the skill, reducing transparency and making risky behavior easier to hide or overlook.

Tp4

High
Category
MCP Tool Poisoning
Confidence
89% confidence
Finding
The detected behavior does not match the skill's stated purpose: the skill may process files continuously, retain local state, and transmit recognized text to external destinations without clear user initiation. In a voice-processing skill, such hidden automation increases the risk of unauthorized collection, processing, and exfiltration of sensitive audio-derived content.
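Continuous processing of this kind could be narrowed by checking every candidate file against an approved directory before it is transcribed. The Python sketch below illustrates the idea under stated assumptions: is_approved_voice_file and ALLOWED_SUFFIXES are hypothetical names, and the extension list is a guess at typical voice-note formats rather than something taken from the skill.

```python
from pathlib import Path

# Assumed set of voice-file extensions; adjust to what the gateway actually saves.
ALLOWED_SUFFIXES = {".ogg", ".oga", ".opus"}


def is_approved_voice_file(path: str, approved_dir: str) -> bool:
    """Return True only for audio files located inside approved_dir."""
    # resolve() normalizes ".." segments and symlinks, so a path cannot
    # escape the approved directory by traversal.
    candidate = Path(path).resolve()
    root = Path(approved_dir).resolve()
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        return False
    try:
        candidate.relative_to(root)
    except ValueError:
        # The file lies outside the approved directory.
        return False
    return True
```

A watcher loop would call this before each transcription and ignore anything that fails the check.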

Description-Behavior Mismatch

Medium
Confidence
97% confidence
Finding
The skill description says it transcribes voice messages, but the implementation also automatically forwards the resulting text to a hard-coded Telegram target. That hidden secondary behavior creates an undisclosed exfiltration path for potentially sensitive user speech, which is especially dangerous in a messaging context where voice notes often contain private information.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The documentation encourages transcription via Yandex SpeechKit but does not clearly warn that user voice data is sent to a third-party external service. Voice messages can contain sensitive personal or business information, so omitting this disclosure undermines informed consent and may create privacy, compliance, and data-handling risks.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The setup instructions tell users to store a private key directly in config.json without any security warning or safer storage guidance. Storing long-lived private keys in plaintext configuration files increases the chance of accidental leakage through source control, backups, logs, or overly broad filesystem access, which could enable unauthorized API access.
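A safer alternative to a plaintext key in config.json might look like the Python sketch below. It is an illustration under stated assumptions: load_private_key and the environment variable YANDEX_SA_PRIVATE_KEY are hypothetical, and the permission check relies on POSIX mode bits, so it is not meaningful on Windows.

```python
import os
import stat
from typing import Mapping, Optional


def load_private_key(env: Mapping[str, str], key_path: Optional[str] = None) -> str:
    """Load the service-account private key without keeping it in config.json.

    Hypothetical helper: YANDEX_SA_PRIVATE_KEY is an assumed variable name.
    """
    # Prefer a key injected through the environment (for example by a secret manager).
    key = env.get("YANDEX_SA_PRIVATE_KEY")
    if key:
        return key
    if key_path is None:
        raise RuntimeError("no private key configured")
    # Refuse a key file that group or other users can read, write, or execute.
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{key_path} is accessible by group or others; run chmod 600")
    with open(key_path, encoding="utf-8") as handle:
        return handle.read()
```

Keeping the key in a separate file with owner-only permissions also makes it easier to exclude from source control and backups than a key embedded in a general configuration file.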

Missing User Warnings

High
Confidence
96% confidence
Finding
The code sends raw audio content to Yandex SpeechKit for transcription, but the skill metadata does not clearly disclose that user voice data leaves the local system and is processed by a third-party cloud provider. In a Telegram voice-message workflow, this is a significant privacy risk because users may reasonably expect local handling unless remote processing is explicitly explained.

Missing User Warnings

High
Confidence
98% confidence
Finding
Recognized transcript text is automatically forwarded to Telegram without explicit warning or approval. This creates a second-stage disclosure risk beyond cloud transcription, because the transcript may reveal sensitive spoken content and is sent to a destination the user may not know about.

Ssd 3

Medium
Confidence
97% confidence
Finding
Automatically forwarding recognized voice content into a Telegram chat creates a direct data-leakage channel for text derived from user-provided audio. In this skill context, the danger is amplified because the source material is personal voice messaging, which commonly contains sensitive conversations, names, addresses, or authentication information.

VirusTotal

64/64 vendors flagged this skill as clean.
