Telegram Voice Transcribe

Security checks across malware telemetry and agentic risk

Overview

This skill performs Telegram voice transcription as described, but its documentation gives conflicting privacy expectations while the default workflow can upload private audio to OpenAI.

Review before installing. Use --local if voice notes must stay on your server. If using the default API workflow or the provided hook, assume Telegram audio is fetched using the bot token and uploaded to OpenAI for transcription; only enable automatic transcription where users have been informed and that data flow is acceptable.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (6)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill documentation indicates use of environment secrets and outbound network access, but it does not declare permissions or provide a clear trust boundary for those capabilities. This can lead to unexpected credential use and external data transmission during execution, reducing visibility and making misuse harder to audit.
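
One way to make the trust boundary explicit is to restrict outbound requests to a small set of named hosts. The sketch below is illustrative only and is not taken from the skill: the function name `is_allowed_url` and the constant `ALLOWED_HOSTS` are hypothetical, and the two hosts listed are the public Telegram Bot API and OpenAI API hosts, assumed here to be the only destinations the skill needs.

```python
from urllib.parse import urlparse

# Assumption: these are the only hosts the skill should ever contact.
ALLOWED_HOSTS = frozenset({"api.telegram.org", "api.openai.com"})


def is_allowed_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is explicitly allowlisted."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = parsed.hostname  # already lowercased by urlparse, None if missing
    return host is not None and host in allowed_hosts
```

A caller would check every download or upload target against this function before opening a connection, so any request outside the declared boundary fails loudly instead of silently succeeding.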

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill sends user audio content to OpenAI for transcription and may use Telegram bot credentials to fetch files, but the description does not clearly warn about this external processing or the handling of sensitive tokens. Users may unknowingly expose private voice data or enable use of privileged bot access without informed consent.

Natural-Language Policy Violations

Medium

Confidence: 88%
Finding: The instruction to always pass `--language es` hard-codes Spanish processing regardless of the actual speaker language or user preference. This can cause inaccurate transcription, misinterpret user content, and override user intent, especially in multilingual or non-Spanish contexts.
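
A minimal sketch of the alternative, assuming the language is supplied by the user or by configuration rather than fixed in the instructions. The script name `transcribe.py` and the helper `build_transcribe_args` are hypothetical; only the `--language` and `--local` flags come from the skill's documentation.

```python
from typing import List, Optional


def build_transcribe_args(
    audio_path: str,
    language: Optional[str] = None,
    local: bool = False,
) -> List[str]:
    """Build the argument list, passing --language only when one is given."""
    args = ["transcribe.py", audio_path]  # hypothetical script name
    if local:
        args.append("--local")
    if language:
        # No hard-coded default: omit the flag when the language is unknown.
        args += ["--language", language]
    return args
```

With this shape, a Spanish-speaking user can still get `--language es` by setting it once as a preference, while other speakers are not forced through Spanish processing.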

Missing User Warnings

Medium

Confidence: 88%
Finding: The setup recommends automatically transcribing Telegram voice/audio and prepending the transcript into agent-visible message text, but it does not warn that private audio content is being sent to a third-party transcription service and then exposed more broadly inside the agent pipeline. This can cause unintentional disclosure of sensitive voice content, especially if users do not realize their audio is being externally processed and persisted in logs, prompts, or downstream tools.
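
To limit how far transcript text spreads, a hook could record only non-sensitive metadata in logs instead of the transcript itself. The following is a sketch under that assumption; `transcript_log_fields` is a hypothetical helper and is not part of the skill.

```python
import hashlib


def transcript_log_fields(transcript: str) -> dict:
    """Return log-safe metadata: length and a short hash, never the text."""
    digest = hashlib.sha256(transcript.encode("utf-8")).hexdigest()
    return {"chars": len(transcript), "sha256_prefix": digest[:12]}
```

The hash prefix lets an operator correlate log entries for the same message without the spoken content being persisted alongside them.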

Missing User Warnings

Medium

Confidence: 80%
Finding: The manual workflow instructs the agent to invoke a transcription subprocess reactively via exec, again without warning that this triggers external processing of user audio and use of local credentials such as API keys and bot tokens. While the exec target shown is a fixed local script rather than obvious command injection, the unsafe aspect is silent external data transfer and subprocess execution in response to user content without disclosure or policy guardrails.

Missing User Warnings

Medium

Confidence: 88%
Finding: The script defaults to the OpenAI Whisper API and also supports downloading audio from Telegram or arbitrary URLs, which means user audio may be transmitted to a third party without any explicit consent prompt or warning at execution time. In a messaging/transcription context, voice messages often contain sensitive personal data, so silent external transfer creates a real privacy and data-handling risk even if it is not an exploit in the classic code-execution sense.
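
A consent gate in front of the upload path could be sketched as follows. This is an illustration only, not the skill's behavior: `select_mode`, `ConsentRequired`, and the `upload_consent` parameter are hypothetical names, and how consent is actually collected from Telegram users is left open.

```python
class ConsentRequired(Exception):
    """Raised when audio would leave the server without recorded consent."""


def select_mode(force_local: bool, upload_consent: bool) -> str:
    """Pick a transcription mode, refusing external upload without consent."""
    if force_local:
        return "local"  # audio stays on the server
    if upload_consent:
        return "api"  # user has agreed to third-party processing
    raise ConsentRequired(
        "Audio would be sent to an external API; obtain consent or use local mode."
    )
```

Failing closed here means a missing consent record stops the transcription rather than defaulting to the external API.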

VirusTotal

65/65 vendors flagged this skill as clean.
