Local Whisper

Security checks across malware telemetry and agentic risk

Overview

This skill mostly matches a local speech-to-text purpose, but it makes strong local/private claims while including under-disclosed cloud transcription paths and a helper script with a code-injection risk from crafted filenames.
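
The helper script itself is not reproduced in this report, so the following is only a minimal sketch of the general fix for filename-driven injection, not the skill's actual code. It assumes the bug class is a filename being interpolated into a shell string or generated code. The function name `file_size_via_child` and the child snippet are hypothetical; the point is that the filename travels as a discrete argv element and is never parsed as code.

```python
import subprocess
import sys
from pathlib import Path


def file_size_via_child(audio_path: str) -> int:
    """Hand a filename to a child process as data, never as code.

    Hypothetical helper: the child program is a fixed string, and the
    filename is passed as a separate argv element with no shell involved.
    """
    # Resolving to an absolute path also prevents a leading "-" from being
    # mistaken for a command-line option. Raises FileNotFoundError if missing.
    resolved = Path(audio_path).resolve(strict=True)
    child_code = "import os, sys; print(os.path.getsize(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, str(resolved)],  # list form, no shell=True
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

With this pattern, quotes, semicolons, or `$(...)` in a filename stay inert because nothing ever interprets the name as shell syntax or Python source.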

Review this carefully before installing. Use it only if you are comfortable auditing and forcing a local backend, and avoid setting OpenAI or Groq credentials in the environment used by the skill unless you intentionally want cloud transcription. Do not run the large-file helper on untrusted filenames until its path handling is fixed.
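
One way to follow the credential advice is to launch the skill with an environment that has the cloud keys removed. The sketch below assumes the skill relies on the conventional `OPENAI_API_KEY` and `GROQ_API_KEY` variables read by the `openai` and `groq` client libraries; the report does not confirm which variable names the skill actually reads, and `scrubbed_env` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

# Assumption: these are the variables the cloud clients pick up by default.
CLOUD_CREDENTIAL_VARS = ("OPENAI_API_KEY", "GROQ_API_KEY")


def scrubbed_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment without cloud transcription credentials."""
    source = os.environ if env is None else env
    return {
        key: value
        for key, value in source.items()
        if key not in CLOUD_CREDENTIAL_VARS
    }
```

The result can be passed as `env=scrubbed_env()` to `subprocess.run` when starting the skill's scripts. This does not replace forcing a local backend; it only removes the credentials a cloud path would need.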

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (17)

Lp3

Medium
Category
MCP Least Privilege
Confidence
91% confidence
Finding
The skill advertises installation and runtime behavior that use shell commands, local file writes, environment access, and network activity, yet it declares no permissions or equivalent capability disclosures. This creates a transparency and trust problem: users may enable a skill believing it is narrowly local/private while it can install packages, download large models, expose a daemon, and modify startup behavior.

Tp4

High
Category
MCP Tool Poisoning
Confidence
95% confidence
Finding
The documented purpose emphasizes private, local, Apple-Silicon-only transcription with no API costs, but the analyzed behavior indicates support for cloud transcription, generic HTTP/CLI workflows, translation, and CPU-based alternatives. This mismatch is security-relevant because users may trust the privacy claims and route sensitive voice data through the skill without realizing it may support external transmission or broader functionality than advertised.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The requirements include cloud transcription client libraries (openai and groq) even though the skill is described as private local speech-to-text using MLX Whisper. This creates a real privacy and trust-boundary risk because downstream code can route audio or transcripts to external services contrary to user expectations, especially in a messaging context involving Telegram and WhatsApp voice data.

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
The /transcribe endpoint accepts JSON requests containing arbitrary local file paths and will read and process any file that exists on the host. Even though the server binds to 127.0.0.1, any local process can abuse it as a file-access proxy to probe file existence, read metadata such as size, and potentially coerce downstream decoders/transcription libraries into opening sensitive or unexpected files outside the intended Telegram/WhatsApp workflow.
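
A possible mitigation is to confine the endpoint to one inbound-media directory and a small set of audio extensions. The sketch below is illustrative only: `ALLOWED_ROOT`, `ALLOWED_SUFFIXES`, and `validate_audio_path` are hypothetical names, and the real media directory is not given in this report. It requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path

# Hypothetical allow-list; adjust to wherever inbound voice notes are stored.
ALLOWED_ROOT = Path("/tmp/voice-inbox")
ALLOWED_SUFFIXES = {".ogg", ".oga", ".opus", ".m4a", ".mp3", ".wav"}


def validate_audio_path(raw_path: str, root: Path = ALLOWED_ROOT) -> Path:
    """Resolve a requested path and reject anything outside the allowed root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so a link
    # inside the root that points elsewhere is rejected as well.
    candidate = Path(raw_path).resolve()
    # Check containment first so the endpoint cannot be used to probe
    # whether files outside the root exist.
    if not candidate.is_relative_to(root):
        raise PermissionError("path is outside the allowed audio directory")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError("unsupported audio file extension")
    if not candidate.is_file():
        raise FileNotFoundError("audio file does not exist")
    return candidate
```

A request handler would call this before opening the file and return a generic error for every failure, so the response does not reveal which check failed.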

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
The script is presented as local/private transcription, but it preferentially sends requests to a configurable HTTP daemon via CLAWD_WHISPER_URL. Even though the payload contains a file path rather than raw audio bytes, this still discloses sensitive local filesystem information and may cause the daemon to access user data in ways the user did not expect, especially if the URL is changed to a non-local endpoint.
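
A client-side guard could refuse any `CLAWD_WHISPER_URL` value that does not point at the loopback interface. This is a sketch under stated assumptions rather than the skill's code: `is_loopback_url` and `daemon_url_from_env` are hypothetical helpers, and accepting the name `localhost` assumes the host resolves it to a loopback address.

```python
import ipaddress
import os
from typing import Optional
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is a loopback address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname  # already lowercased, IPv6 brackets stripped
    if host == "localhost":  # assumption: resolves to loopback on this machine
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example "127.0.0.1.example.com"): reject.
        return False


def daemon_url_from_env() -> Optional[str]:
    """Read CLAWD_WHISPER_URL and fail closed if it is not loopback."""
    url = os.environ.get("CLAWD_WHISPER_URL")
    if url is not None and not is_loopback_url(url):
        raise ValueError("CLAWD_WHISPER_URL must point at a loopback address")
    return url
```

Parsing the host as an IP literal, instead of matching on a string prefix, avoids accepting lookalike hostnames that merely begin with `127.0.0.1`.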

Description-Behavior Mismatch

High
Confidence
95% confidence
Finding
The skill is described as local/private speech-to-text, but the implementation explicitly supports OpenAI and Groq cloud backends. In this context, that mismatch is security-relevant because users may provide sensitive voice messages expecting local-only processing, while the code can instead send audio off-device to third parties.

Context-Inappropriate Capability

High
Confidence
97% confidence
Finding
The code opens a local audio file and submits it to external OpenAI or Groq transcription APIs, which causes user audio to leave the local device. Given the skill's stated purpose of private local transcription for messaging apps, this creates a meaningful confidentiality risk and a strong chance of violating user expectations or policy requirements.

Intent-Code Divergence

Medium
Confidence
90% confidence
Finding
The module documentation frames the component as an MLX/local transcription module, but the documented support for cloud APIs contradicts that privacy-oriented positioning. This inconsistency increases the likelihood of insecure deployment and user misunderstanding about where sensitive audio data is processed.

Description-Behavior Mismatch

Medium
Confidence
89% confidence
Finding
The CLI and file header describe a local Whisper fallback, but the exposed `--backend` option explicitly allows remote providers such as `openai` and `groq`. In a privacy-focused messaging transcription skill, this mismatch can cause users or downstream automation to send sensitive voice data to third-party services unexpectedly, creating confidentiality and compliance risk.
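
One way to close this gap is to make remote backends require an explicit opt-in. In the sketch below, `openai` and `groq` are the backend names cited in the finding; the local backend names (`mlx`, `faster-whisper`) and the `--allow-cloud` flag are assumptions introduced for illustration and are not part of the skill's actual CLI.

```python
import argparse

LOCAL_BACKENDS = ("mlx", "faster-whisper")  # assumed names, not confirmed by the report
CLOUD_BACKENDS = ("openai", "groq")         # names cited in the finding


def parse_args(argv=None):
    """Parse CLI arguments, refusing cloud backends without explicit consent."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file.")
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument(
        "--backend",
        choices=LOCAL_BACKENDS + CLOUD_BACKENDS,
        default=LOCAL_BACKENDS[0],
    )
    parser.add_argument(
        "--allow-cloud",  # hypothetical opt-in flag
        action="store_true",
        help="permit backends that upload audio to a third party",
    )
    args = parser.parse_args(argv)
    if args.backend in CLOUD_BACKENDS and not args.allow_cloud:
        # parser.error prints a usage message and exits with status 2.
        parser.error(
            f"backend '{args.backend}' uploads audio off-device; "
            "pass --allow-cloud to confirm"
        )
    return args
```

Under this design the default path stays local, and automation that passes `--backend openai` by mistake fails loudly instead of uploading audio.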

Intent-Code Divergence

Medium
Confidence
87% confidence
Finding
The skill metadata and inline documentation emphasize private local transcription, yet the implementation supports non-local providers. This is dangerous because users may rely on the privacy claim when processing Telegram or WhatsApp audio, while the code path can route content to external services, undermining trust and potentially exposing sensitive communications.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The script sends the user-supplied audio file path to an HTTP endpoint without explicit notice or confirmation. This can leak sensitive path information such as usernames, project names, or directory structure, and if the daemon is remote or attacker-controlled, it may trigger unauthorized access attempts against local files referenced by the path.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
At the call site, the code uploads the provided audio file to an external API without any explicit user-facing warning, confirmation, or disclosure. In a skill advertised for private local transcription of Telegram and WhatsApp audio, silent transmission of message audio to third parties is particularly risky because it exposes potentially sensitive communications.
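
A disclosure step before the upload could look roughly like the following. This is a hedged sketch: `confirm_cloud_upload` is a hypothetical helper, and the prompt is injected through the `ask` parameter so that a non-interactive caller can supply its own consent mechanism.

```python
from typing import Callable


def confirm_cloud_upload(
    audio_path: str,
    provider: str,
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask the user before audio leaves the device; default answer is no."""
    answer = ask(
        f"Upload {audio_path!r} to {provider} for transcription? "
        "This sends the audio off this device. [y/N] "
    )
    return answer.strip().lower() in ("y", "yes")
```

The upload call would run only when this returns `True`; any other answer, including an empty one, leaves the audio on the local machine.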

Unpinned Dependencies

Low
Category
Supply Chain
Content
# Local Whisper - Speech to text

# Core
python-dotenv>=1.0.0

# OpenAI Whisper API
openai>=1.12.0
Confidence
80% confidence
Finding
python-dotenv>=1.0.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
python-dotenv>=1.0.0

# OpenAI Whisper API
openai>=1.12.0

# Groq API (optional - fast & cheap cloud)
groq>=0.4.0
Confidence
82% confidence
Finding
openai>=1.12.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
openai>=1.12.0

# Groq API (optional - fast & cheap cloud)
groq>=0.4.0

# Local faster-whisper (optional - CPU-based)
faster-whisper>=1.0.0
Confidence
82% confidence
Finding
groq>=0.4.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
groq>=0.4.0

# Local faster-whisper (optional - CPU-based)
faster-whisper>=1.0.0

# MLX Lightning Whisper (Apple Silicon - fastest local option)
# Only works on macOS with M1/M2/M3/M4
Confidence
78% confidence
Finding
faster-whisper>=1.0.0
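
The four findings above share one pattern: a `>=` lower bound with no exact pin. A simple check for that pattern can be sketched as follows. `unpinned_requirements` is a hypothetical helper that treats any requirement line without `==` as unpinned; it is deliberately simplistic and does not handle URLs, environment markers, or hash options. It requires Python 3.9+ for the `list[str]` annotation.

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    flagged = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blanks and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            flagged.append(line)
    return flagged
```

Run against the excerpts quoted in these findings, it flags `python-dotenv>=1.0.0`, `openai>=1.12.0`, `groq>=0.4.0`, and `faster-whisper>=1.0.0`. Exact pins, ideally combined with pip's hash-checking mode, narrow the window for a compromised or unexpectedly changed release.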

Known Vulnerable Dependency: python-dotenv (1 advisory): CVE-2026-28684 (python-dotenv: Symlink following in set_key allows arbitrary file overwrite via )

Low
Category
Supply Chain
Confidence
87% confidence
Finding
python-dotenv

VirusTotal

64/64 vendors flagged this skill as clean.
