Local STT (Nvidia Parakeet + Whisper Support)

Security checks across malware telemetry and agentic risk

Overview

This skill primarily performs local speech-to-text. It also offers a real option to post transcripts to a Matrix room, but that option is user-triggered, not automatic.

Install this if you want local audio transcription and are comfortable with uv fetching the listed Python/model dependencies. Do not use --room-id unless you intentionally want the recognized speech text posted to a Matrix room using MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN; use a limited Matrix token and avoid sending sensitive recordings to shared rooms.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (9)

Tainted flow: 'url' from os.environ.get (line 90, credential/environment) → requests.put (network output)

Critical

Category: Data Flow
Content:
    log.write(f"Attempting send to {room_id} at {txn_id}\n")
    log.write(f"URL: {url}\n")
    resp = requests.put(url, headers=headers, json=payload, timeout=10)
    with open("/tmp/stt_matrix.log", "a") as log:
        log.write(f"Response: {resp.status_code}\n")
Confidence: 97%
Finding: resp = requests.put(url, headers=headers, json=payload, timeout=10)
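The excerpt does not show how `url` is built from the environment. To make the flagged flow concrete, here is a minimal sketch of how such a send request might be assembled, assuming the skill targets the Matrix client-server endpoint `PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}`. The helper `build_matrix_send` is hypothetical and is not the skill's code. It only builds the request, rejects non-HTTPS homeservers (since `MATRIX_HOMESERVER` is the taint source), and keeps the access token in a header rather than in the logged URL.

```python
import os
import uuid
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import quote, urlparse


def build_matrix_send(
    room_id: str,
    text: str,
    env: Mapping[str, str] = os.environ,
    txn_id: Optional[str] = None,
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, payload) for a Matrix m.room.message event.

    Hypothetical helper: nothing is sent here, so the caller keeps the
    final decision about whether the transcript leaves the machine.
    """
    homeserver = env.get("MATRIX_HOMESERVER", "").rstrip("/")
    token = env.get("MATRIX_ACCESS_TOKEN", "")
    if not homeserver or not token:
        raise RuntimeError("MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN must both be set")
    # The homeserver comes from the environment (the taint source in the
    # finding), so require HTTPS before it becomes a network destination.
    if urlparse(homeserver).scheme != "https":
        raise ValueError(f"refusing non-HTTPS homeserver: {homeserver!r}")
    # Transaction IDs only need to be unique per access token.
    txn_id = txn_id or uuid.uuid4().hex
    # Room IDs contain '!' and ':', so percent-encode every path segment.
    url = (
        f"{homeserver}/_matrix/client/v3/rooms/{quote(room_id, safe='')}"
        f"/send/m.room.message/{quote(txn_id, safe='')}"
    )
    # The token travels in a header, not the URL, so a "URL: ..." log line
    # cannot leak it.
    headers = {"Authorization": f"Bearer {token}"}
    payload = {"msgtype": "m.text", "body": text}
    return url, headers, payload
```

If the user explicitly opts in, the returned tuple maps directly onto the call in the excerpt: `requests.put(url, headers=headers, json=payload, timeout=10)`.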

Lp3

Medium

Category: MCP Least Privilege
Confidence: 80%
Finding: The skill advertises itself as a local STT tool, yet the detected capabilities include network, shell, environment, and file-write access, none of which is declared as a permission. Undeclared capabilities reduce transparency and can undermine user expectations, especially when combined with options that may transmit transcription data externally.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The documented purpose is local transcription, but the skill also appears to support reading local credentials, making outbound Matrix API calls, sending transcription content to a room, and writing logs to /tmp. This mismatch is dangerous because users may expose sensitive speech-derived content or credentials under the assumption that processing is purely local.
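The Content excerpt in the first finding appends to the fixed path `/tmp/stt_matrix.log`, which sits in a directory shared by all users. A minimal sketch of a more private alternative follows, assuming a POSIX system and a per-user directory chosen by the caller. The helper `append_private_log` is hypothetical. It creates the log with owner-only permissions and is meant to record outcomes such as status codes rather than URLs or transcript text.

```python
import os
from pathlib import Path


def append_private_log(message: str, log_dir: Path) -> Path:
    """Append one line to stt_matrix.log inside log_dir, readable by the owner only."""
    # Create the directory owner-only if it does not exist yet.
    log_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    path = log_dir / "stt_matrix.log"
    # O_CREAT with mode 0o600: a newly created file is never group- or
    # world-readable (the umask can only remove bits, not add them).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as log:
        # Log outcomes (e.g. "Response: 200"), not URLs or transcript text.
        log.write(message.rstrip("\n") + "\n")
    return path
```

Where `log_dir` lives (for example under the user's home directory) is deliberately left to the caller.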

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The declared purpose is local speech-to-text, but the implementation can transmit transcriptions to a Matrix room. This mismatch is dangerous because users and orchestrators may grant the skill local-only trust while it actually has data egress capability, enabling unintended disclosure of sensitive audio-derived text.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The skill reads .env files from the user's home directory to pull Matrix credentials even though that capability is outside the core local STT function. This expands the trust boundary, accesses secrets without clear need, and makes it easier for the skill to operate with hidden outbound capabilities.
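A narrower alternative to scanning `.env` files in the home directory is to read only the two variables named in the overview, `MATRIX_HOMESERVER` and `MATRIX_ACCESS_TOKEN`, from the process environment or from a file the user passes explicitly. This is a minimal sketch, not how the skill currently loads credentials: the function name, the `ALLOWED_KEYS` allowlist, and the simple `KEY=VALUE` file format are all assumptions.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Hypothetical allowlist: only these keys are ever read, so other secrets
# stored in the same file are ignored.
ALLOWED_KEYS = ("MATRIX_HOMESERVER", "MATRIX_ACCESS_TOKEN")


def load_matrix_credentials(
    env_file: Optional[Path] = None,
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Collect Matrix credentials without implicitly reading any home-directory file."""
    # Values already in the process environment take precedence.
    creds = {k: environ[k] for k in ALLOWED_KEYS if k in environ}
    if env_file is None:
        return creds  # no file was named explicitly, so none is read
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key in ALLOWED_KEYS and key not in creds:
            # Strip surrounding whitespace and simple quotes.
            creds[key] = value.strip().strip('"').strip("'")
    return creds
```

Requiring an explicit path keeps the trust boundary visible: the user, not the skill, decides which file holds the Matrix token.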

Context-Inappropriate Capability

High

Confidence: 97%
Finding: This block performs outbound HTTP requests to Matrix, which is not justified by the stated local-only purpose. In context, the capability is more dangerous because transcriptions may contain sensitive personal or business information that users expect to remain local.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The docstring emphasizes local stdout output but omits that the script may also send transcriptions remotely. This is a security-signaling problem: misleading documentation can cause unsafe deployment decisions and unintentional disclosure of sensitive transcript content.

Missing User Warnings

Medium

Confidence: 88%
Finding: The presence of a --room-id option implies transcriptions may be sent to a Matrix room, but the skill description does not warn users about this external disclosure path. In an STT context, transcripts often contain sensitive personal or business information, so omission of this warning materially increases the risk of unintended data leakage.

Missing User Warnings

Medium

Confidence: 93%
Finding: When --room-id is supplied, the transcript is sent to Matrix without a prominent warning that recognized speech may contain sensitive data. Because STT output can include private conversations, credentials spoken aloud, or regulated data, silent transmission materially increases privacy risk.
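One way to address both Missing User Warnings findings is to make the Matrix path a loud, two-step opt-in. This is a minimal sketch assuming an argparse-based CLI. Only `--room-id` comes from the skill; the positional `audio` argument, the `--confirm-send` flag, and the warning text are hypothetical.

```python
import argparse
import sys
from typing import List, Optional, TextIO

WARNING = (
    "WARNING: --room-id will post the full transcript to a Matrix room on "
    "MATRIX_HOMESERVER. Transcripts may contain private or regulated data."
)


def parse_args(argv: Optional[List[str]] = None, stderr: TextIO = sys.stderr) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Local speech-to-text (sketch)")
    parser.add_argument("audio", help="path to the audio file to transcribe")
    parser.add_argument("--room-id", help="OPTIONAL: also send the transcript to this Matrix room")
    # Hypothetical flag: a second, explicit opt-in before any network egress.
    parser.add_argument(
        "--confirm-send",
        action="store_true",
        help="required together with --room-id before anything is transmitted",
    )
    args = parser.parse_args(argv)
    if args.room_id:
        # Warn every time, even when the user has already confirmed.
        print(WARNING, file=stderr)
        if not args.confirm_send:
            # parser.error() prints usage to stderr and exits with status 2.
            parser.error("--room-id requires --confirm-send; refusing to transmit")
    return args
```

Without `--room-id` the sketch stays silent and purely local; with it, the user sees the warning and must confirm before any transcript can leave the machine.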

VirusTotal

66/66 vendors rated this skill as clean.
