Local STT (Nvidia Parakeet + Whisper Support)

Security checks across malware telemetry and agentic risk

Overview

This skill primarily performs local speech-to-text. It also includes an option to post transcripts to a Matrix room; that option does send data off the machine, but only when the user explicitly triggers it.

Install this if you want local audio transcription and are comfortable with uv fetching the listed Python and model dependencies. Do not pass --room-id unless you intend the recognized speech text to be posted to a Matrix room using MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN. If you do use it, supply a limited-scope Matrix token and avoid sending sensitive recordings to shared rooms.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (9)

Tainted flow: 'url' from os.environ.get (line 90, credential/environment) → requests.put (network output)

Critical
Category
Data Flow
Content
            log.write(f"Attempting send to {room_id} at {txn_id}\n")
            log.write(f"URL: {url}\n")

        resp = requests.put(url, headers=headers, json=payload, timeout=10)

        with open("/tmp/stt_matrix.log", "a") as log:
            log.write(f"Response: {resp.status_code}\n")
Confidence
97% confidence
Finding
resp = requests.put(url, headers=headers, json=payload, timeout=10)
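
One way to narrow the flagged flow is to validate the environment-supplied homeserver before any request is built. The following is a minimal sketch, not the skill's actual code: build_send_url and ALLOWED_HOMESERVERS are hypothetical names, the allowlisted host is a placeholder, and the path assumes the Matrix client-server send-message endpoint.

```python
from urllib.parse import quote, urlsplit

# Hypothetical allowlist for illustration; the scanned skill defines nothing like it.
ALLOWED_HOMESERVERS = {"matrix.example.org"}


def build_send_url(homeserver: str, room_id: str, txn_id: str) -> str:
    """Return a send-message URL, or raise ValueError for an untrusted homeserver."""
    parts = urlsplit(homeserver)
    if parts.scheme != "https":
        raise ValueError("homeserver must use https")
    if parts.hostname not in ALLOWED_HOMESERVERS:
        raise ValueError(f"homeserver host {parts.hostname!r} is not allowlisted")
    # Rebuild the authority from hostname and port so embedded userinfo is dropped.
    port = f":{parts.port}" if parts.port else ""
    # Percent-encode the IDs so characters such as '/' cannot change the path.
    room = quote(room_id, safe="")
    txn = quote(txn_id, safe="")
    return (
        f"https://{parts.hostname}{port}/_matrix/client/v3/rooms/{room}"
        f"/send/m.room.message/{txn}"
    )
```

With a check like this, a MATRIX_HOMESERVER value pointing at an unexpected host fails before requests.put is reached, so the transcript and access token are never sent there.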

Lp3

Medium
Category
MCP Least Privilege
Confidence
80% confidence
Finding
The skill advertises itself as a local STT tool, yet the detected capabilities include network, shell, environment access, and file writing without any declared permissions. Undeclared capabilities reduce transparency and can bypass user expectations, especially when combined with options that may transmit transcription data externally.

Tp4

High
Category
MCP Tool Poisoning
Confidence
95% confidence
Finding
The documented purpose is local transcription, but the skill also appears to support reading local credentials, making outbound Matrix API calls, sending transcription content to a room, and writing logs to /tmp. This mismatch is dangerous because users may expose sensitive speech-derived content or credentials under the assumption that processing is purely local.
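
The /tmp log mentioned in this finding could be kept out of a shared, world-readable location. The sketch below shows one possible approach under stated assumptions: append_private_log is a hypothetical helper, the caller chooses the directory, and the log deliberately records only status lines rather than the request URL.

```python
import os
from pathlib import Path


def append_private_log(log_dir: Path, line: str) -> Path:
    """Append one line to stt_matrix.log in log_dir, creating the file with mode 0600."""
    # Directory is created owner-only if it does not exist yet.
    log_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    log_path = log_dir / "stt_matrix.log"
    # O_NOFOLLOW (where available) refuses to write through a planted symlink.
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(log_path, flags, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as log:
        log.write(line.rstrip("\n") + "\n")
    return log_path
```

A caller might pass a per-user directory such as Path.home() / ".cache" / "stt"; that location is an illustrative choice, not something the skill documents.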

Description-Behavior Mismatch

High
Confidence
95% confidence
Finding
The declared purpose is local speech-to-text, but the implementation can transmit transcriptions to a Matrix room. This mismatch is dangerous because users and orchestrators may grant the skill local-only trust while it actually has data egress capability, enabling unintended disclosure of sensitive audio-derived text.

Context-Inappropriate Capability

Medium
Confidence
94% confidence
Finding
The skill reads .env files from the user's home directory to pull Matrix credentials even though that capability is outside the core local STT function. This expands the trust boundary, accesses secrets without clear need, and makes it easier for the skill to operate with hidden outbound capabilities.
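
A narrower alternative to scanning .env files in the home directory is to accept credentials only from variables the user has explicitly exported. This is a hedged sketch: load_matrix_credentials is a hypothetical function, and only the variable names MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN come from the skill's description.

```python
import os
from typing import Mapping, Tuple


def load_matrix_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (homeserver, token) from the given environment mapping only.

    No files are read; missing values raise instead of triggering a wider search.
    """
    homeserver = env.get("MATRIX_HOMESERVER", "").strip()
    token = env.get("MATRIX_ACCESS_TOKEN", "").strip()
    if not homeserver or not token:
        raise RuntimeError(
            "Matrix posting requires MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN "
            "to be set explicitly in the environment"
        )
    return homeserver, token
```

Calling this only on the code path where --room-id was supplied would keep plain local transcription from touching any credentials at all.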

Context-Inappropriate Capability

High
Confidence
97% confidence
Finding
This block performs outbound HTTP requests to Matrix, which is not justified by the stated local-only purpose. In context, the capability is more dangerous because transcriptions may contain sensitive personal or business information that users expect to remain local.

Intent-Code Divergence

Medium
Confidence
94% confidence
Finding
The docstring emphasizes local stdout output but omits that the script may also send transcriptions remotely. This is a security-signaling problem: misleading documentation can cause unsafe deployment decisions and unintentional disclosure of sensitive transcript content.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The presence of a --room-id option implies transcriptions may be sent to a Matrix room, but the skill description does not warn users about this external disclosure path. In an STT context, transcripts often contain sensitive personal or business information, so omission of this warning materially increases the risk of unintended data leakage.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
When --room-id is supplied, the transcript is sent to Matrix without a prominent warning that recognized speech may contain sensitive data. Because STT output can include private conversations, credentials spoken aloud, or regulated data, silent transmission materially increases privacy risk.
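
The missing warning could take the form of an explicit confirmation step before any transcript leaves the machine. The sketch below is illustrative only: confirm_send and its ask parameter are hypothetical, and the wording of the prompt is an assumption.

```python
import sys
from typing import Callable


def confirm_send(room_id: str, transcript: str,
                 ask: Callable[[str], str] = input) -> bool:
    """Print a warning to stderr and return True only on an explicit yes."""
    print(
        f"WARNING: about to post {len(transcript)} characters of recognized "
        f"speech to Matrix room {room_id}. Transcripts may contain sensitive data.",
        file=sys.stderr,
    )
    answer = ask("Send transcript? [y/N] ")
    # Anything other than y/yes, including an empty reply, means do not send.
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" means a transcript stays local unless the user actively agrees, and writing the warning to stderr keeps stdout reserved for the transcript itself.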

VirusTotal

66/66 vendors flagged this skill as clean.
