Faster Whisper

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local transcription skill with disclosed media download, file output, dependency install, and optional token use, but users should be careful with URL inputs and generated HTML reports.

Install only if you are comfortable with a local ML tool that installs Python dependencies and may download models. Use URL/RSS transcription only for media you intend to fetch, and avoid pasting Hugging Face tokens into shared logs. Choose output paths carefully because existing files can be overwritten. Do not open HTML transcript reports generated from untrusted audio or filenames until the HTML escaping issue is fixed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (8)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: if "--update" in sys.argv: _venv_python = _SCRIPT_DIR.parent / ".venv" / "bin" / "python" if shutil.which("uv"): subprocess.run( ["uv", "pip", "install", "--python", str(_venv_python), "--upgrade", "faster-whisper"], check=True, )
Confidence: 91%
Finding: subprocess.run( ["uv", "pip", "install", "--python", str(_venv_python), "--upgrade", "faster-whisper"], check=True, )

subprocess module call

Medium

Category: Dangerous Code Execution
Content: check=True, ) else: subprocess.run( [str(_venv_python), "-m", "pip", "install", "--upgrade", "faster-whisper"], check=True, )
Confidence: 91%
Finding: subprocess.run( [str(_venv_python), "-m", "pip", "install", "--upgrade", "faster-whisper"], check=True, )

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: A speech-to-text skill should not normally include the ability to upgrade packages in its own environment during routine execution. This broadens the attack surface and enables unexpected environment mutation, which is especially risky for agents expected to perform narrow, deterministic tasks.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The --update option is unjustified for the stated purpose of local transcription and effectively grants package-management capability to the skill. In an agent environment, that can be abused to alter dependencies, pull unreviewed code, and undermine reproducibility and trust boundaries.
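
One way to narrow this capability, sketched here as an assumption rather than anything the skill currently does, is to require explicit confirmation and an exact version before any install command is built. The helper name `build_update_command` and the version string `1.0.0` are hypothetical placeholders; the function only constructs the argument list and never runs it.

```python
from pathlib import Path


def build_update_command(venv_python: Path, version: str, confirmed: bool) -> list[str]:
    """Return a pip command for a pinned upgrade, or refuse without consent.

    Hypothetical helper: it builds the argument list only and does not execute it.
    """
    if not confirmed:
        # Refuse unless the user has explicitly approved the environment change.
        raise PermissionError("package upgrade requires explicit user confirmation")
    if not version or any(ch.isspace() for ch in version):
        # An exact version keeps the install reviewable and reproducible.
        raise ValueError("an exact version is required")
    return [str(venv_python), "-m", "pip", "install", f"faster-whisper=={version}"]


# "1.0.0" is a placeholder version used for illustration only.
cmd = build_update_command(Path(".venv/bin/python"), "1.0.0", confirmed=True)

try:
    build_update_command(Path(".venv/bin/python"), "1.0.0", confirmed=False)
    refused = False
except PermissionError:
    refused = True
```

Pinning the version means the installed code is the code that was reviewed, which addresses the reproducibility concern in the finding.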

Vague Triggers

Medium

Confidence: 87%
Finding: The trigger list contains broad natural-language phrases such as common requests about transcripts, timestamps, chapters, or what was said. Over-broad triggers can cause accidental invocation on unrelated conversations, which is more dangerous here because the skill can read local files, fetch remote media, and create output files once selected.

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill supports URL, YouTube, and RSS inputs that trigger external fetching/downloading, but the documentation does not clearly warn users that these modes cause outbound network requests and may transmit request metadata to third parties. In a local-tool context, users may reasonably assume fully offline behavior from the description, which makes this omission riskier.
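
A minimal sketch of how a wrapper could make the network boundary explicit, assuming remote inputs can be recognized by an `http` or `https` scheme. The names `needs_network`, `check_input`, and the `allow_network` flag are hypothetical and are not part of the skill.

```python
from urllib.parse import urlparse


def needs_network(source: str) -> bool:
    """Return True when the input looks like a remote URL rather than a local path."""
    scheme = urlparse(source).scheme.lower()
    return scheme in {"http", "https"}


def check_input(source: str, allow_network: bool = False) -> str:
    """Pass local paths through; require an explicit opt-in for remote inputs."""
    if needs_network(source) and not allow_network:
        raise PermissionError(
            f"{source!r} requires an outbound request; re-run with network access allowed"
        )
    return source


local_ok = check_input("meeting.wav")

try:
    check_input("https://example.com/feed.xml")
    blocked = False
except PermissionError:
    blocked = True
```

With a gate like this, a purely local transcription never triggers a download by accident, and the user sees a clear message at the point where data would leave the machine.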

Missing User Warnings

Low

Confidence: 80%
Finding: The skill documents speaker-audio export and subtitle burn-in without clearly warning that these operations create new files and may overwrite user-specified paths. While expected for media tools, silent writes or overwrites can still cause data loss or unintended modification of user content.
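
The overwrite risk described above can be illustrated with a short sketch that refuses to replace an existing file unless the caller opts in. The helper `write_output` and its `overwrite` parameter are hypothetical; the skill's own writer may behave differently.

```python
import tempfile
from pathlib import Path


def write_output(path: Path, data: str, overwrite: bool = False) -> Path:
    """Write text to path; fail instead of clobbering unless overwrite=True."""
    # Mode "x" creates the file exclusively and raises FileExistsError if it exists.
    mode = "w" if overwrite else "x"
    with open(path, mode, encoding="utf-8") as fh:
        fh.write(data)
    return path


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "transcript.srt"
    write_output(target, "first")
    try:
        write_output(target, "second")
        refused = False
    except FileExistsError:
        refused = True
    kept = target.read_text(encoding="utf-8")
```

Exclusive-create mode makes the check and the write a single step, so there is no window between testing for the file and opening it.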

Missing User Warnings

Medium

Confidence: 97%
Finding: The HTML formatter embeds file names, language values, speaker labels, and transcript text directly into HTML without escaping, which can produce stored cross-site scripting when the output is opened in a browser. Because transcript content may come from untrusted audio or remote sources, an attacker could speak or inject text that becomes active HTML/JS in the generated report.
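
A minimal sketch of the missing escaping step, using Python's standard `html.escape`. The function `render_segment` and the markup it emits are hypothetical and do not reflect the skill's actual formatter; the point is only that every untrusted value is escaped before it is interpolated into HTML.

```python
import html


def render_segment(speaker: str, text: str) -> str:
    """Return one transcript row with all untrusted values HTML-escaped."""
    # html.escape replaces &, < and >, and with quote=True also " and '.
    safe_speaker = html.escape(speaker, quote=True)
    safe_text = html.escape(text, quote=True)
    return f'<p><span class="speaker">{safe_speaker}</span>: {safe_text}</p>'


# A transcript line that would otherwise become an active script tag.
row = render_segment("SPEAKER_1", '<script>alert("x")</script>')
```

The same treatment would apply to file names and language values, since the finding lists them as unescaped as well.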

VirusTotal

66/66 vendors flagged this skill as clean.
