Faster Whisper

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local transcription skill with disclosed media download, file output, dependency install, and optional token use, but users should be careful with URL inputs and generated HTML reports.

Install only if you are comfortable with a local ML tool that installs Python dependencies and may download models. Use URL/RSS transcription only for media you intend to fetch, avoid pasting Hugging Face tokens into shared logs, choose output paths carefully because files can be overwritten, and avoid opening HTML transcript reports generated from untrusted audio or filenames until the HTML escaping issue is fixed.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (8)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
if "--update" in sys.argv:
    _venv_python = _SCRIPT_DIR.parent / ".venv" / "bin" / "python"
    if shutil.which("uv"):
        subprocess.run(
            ["uv", "pip", "install", "--python", str(_venv_python), "--upgrade", "faster-whisper"],
            check=True,
        )
Confidence
91% confidence
Finding
subprocess.run( ["uv", "pip", "install", "--python", str(_venv_python), "--upgrade", "faster-whisper"], check=True, )

subprocess module call

Medium
Category
Dangerous Code Execution
Content
            check=True,
        )
    else:
        subprocess.run(
            [str(_venv_python), "-m", "pip", "install", "--upgrade", "faster-whisper"],
            check=True,
        )
Confidence
91% confidence
Finding
subprocess.run( [str(_venv_python), "-m", "pip", "install", "--upgrade", "faster-whisper"], check=True, )

Description-Behavior Mismatch

Medium
Confidence
96% confidence
Finding
A speech-to-text skill should not normally include the ability to upgrade packages in its own environment during routine execution. This broadens the attack surface and enables unexpected environment mutation, which is especially risky for agents expected to perform narrow, deterministic tasks.

Context-Inappropriate Capability

Medium
Confidence
96% confidence
Finding
The --update option is unjustified for the stated purpose of local transcription and effectively grants package-management capability to the skill. In an agent environment, that can be abused to alter dependencies, pull unreviewed code, and undermine reproducibility and trust boundaries.
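
One way to narrow the risk described in the two findings above is to accept only an exact, reviewed version instead of an open-ended --upgrade. The sketch below is an illustration, not the skill's code: build_update_command and the version string used with it are hypothetical, and it only builds the argument list rather than running pip.

```python
import re


def build_update_command(python_path: str, version: str) -> list[str]:
    """Build a pip command that installs one exact faster-whisper version.

    Hypothetical helper: the skill itself runs an unpinned --upgrade.
    """
    # Accept only plain dotted numeric versions, so range specifiers such
    # as ">=1.0" or an empty string are rejected before pip is invoked.
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"expected an exact version, got {version!r}")
    return [python_path, "-m", "pip", "install", f"faster-whisper=={version}"]
```

A caller would pass the resulting list to subprocess.run only after the user has explicitly asked for that version, which keeps the installed dependency reproducible and reviewable.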

Vague Triggers

Medium
Confidence
87% confidence
Finding
The trigger list contains broad natural-language phrases such as common requests about transcripts, timestamps, chapters, or what was said. Over-broad triggers can cause accidental invocation on unrelated conversations, which is more dangerous here because the skill can read local files, fetch remote media, and create output files once selected.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The skill supports URL, YouTube, and RSS inputs that trigger external downloads, but the documentation does not clearly warn users that these modes make outbound network requests and may transmit request metadata to third parties. In a local-tool context, users may reasonably assume fully offline behavior from the description, which makes this omission riskier.
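
The missing warning could be generated from the input itself. The following is a minimal sketch under stated assumptions: is_remote_input and describe_network_use are hypothetical names, only http and https are treated as remote, and model downloads (which may also use the network) are out of scope.

```python
from urllib.parse import urlparse

# Assumption: only these schemes cause a media download.
REMOTE_SCHEMES = {"http", "https"}


def is_remote_input(source: str) -> bool:
    """Return True when the input is a URL that would be fetched."""
    return urlparse(source).scheme.lower() in REMOTE_SCHEMES


def describe_network_use(source: str) -> str:
    """Return a one-line notice to show before transcription starts."""
    if is_remote_input(source):
        host = urlparse(source).hostname or "unknown host"
        return f"network: will contact {host}"
    return "local input: no media download"
```

Printing this notice before any fetch lets a user or agent see which third-party host will receive the request.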

Missing User Warnings

Low
Confidence
80% confidence
Finding
The skill documents speaker-audio export and subtitle burn-in without clearly warning that these operations create new files and may overwrite user-specified paths. While expected for media tools, silent writes or overwrites can still cause data loss or unintended modification of user content.
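
Silent overwrites can be avoided by creating output files in exclusive mode unless the user opts in. This is a sketch, assuming a hypothetical write_new_file helper with an overwrite flag; it is not taken from the skill.

```python
from pathlib import Path


def write_new_file(path: Path, data: str, overwrite: bool = False) -> Path:
    """Write data to path, refusing to replace an existing file by default."""
    # Mode "x" raises FileExistsError if the file already exists;
    # mode "w" truncates and replaces it.
    mode = "w" if overwrite else "x"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(data)
    return path
```

With this default, a second export to the same path fails with FileExistsError instead of discarding the earlier file, and replacement requires an explicit overwrite=True.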

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The HTML formatter embeds file names, language values, speaker labels, and transcript text directly into HTML without escaping, which can produce stored cross-site scripting when the output is opened in a browser. Because transcript content may come from untrusted audio or remote sources, an attacker could speak or inject text that becomes active HTML/JS in the generated report.
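
The usual fix for this class of issue is to escape every dynamic value before it is placed in the markup. The sketch below uses the standard-library html.escape; render_segment and its markup are hypothetical and do not reproduce the skill's actual formatter.

```python
import html


def render_segment(speaker: str, text: str) -> str:
    """Return one transcript row with both dynamic values escaped."""
    # quote=True also escapes " and ', which matters if a value is ever
    # placed inside an HTML attribute.
    safe_speaker = html.escape(speaker, quote=True)
    safe_text = html.escape(text, quote=True)
    return f'<p><span class="speaker">{safe_speaker}</span>: {safe_text}</p>'


row = render_segment("SPEAKER_1", "<script>alert(1)</script>")
print(row)
```

The same escaping would apply to file names and language values, so that text spoken in untrusted audio is rendered as literal characters rather than interpreted as markup.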

VirusTotal

66/66 vendors flagged this skill as clean.
