Security audit

Web Video Transcribe DOCX

Security checks across malware telemetry and agentic risk

Overview

This is a coherent media transcription skill that downloads user-supplied or page-discovered media, transcribes it locally, and writes local transcript/DOCX outputs.

Install only if you are comfortable with a Python workflow that may install packages, load web pages in a local browser, download media/model files, and write transcripts and manifests to disk. Use it only for public or authorized media, avoid authenticated/private pages, and avoid passing Cookie or Authorization headers manually.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (5)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill clearly instructs execution of Python scripts that use network, filesystem, environment inspection, and shell-adjacent capabilities, but it does not declare corresponding permissions in metadata. This creates a transparency and policy-enforcement gap: an agent or marketplace may not warn users appropriately before the skill downloads content, writes files, or invokes local tooling.

Missing User Warnings

Low

Confidence: 83%
Finding: The workflow persistently downloads media and generates transcript and DOCX artifacts, but the description does not explicitly warn that files will be written to disk. This is a real safety/usability issue because users may unknowingly process sensitive media or leave residual data on shared systems.

Vague Triggers

Medium

Confidence: 90%
Finding: The default prompt is very broad and can cause the skill to be invoked for a wide range of web pages or media URLs without clear user-confirmed boundaries. In a security-sensitive workflow that downloads remote media and processes arbitrary web content, vague trigger conditions increase the chance of unsafe or unintended use, including fetching untrusted resources or over-applying the skill.
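One way to give the trigger a user-confirmed boundary is to check each URL against hosts the user has explicitly approved before any download starts. The following is a minimal sketch under that assumption. The function name `is_in_scope` and the `confirmed_hosts` set are hypothetical and do not come from the skill.

```python
from urllib.parse import urlsplit


def is_in_scope(url: str, confirmed_hosts: set[str]) -> bool:
    """Return True only for http(s) URLs whose host the user has confirmed.

    `confirmed_hosts` is a hypothetical set of lowercase hostnames that the
    user approved for this run; it is not part of the audited skill.
    """
    parts = urlsplit(url)
    # Reject non-web schemes such as file:// or ftp:// outright.
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the hostname; it is None when the URL has no host.
    host = parts.hostname or ""
    return host in confirmed_hosts
```

A caller would skip, or ask the user about, any page-discovered media URL for which `is_in_scope` returns False, rather than fetching it silently.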

Missing User Warnings

Medium

Confidence: 88%
Finding: The script hooks Playwright request/response events and stores request headers associated with captured media URLs. Although the headers pass through a sanitizer, the script still collects potentially sensitive session-linked metadata, such as Referer, Origin, or other identifying headers, from arbitrary pages without explicit notice, a minimization policy, or a consent prompt. Because the skill is specifically meant to extract playable media from arbitrary web pages, it is more likely to process authenticated or private media endpoints.
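A minimization policy for captured headers could be an allowlist rather than a blocklist, so that anything not explicitly needed is dropped. The sketch below illustrates the idea only. The `ALLOWED_HEADERS` set and the `minimize_headers` function are hypothetical and do not reproduce the skill's actual sanitizer.

```python
# Hypothetical allowlist: headers assumed sufficient for plain media fetches.
ALLOWED_HEADERS = {"accept", "range", "user-agent"}


def minimize_headers(headers: dict[str, str]) -> dict[str, str]:
    """Keep only allowlisted headers, compared case-insensitively.

    Everything else is dropped, including Cookie, Authorization,
    Referer, and Origin.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in ALLOWED_HEADERS
    }
```

The trade-off is that some media servers reject requests that lack a Referer. An allowlist makes keeping such a header an explicit, reviewable decision instead of a default.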

Missing User Warnings

Medium

Confidence: 91%
Finding: The code downloads a model archive from a remote URL and extracts it onto the local filesystem without verifying a checksum, signature, or pinned artifact digest. Although it performs a path traversal check during tar extraction, a compromised upstream release, or a man-in-the-middle attacker on a less trusted network, could deliver a malicious model archive and persist attacker-controlled files in the cache.
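Pinning a digest for the model archive would close this gap. The following minimal sketch verifies a downloaded file against a pinned SHA-256 value before extraction. The function names and the idea of storing the expected digest alongside the download URL are assumptions, not part of the audited code.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned hex digest.

    `expected_sha256` is a hypothetical pinned value that the skill would
    ship next to the model URL.
    """
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(f"SHA-256 mismatch for {path.name}: got {actual}")
```

Calling `verify_archive` before the tar extraction step, and deleting the file on failure, would keep an unexpected archive out of the cache.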

VirusTotal

All 67 of 67 vendors rated this skill as clean.
