Security audit

video2txt-视频理解字幕提取 (video understanding and subtitle extraction)

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local transcription skill that writes subtitle/text files and may download a Whisper model, with those behaviors disclosed.

Install only if you are comfortable running local Python code, installing the listed packages, and allowing an initial Whisper model download. Choose output paths intentionally and avoid transcribing media whose plain-text transcript should not be stored locally.

SkillSpector

By NVIDIA

Vulnerability Patterns

MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (3)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill invokes Python and writes transcription outputs to disk, but it does not declare corresponding permissions or capability requirements beyond `python3`. This creates a transparency and policy-enforcement gap: users and the platform may not realize the skill will execute shell commands, write files, and potentially trigger model downloads/network access at runtime.

Description-Behavior Mismatch

Medium

Confidence: 86%
Finding: The skill is presented as operating on local media files, but WhisperModel may download model assets at runtime via download_root when the requested model is not already present locally. This creates undeclared network access and supply-chain exposure, which is relevant in restricted or privacy-sensitive environments where users may expect fully local-only processing.
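For environments that need strictly local processing, the following is a minimal sketch of how model loading could be pinned to disk. It assumes the skill relies on the `faster-whisper` package, whose `WhisperModel` constructor accepts `local_files_only` alongside `download_root`. The helper names `offline_model_kwargs` and `load_offline_model`, and the check for a `model.bin` file in the model directory, are illustrative assumptions and not part of the audited skill.

```python
import os
from pathlib import Path


def offline_model_kwargs(model_dir: str) -> dict:
    """Build WhisperModel keyword arguments that avoid any runtime download.

    Hypothetical helper: it assumes `model_dir` holds an already converted
    CTranslate2 model, which normally includes a `model.bin` file.
    """
    path = Path(model_dir)
    if not (path / "model.bin").is_file():
        # Fail loudly instead of letting the library fetch the model.
        raise FileNotFoundError(f"no local model found in {path}")
    # Ask huggingface_hub to stay offline; set this before it is imported.
    os.environ["HF_HUB_OFFLINE"] = "1"
    return {"model_size_or_path": str(path), "local_files_only": True}


def load_offline_model(model_dir: str):
    """Load the model from disk only (requires faster-whisper installed)."""
    from faster_whisper import WhisperModel  # third-party, imported lazily

    return WhisperModel(**offline_model_kwargs(model_dir))
```

With this arrangement a missing model surfaces as an explicit error rather than a silent network request, which is usually the safer failure mode in restricted environments.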

Natural-Language Policy Violations

Medium

Confidence: 87%
Finding: The skill defaults to Chinese transcription and automatic conversion to simplified Chinese, which can alter user data and produce outputs the user did not request. For multilingual or evidentiary/transcription use cases, silent language forcing and script normalization can degrade integrity, cause mis-transcription, and overwrite important distinctions in the source content.
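One way to address this finding is to make both the language and the script conversion explicit opt-ins. The sketch below is an assumption about how a command-line wrapper might do so. The flag names `--language` and `--to-simplified` and the function `transcription_options` are hypothetical and do not come from the audited skill. Leaving the language unset is meant to map to automatic language detection, which `faster-whisper` performs when `language=None` is passed to `transcribe`.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Transcribe a media file.")
    parser.add_argument("media", help="path to the input audio or video file")
    # No default language: None means "let the model detect it".
    parser.add_argument("--language", default=None,
                        help="force a language code such as 'zh' or 'en'")
    # Script normalization is off unless the user asks for it.
    parser.add_argument("--to-simplified", action="store_true",
                        help="convert Traditional Chinese output to Simplified")
    return parser


def transcription_options(argv: Optional[Sequence[str]] = None) -> dict:
    """Parse arguments into a plain dict of user-chosen options."""
    args = build_parser().parse_args(argv)
    return {
        "media": args.media,
        "language": args.language,
        "to_simplified": args.to_simplified,
    }
```

Called as `transcription_options(["clip.mp4"])`, the sketch leaves the source language and script untouched; the user has to pass `--language zh --to-simplified` to get the behavior the skill currently applies by default.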

VirusTotal

64/64 vendors flagged this skill as clean.
