
Security audit

video2txt-视频理解字幕提取 (video understanding and subtitle extraction)

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local transcription skill. It writes subtitle and text files and may download a Whisper model; both behaviors are disclosed.

Install only if you are comfortable running local Python code, installing the listed packages, and allowing an initial Whisper model download. Choose output paths intentionally and avoid transcribing media whose plain-text transcript should not be stored locally.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (3)

Lp3

Medium
Category
MCP Least Privilege
Confidence
92% confidence
Finding
The skill invokes Python and writes transcription outputs to disk, but it does not declare corresponding permissions or capability requirements beyond `python3`. This creates a transparency and policy-enforcement gap: users and the platform may not realize the skill will execute shell commands, write files, and potentially trigger model downloads/network access at runtime.

Description-Behavior Mismatch

Medium
Confidence
86% confidence
Finding
The skill is presented as operating on local media files, but WhisperModel may download model assets at runtime via download_root when the requested model is not already present locally. This creates undeclared network access and supply-chain exposure, which is relevant in restricted or privacy-sensitive environments where users may expect fully local-only processing.
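One way to make the local-only expectation enforceable is to resolve the model directory up front and fail fast when it is missing, rather than letting the loader fetch assets. The following is a minimal sketch, not the skill's actual code. It assumes the skill uses faster-whisper's `WhisperModel`, whose constructor accepts `model_size_or_path` and `local_files_only`. The helper names `local_model_kwargs` and `load_model_offline` are hypothetical.

```python
from pathlib import Path


def local_model_kwargs(model_dir: str) -> dict:
    """Build WhisperModel keyword arguments that forbid runtime downloads.

    Raises FileNotFoundError when the directory is missing, so the caller
    fails fast instead of silently reaching the network.
    """
    path = Path(model_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"model directory not found: {path}")
    return {
        # Passing a directory path (not a model size name) points the
        # loader at files that already exist on disk.
        "model_size_or_path": str(path),
        # Assumption: the installed faster-whisper version supports this flag.
        "local_files_only": True,
    }


def load_model_offline(model_dir: str):
    """Load a model strictly from disk (hypothetical helper)."""
    # Imported lazily so the path check can run without the dependency.
    from faster_whisper import WhisperModel

    return WhisperModel(**local_model_kwargs(model_dir))
```

With a guard like this, a restricted environment sees an explicit error when the model has not been provisioned, instead of an unexpected outbound request.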

Natural-Language Policy Violations

Medium
Confidence
87% confidence
Finding
The skill defaults to Chinese transcription and automatic conversion to Simplified Chinese, which can alter user data and produce outputs the user did not request. For multilingual or evidentiary transcription use cases, silent language forcing and script normalization can degrade integrity, cause mis-transcription, and erase important distinctions in the source content.
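A possible mitigation is to make both behaviors opt-in: treat an unspecified language as a request for auto-detection, and convert scripts only when the caller asks for it. The sketch below illustrates that idea under stated assumptions. The function names `resolve_language` and `postprocess` are hypothetical, the `converter` argument stands in for whatever conversion routine the skill uses, and returning `None` for auto-detection assumes a transcriber (such as faster-whisper's `transcribe`) that interprets `language=None` that way.

```python
from typing import Callable, Optional


def resolve_language(requested: Optional[str]) -> Optional[str]:
    """Return a normalized language code, or None to request auto-detection."""
    if requested is None or requested.strip().lower() in ("", "auto"):
        return None
    return requested.strip().lower()


def postprocess(
    text: str,
    to_simplified: bool = False,
    converter: Optional[Callable[[str], str]] = None,
) -> str:
    """Apply script conversion only when the caller explicitly opted in."""
    if not to_simplified:
        # Default path: the transcript is returned exactly as produced.
        return text
    if converter is None:
        raise ValueError("to_simplified=True requires a converter")
    return converter(text)
```

Defaulting `to_simplified` to `False` keeps the transcript byte-for-byte as the model produced it, which matters when the output may be used as evidence or contains mixed-language content.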

VirusTotal

64/64 vendors flagged this skill as clean.
