Video Transcribe - 视频转文字 (Video to Text)

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real local transcription skill, but it can automatically install unpinned software from a third-party package mirror despite strong offline/privacy claims.

Review before installing. This skill does not show exfiltration or destructive behavior, but first use may download and install unpinned Python packages and Whisper models, including from a third-party mirror. Install dependencies yourself in a virtual environment if possible, verify the package source, and avoid running it on sensitive recordings in shared or synced folders unless you are comfortable with transcript files being saved there.
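The advice above (install the dependencies yourself, in a virtual environment, from a source you have verified) could look roughly like the following sketch. The helper names and the `OFFICIAL_INDEX` constant are illustrative assumptions rather than part of the skill, and the pinned version is left for the reader to choose and verify.

```python
import subprocess
import sys
import venv
from pathlib import Path

# Assumption: the official PyPI index is the package source you have decided to trust.
OFFICIAL_INDEX = "https://pypi.org/simple"


def venv_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_install_command(env_dir: Path, requirement: str) -> list:
    """Build a pip command that installs one pinned requirement from OFFICIAL_INDEX."""
    if "==" not in requirement:
        raise ValueError(f"requirement must be pinned with '==': {requirement!r}")
    return [
        str(venv_python(env_dir)), "-m", "pip", "install",
        "--index-url", OFFICIAL_INDEX,
        requirement,
    ]


def create_env_and_install(env_dir: Path, requirement: str) -> None:
    """Create an isolated environment, then install the pinned requirement into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.check_call(build_install_command(env_dir, requirement))
```

A call such as `create_env_and_install(Path(".venv-transcribe"), "openai-whisper==<version you have verified>")` keeps the install out of the system interpreter; the directory name here is hypothetical.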

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (9)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: print("") try: # Install via the pip module, which is more reliable subprocess.check_call([ sys.executable, "-m", "pip", "install", "openai-whisper", "-i", "https://pypi.tuna.tsinghua.edu.cn/simple",
Confidence: 98%
Finding: subprocess.check_call([ sys.executable, "-m", "pip", "install", "openai-whisper", "-i", "https://pypi.tuna.tsinghua.edu.cn/simple", "--break-sy

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill documents capabilities that imply reading local files, writing multiple output files, and invoking shell commands, yet it declares no permissions. This creates a transparency and consent problem: users and platforms cannot accurately assess or constrain what the skill will access or execute, especially since shell execution can expand into dependency installation and external tool invocation.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented behavior materially exceeds the stated purpose of a fully local, offline transcription tool by including automatic pip installation from an external mirror, model downloads, ffmpeg dependency usage, and summary generation saved to disk. This mismatch is dangerous because users may trust the privacy/offline claim while the skill performs network-enabled supply-chain actions and additional processing they did not explicitly consent to.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: Claiming that the tool is fully offline and privacy-preserving while also stating that it downloads packages and models on first run is a misleading security claim. Even if media content is not uploaded, network access on first run introduces privacy, integrity, and supply-chain risk that users would not expect from an 'offline' tool.
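One way to make an offline claim checkable is to refuse to start unless the model weights are already on disk. The sketch below assumes the cache layout commonly used by the `openai-whisper` package (`~/.cache/whisper/<model>.pt`, or under `XDG_CACHE_HOME` when that variable is set); that layout is an assumption to verify against the installed version, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def default_cache_dir() -> Path:
    """Assumed default location of downloaded Whisper weights."""
    base = os.getenv("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return Path(base) / "whisper"


def model_is_cached(model_name: str, cache_dir: Path = None) -> bool:
    """Return True if a weights file for model_name already exists locally."""
    if cache_dir is None:
        cache_dir = default_cache_dir()
    return (cache_dir / f"{model_name}.pt").is_file()


def require_cached_model(model_name: str, cache_dir: Path = None) -> None:
    """Raise instead of letting a later step download the model silently."""
    if not model_is_cached(model_name, cache_dir):
        raise RuntimeError(
            f"Model {model_name!r} is not cached locally; "
            "download it deliberately before running in offline mode."
        )
```

Calling `require_cached_model("base")` before transcription turns an unexpected first-run download into an explicit error the user can act on.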

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The documentation combines strong privacy and offline assurances with contradictory statements about automatic dependency and model downloads. This inconsistency can mislead users into approving execution under false assumptions, especially in environments where any unexpected network access or auto-installation is prohibited.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The skill advertises itself as fully offline/local, but it performs a network package installation from a third-party mirror during execution. This is dangerous because users may trust the privacy/offline claim and unknowingly allow network access and remote code installation, increasing supply-chain and privacy risk.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: Embedding `pip install` capability in a local transcription utility gives the skill authority beyond its stated purpose and introduces remote code retrieval at runtime. That added capability is not necessary for processing a local file and increases the attack surface substantially if dependencies are tampered with or the environment is misconfigured.
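An alternative to embedding `pip install` is to check for dependencies and stop with instructions when they are absent. The following is a minimal sketch of that pattern, not the skill's actual code; `missing_modules` and `require_modules` are hypothetical names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def require_modules(module_names):
    """Fail fast with instructions rather than installing anything at runtime."""
    missing = missing_modules(module_names)
    if missing:
        raise RuntimeError(
            "Missing dependencies: " + ", ".join(missing)
            + ". Install them yourself from a source you trust, then rerun."
        )
```

For this skill the check would presumably be `require_modules(["whisper"])`, since `whisper` is the import name of the `openai-whisper` package. The transcription code then never needs network or installer privileges.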

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation describes generated output files but does not clearly warn users that full transcripts, subtitles, JSON metadata, and summary files are written alongside the source media. For a privacy-focused transcription skill, this omission can mislead users into processing sensitive recordings without realizing plaintext derivatives will persist on disk, increasing the risk of unintended disclosure to other local users, backups, sync tools, or shared directories.
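To limit the exposure described in this finding, transcript derivatives can be written to a dedicated directory with owner-only permissions instead of next to the source media. The sketch below assumes a POSIX system (permission bits have little effect on Windows); the function names and the extension-to-content mapping are illustrative assumptions.

```python
import os
from pathlib import Path


def write_private(path: Path, text: str) -> None:
    """Write text to path with owner-only read/write permission bits."""
    # A newly created leaf directory gets mode 0o700; existing ones are left unchanged.
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.chmod(path, 0o600)  # also tighten a file that already existed


def save_outputs(out_dir: Path, stem: str, outputs: dict) -> list:
    """Write each output (extension -> content) under out_dir and return the paths."""
    written = []
    for ext, text in outputs.items():
        target = out_dir / f"{stem}.{ext}"
        write_private(target, text)
        written.append(target)
    return written
```

Choosing an output directory outside synced or shared folders matters more than the permission bits, which only protect against other local users.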

Missing User Warnings

Medium

Confidence: 98%
Finding: Automatically installing software without prior user confirmation is risky because it changes the system state and fetches executable code over the network unexpectedly. In a skill marketed as local/offline, this is especially dangerous because users are less likely to anticipate or monitor that behavior.
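If a tool does offer to install a dependency, a confirmation gate addresses the concern in this finding. The following sketch shows the exact command and proceeds only on an explicit "yes"; `confirm_and_install` is a hypothetical helper, and the `ask` and `run` parameters exist only so the behavior can be exercised without touching the network.

```python
import subprocess
import sys


def confirm_and_install(requirement, ask=input, run=subprocess.check_call):
    """Run pip for requirement only after explicit consent; return True if it ran."""
    command = [sys.executable, "-m", "pip", "install", requirement]
    answer = ask(f"About to run: {' '.join(command)}\nProceed? [y/N] ")
    if answer.strip().lower() not in ("y", "yes"):
        return False  # default is to change nothing
    run(command)
    return True
```

Defaulting to "no" means an unattended or agent-driven run leaves the system unchanged unless someone deliberately approves the install.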

VirusTotal

66/66 vendors flagged this skill as clean.
