Video Transcribe - Video to Text

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real local transcription skill, but it can automatically install unpinned software from a third-party package mirror despite strong offline/privacy claims.

Review before installing. This skill does not show exfiltration or destructive behavior, but first use may download and install unpinned Python packages and Whisper models, including from a third-party mirror. Install dependencies yourself in a virtual environment if possible, verify the package source, and avoid running it on sensitive recordings in shared or synced folders unless you are comfortable with transcript files being saved there.
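As a rough illustration of the manual route recommended above, the sketch below only builds the commands for an isolated install and does not run them. The helper names, the `.venv-transcribe` directory, and the `requirements.txt` file are hypothetical. It assumes you have prepared a hash-pinned requirements file yourself and that the default PyPI index is the source you have chosen to trust.

```python
import sys
from pathlib import Path

# Assumption: the default PyPI index; swap in whichever source you have verified.
DEFAULT_INDEX = "https://pypi.org/simple"


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def manual_install_plan(env_dir: Path, requirements: Path,
                        index_url: str = DEFAULT_INDEX) -> list[list[str]]:
    """Build (but do not run) the commands for an isolated, pinned install."""
    py = str(venv_python(env_dir))
    return [
        # 1. Create the virtual environment with the standard-library venv module.
        [sys.executable, "-m", "venv", str(env_dir)],
        # 2. Install only what the hash-pinned requirements file lists.
        [py, "-m", "pip", "install", "--require-hashes",
         "--index-url", index_url, "-r", str(requirements)],
    ]


if __name__ == "__main__":
    # Hypothetical paths, printed for review before anything is executed.
    for cmd in manual_install_plan(Path(".venv-transcribe"), Path("requirements.txt")):
        print(" ".join(cmd))
```

Printing the plan first lets you inspect the index URL and the requirements file before any code is fetched.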

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (9)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
print("")
    try:
        # Install via the pip module, which is more reliable
        subprocess.check_call([
            sys.executable, "-m", "pip", "install", 
            "openai-whisper",
            "-i", "https://pypi.tuna.tsinghua.edu.cn/simple",
Confidence
98% confidence
Finding
subprocess.check_call([ sys.executable, "-m", "pip", "install", "openai-whisper", "-i", "https://pypi.tuna.tsinghua.edu.cn/simple", "--break-sy
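To show how a call like this can be spotted statically, here is a minimal sketch built on Python's standard `ast` module. It is not SkillSpector's implementation. The function name, the `SUBPROCESS_FUNCS` set, and the `SAMPLE` string are assumptions made for illustration, and `SAMPLE` reproduces only the part of the flagged call that is visible in the excerpt above.

```python
import ast

# Trimmed reproduction of the flagged call; the truncated trailing flag is omitted.
SAMPLE = '''
import subprocess, sys
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "openai-whisper",
    "-i", "https://pypi.tuna.tsinghua.edu.cn/simple",
])
'''

SUBPROCESS_FUNCS = {"run", "call", "check_call", "check_output", "Popen"}


def find_pip_installs(source: str) -> list[dict]:
    """Report subprocess.* calls whose literal argv contains 'pip' and 'install'."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "subprocess"
                and node.func.attr in SUBPROCESS_FUNCS):
            continue
        if not node.args or not isinstance(node.args[0], (ast.List, ast.Tuple)):
            continue
        # Keep only string literals; items such as sys.executable are skipped.
        words = [e.value for e in node.args[0].elts
                 if isinstance(e, ast.Constant) and isinstance(e.value, str)]
        if "pip" in words and "install" in words:
            index = None
            # Limitation: assumes the index URL is a literal right after the flag.
            for flag in ("-i", "--index-url"):
                if flag in words and words.index(flag) + 1 < len(words):
                    index = words[words.index(flag) + 1]
            findings.append({"line": node.lineno, "index_url": index})
    return findings


if __name__ == "__main__":
    print(find_pip_installs(SAMPLE))
```

A check this simple misses commands assembled from variables or passed as a single shell string, so it is a starting point rather than a complete detector.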

Lp3

Medium
Category
MCP Least Privilege
Confidence
93% confidence
Finding
The skill documents capabilities that imply reading local files, writing multiple output files, and invoking shell commands, yet it declares no permissions. This creates a transparency and consent problem: users and platforms cannot accurately assess or constrain what the skill will access or execute, especially since shell execution can expand into dependency installation and external tool invocation.

Tp4

High
Category
MCP Tool Poisoning
Confidence
96% confidence
Finding
The documented behavior materially exceeds the stated purpose of a fully local, offline transcription tool by including automatic pip installation from an external mirror, model downloads, ffmpeg dependency usage, and summary generation saved to disk. This mismatch is dangerous because users may trust the privacy/offline claim while the skill performs network-enabled supply-chain actions and additional processing they did not explicitly consent to.

Description-Behavior Mismatch

Medium
Confidence
91% confidence
Finding
Claiming the tool is fully offline and privacy-preserving while also stating that it downloads packages and models on first run is a misleading claim about its security properties. Even if media content is not uploaded, network access on first run introduces privacy, integrity, and supply-chain risk that users would not expect from an 'offline' tool.

Intent-Code Divergence

Medium
Confidence
90% confidence
Finding
The documentation combines strong privacy and offline assurances with contradictory statements about automatic dependency and model downloads. This inconsistency can mislead users into approving execution under false assumptions, especially in environments where any unexpected network access or auto-installation is prohibited.

Description-Behavior Mismatch

High
Confidence
99% confidence
Finding
The skill advertises itself as fully offline/local, but it performs a network package installation from a third-party mirror during execution. This is dangerous because users may trust the privacy/offline claim and unknowingly allow network access and remote code installation, increasing supply-chain and privacy risk.
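One way to test an offline claim is to make network connections fail while the code under review runs. The sketch below patches `socket.socket.connect` for the current process only. `no_network` and `NetworkBlockedError` are hypothetical names. The guard does not cover child processes, so a `pip` launched through `subprocess` would not be blocked by it. It is better suited to catching in-process downloads.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when guarded code tries to open an outbound connection."""


@contextmanager
def no_network():
    """Make socket connections from this process fail while the block is active."""
    original_connect = socket.socket.connect

    def blocked(self, address):
        raise NetworkBlockedError(f"blocked connection attempt to {address!r}")

    socket.socket.connect = blocked
    try:
        yield
    finally:
        # Always restore the original method, even if the guarded code raised.
        socket.socket.connect = original_connect


if __name__ == "__main__":
    with no_network():
        probe = socket.socket()
        try:
            probe.connect(("127.0.0.1", 9))
        except NetworkBlockedError as exc:
            print("offline guard triggered:", exc)
        finally:
            probe.close()
```

Running the transcription entry point inside `with no_network():` after dependencies and models are already present would surface any remaining in-process network use as an exception.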

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
Embedding `pip install` capability in a local transcription utility gives the skill authority beyond its stated purpose and introduces remote code retrieval at runtime. That added capability is not necessary for processing a local file and increases the attack surface substantially if dependencies are tampered with or the environment is misconfigured.
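An alternative that keeps installation out of the tool entirely is to check for the dependency and stop with instructions when it is missing. The following is a minimal sketch of that pattern, not code taken from the skill. `require_module` and `MissingDependencyError` are hypothetical names.

```python
import importlib.util


class MissingDependencyError(RuntimeError):
    """Raised when a required package is absent; nothing is installed."""


def require_module(module_name: str, install_hint: str) -> None:
    """Check that a top-level module is importable without importing or installing it."""
    if importlib.util.find_spec(module_name) is None:
        raise MissingDependencyError(
            f"'{module_name}' is not installed. Install it yourself, e.g.: {install_hint}"
        )


if __name__ == "__main__":
    try:
        # openai-whisper is imported under the module name "whisper".
        require_module("whisper", "python -m pip install openai-whisper")
        print("whisper is available")
    except MissingDependencyError as exc:
        print(exc)
```

Failing closed this way leaves the choice of package source, version, and environment with the user.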

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The documentation describes generated output files but does not clearly warn users that full transcripts, subtitles, JSON metadata, and summary files are written alongside the source media. For a privacy-focused transcription skill, this omission can mislead users into processing sensitive recordings without realizing plaintext derivatives will persist on disk, increasing the risk of unintended disclosure to other local users, backups, sync tools, or shared directories.
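If persistent plaintext output is the concern, one mitigation is to write transcripts into a dedicated directory with owner-only permissions instead of next to the source media. The sketch below assumes a POSIX system, where the mode bits are honoured. `write_private` and the example paths are hypothetical and do not come from the skill.

```python
import os
import stat
from pathlib import Path


def write_private(output_dir: Path, name: str, text: str) -> Path:
    """Write text to output_dir/name, accessible only to the current user (POSIX)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(output_dir, stat.S_IRWXU)  # 0o700 on the directory
    target = output_dir / name
    # O_EXCL refuses to overwrite an existing file; 0o600 limits access to the owner.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return target


if __name__ == "__main__":
    # Hypothetical location outside any shared or synced folder.
    saved = write_private(Path.home() / "private-transcripts", "example.txt", "hello")
    print(f"Transcript saved to {saved}; delete it when no longer needed.")
```

Printing the saved path also addresses the missing warning, since the user is told explicitly where the plaintext ended up.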

Missing User Warnings

Medium
Confidence
98% confidence
Finding
Automatically installing software without prior user confirmation is risky because it changes the system state and fetches executable code over the network unexpectedly. In a skill marketed as local/offline, this is especially dangerous because users are less likely to anticipate or monitor that behavior.
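A confirmation gate is one simple way to avoid silent installs. The sketch below shows the exact command and runs it only after an explicit yes. `confirm` and `install_with_consent` are hypothetical helpers, and the `ask` and `run` parameters exist only so the behaviour can be exercised without a terminal or a real install.

```python
import subprocess
import sys
from typing import Callable, Sequence


def confirm(question: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only on an explicit 'y' or 'yes'; anything else declines."""
    return ask(f"{question} [y/N] ").strip().lower() in {"y", "yes"}


def install_with_consent(pip_args: Sequence[str],
                         ask: Callable[[str], str] = input,
                         run: Callable[[list], object] = subprocess.check_call) -> bool:
    """Show the exact pip command and run it only if the user approves."""
    command = [sys.executable, "-m", "pip", "install", *pip_args]
    print("About to run:", " ".join(command))
    if not confirm("Download and install these packages now?", ask):
        print("Skipped; nothing was installed.")
        return False
    run(command)
    return True


if __name__ == "__main__":
    install_with_consent(["openai-whisper"])
```

Defaulting to "no" means an accidental Enter keypress never triggers a download.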

VirusTotal

66/66 vendors flagged this skill as clean.
