Douyin Video Analyzer

Security checks across malware telemetry and agentic risk

Overview

This video analyzer is mostly transparent about uploading media to an AI service, but it needs review because crafted inputs could run local shell commands and some temporary video files may remain on disk.

Install only if you are comfortable sending video frames and audio clips to Zhipu AI and running local media tools. Avoid untrusted links and oddly named local files until the maintainer replaces shell-string exec calls with safer argument-array execution, validates URLs/paths, and makes temporary media cleanup reliable.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (11)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 92%
Finding: The skill is presented as a video analysis tool, but the documented behavior includes browser automation, network interception, local media downloading, and transmission of frames/audio to an external API. That mismatch can cause users to authorize or run the skill without fully understanding that third-party data transfer and broader collection/processing occur, creating privacy and consent risks, especially for copyrighted or sensitive media.

Description-Behavior Mismatch

Low

Confidence: 93%
Finding: The skill description promises analysis of video data, structure, visuals, and copy, but the code also extracts audio and performs speech-to-text transcription. That expands processing into potentially sensitive spoken content without clear disclosure, which can surprise users and cause privacy/compliance issues, especially when videos contain private speech or third-party voices.

Missing User Warnings

Medium

Confidence: 90%
Finding: The PRD explicitly requires downloading user-supplied video content to a local temporary directory, but it provides no user-facing notice, consent flow, or operational controls around where files are stored, how access is restricted, or how deletion is verified. This creates privacy and data-handling risk, especially if temporary files persist, are exposed to other processes, or contain sensitive or copyrighted content.
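One way to make temporary media cleanup reliable is to scope every download to a directory that is removed in a `finally` block. The sketch below is illustrative only: `withTempDir` and the `video-analyzer-` prefix are hypothetical names, not taken from the skill's code.

```typescript
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: runs `work` inside a fresh temporary directory and
// removes that directory afterwards, whether `work` succeeds or throws.
async function withTempDir<T>(work: (dir: string) => Promise<T>): Promise<T> {
  // mkdtemp appends random characters to the prefix, giving a unique directory.
  const dir = await mkdtemp(join(tmpdir(), "video-analyzer-"));
  try {
    return await work(dir);
  } finally {
    // recursive + force: delete everything inside, ignore "already gone".
    await rm(dir, { recursive: true, force: true });
  }
}
```

A caller would download, extract frames, and upload inside the callback, so no code path can return without the cleanup running. This does not cover a process that is killed outright; a startup sweep of stale directories would be needed for that case.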

Missing User Warnings

High

Confidence: 96%
Finding: The skill design sends extracted frames, audio, and transcripts to third-party AI services such as Gemini Vision, Gemini Pro, and ASR providers, yet the PRD contains no privacy warning or consent language informing users that media content may leave the local environment. This is dangerous because videos may contain personal data, biometric information, voices, or confidential content, and undisclosed transmission to external processors can violate user expectations, platform rules, or regulatory requirements.

Missing User Warnings

Medium

Confidence: 93%
Finding: The code sends video frame data to a third-party AI service by embedding each frame as base64 image data in the API request, but there is no visible consent, disclosure, redaction control, or privacy gate in this module. Because video frames may contain faces, private locations, documents, or other sensitive content, silent exfiltration to an external provider creates a real privacy and data-handling risk.

Missing User Warnings

Medium

Confidence: 90%
Finding: The code uploads extracted audio segments to a third-party transcription service at open.bigmodel.cn, but there is no visible consent, disclosure, or control in this module to warn users that potentially sensitive audio leaves the local environment. In a video-analysis skill, audio may contain personal data, confidential speech, or copyrighted content, so silent transmission to an external API creates a real privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 95%
Finding: The code builds a shell command with a user-controlled file path and executes it via child_process.exec. Although the path is wrapped in double quotes, shell metacharacters such as embedded quotes or command substitutions can still break out and lead to command injection, causing arbitrary command execution on the host.
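The Overview's recommendation to replace shell-string exec calls with argument-array execution could look roughly like the sketch below. It assumes the local media tool is `ffmpeg` and that the step extracts audio; the report does not name the tool or the exact command, and `buildExtractAudioArgs` and `extractAudio` are hypothetical names.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: each value is its own array element, so the path is
// passed to the program verbatim and is never parsed by a shell.
function buildExtractAudioArgs(inputPath: string, outputPath: string): string[] {
  return ["-y", "-i", inputPath, "-vn", "-acodec", "pcm_s16le", outputPath];
}

// Assumption: `ffmpeg` is on PATH. execFile spawns the binary directly and
// does not start a shell unless the `shell` option is set.
async function extractAudio(inputPath: string, outputPath: string): Promise<void> {
  await execFileAsync("ffmpeg", buildExtractAudioArgs(inputPath, outputPath));
}
```

With this shape, a file named `a"; rm -rf ~; ".mp4` reaches the tool as a single, odd-looking filename instead of being split into commands. Argument arrays do not stop option injection, so a path that begins with `-` may still need to be rejected or prefixed with `./`.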

Missing User Warnings

Medium

Confidence: 95%
Finding: The code constructs a shell command with untrusted inputs (`videoUrl` and `outputPath` derived from `videoId`) and executes it via `exec`, which invokes a shell. Even though values are wrapped in double quotes, shell metacharacter expansion such as command substitution can still occur, so a crafted URL or ID could lead to arbitrary command execution while downloading remote content.
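Input validation is a complementary control here. The following is a minimal sketch, assuming the skill only needs HTTPS URLs on a short host allow-list and purely alphanumeric IDs; the host names, the ID pattern, and both function names are assumptions, not details from the scanned code.

```typescript
// Assumption: illustrative allow-list; the real set of hosts is not stated in the report.
const ALLOWED_HOSTS = new Set(["www.douyin.com", "v.douyin.com"]);

// Hypothetical validator for the `videoUrl` input.
function validateVideoUrl(raw: string): URL {
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== "https:") {
    throw new Error("only https URLs are accepted");
  }
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    throw new Error(`unexpected host: ${url.hostname}`);
  }
  return url;
}

// Hypothetical validator for `videoId`, which ends up in the output path.
function validateVideoId(id: string): string {
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(id)) {
    throw new Error("unexpected video id format");
  }
  return id;
}
```

URL parsing leaves characters such as `$`, `(`, and `)` intact in the path, so a validated URL can still carry a command substitution. Validation narrows the input space; it does not replace running the downloader without a shell.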

Missing User Warnings

Medium

Confidence: 94%
Finding: The design explicitly describes downloading third-party videos, storing them under temporary local directories, extracting frames, and sending those frames to an external vision API, but it does not document any user notice, consent, retention policy, or data-handling safeguards. In a media-analysis skill, frames can contain personal data, copyrighted content, or sensitive on-screen information, so silent transfer and storage creates a real privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 91%
Finding: The spec explicitly requires sending extracted video frames to an external Zhipu API, but it does not mention user consent, disclosure, data handling limits, or privacy safeguards. Video frames can contain faces, private environments, on-screen text, or other sensitive content, so silent third-party transmission creates a real privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 97%
Finding: The script sends extracted audio and video frames to external AI/ASR services using an API key, but there is no user-facing warning, consent flow, or notice about remote processing. In this skill context, users may reasonably expect local video analysis, so undisclosed transmission of media-derived content materially increases privacy, confidentiality, and regulatory risk.
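A user-facing notice and consent gate of the kind these findings ask for can be small. The sketch below is one possible shape, not the maintainer's design: `UploadPlan`, `describeUpload`, and `assertConsent` are hypothetical, and how consent is collected (a CLI flag, a prompt, a config setting) is left open.

```typescript
// Hypothetical description of what is about to leave the machine.
interface UploadPlan {
  destination: string;
  frameCount: number;
  audioSeconds: number;
}

// Builds the notice shown to the user before any upload.
function describeUpload(plan: UploadPlan): string {
  return `This will send ${plan.frameCount} frame(s) and ${plan.audioSeconds}s of audio to ${plan.destination}.`;
}

// Refuses to continue unless the caller has recorded explicit consent.
function assertConsent(plan: UploadPlan, consentGiven: boolean): void {
  if (!consentGiven) {
    throw new Error(`${describeUpload(plan)} Re-run with explicit consent to proceed.`);
  }
}
```

Calling `assertConsent` immediately before the first network request keeps the default behavior local and forces the disclosure to appear at least once.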

VirusTotal

65/65 vendors flagged this skill as clean.
