Security audit

Douyin Video Transcribe

Security checks across malware telemetry and agentic risk

Overview

The skill's stated transcription purpose is plausible, but it can automatically start or create a persistent Docker Whisper service, and it may send audio through configured cloud transcription paths without clear per-run consent.

Install only if you are comfortable with the agent downloading video/audio, writing local media and transcript files, running ffmpeg/ffprobe, and starting Docker containers. Prefer pre-provisioning and pinning the Whisper container yourself, avoid configuring cloud ASR keys unless you intend audio to leave the machine, and remove or stop the whisper-asr container when finished.
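One way to act on the pinning advice is to refuse any image reference that is not fixed to a content digest. The sketch below is illustrative only: `is_digest_pinned` is a hypothetical helper that is not part of the audited skill, and the image name in the usage lines is a placeholder.

```python
import re

# Hypothetical helper for illustration; not part of the audited skill.
# A digest-pinned reference ends in "@sha256:" followed by 64 hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the image reference ends in an immutable sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref))


# Placeholder image name and a dummy digest, used only to show the check.
pinned = "example/whisper-asr@sha256:" + "a" * 64
floating = "example/whisper-asr:latest"
print(is_digest_pinned(pinned), is_digest_pinned(floating))
```

A tag such as `latest` can be repointed at different image contents over time, whereas a digest identifies one specific image, which is why the check accepts only the digest form.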

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (12)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: self.DOCKER_IMAGE ] result = subprocess.run(cmd, capture_output=True, text=True, timeout=120) if result.returncode == 0: return True else:
Confidence: 92%
Finding: result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: if container_status in ("exited", "created"): # Container exists but is not running; start it print(f"🔄 Starting existing container {self.CONTAINER_NAME}...") result = subprocess.run( ["docker", "start", self.CONTAINER_NAME], capture_output=True, text=True, timeout=30 )
Confidence: 90%
Finding: result = subprocess.run( ["docker", "start", self.CONTAINER_NAME], capture_output=True, text=True, timeout=30 )

Lp3

Medium

Category: MCP Least Privilege
Confidence: 89%
Finding: The skill instructs use of network access, shell commands, and file read/write behavior but does not declare any corresponding permissions or capability boundaries. That mismatch weakens user visibility and enforcement, making it easier for the skill to perform downloads, create files, and invoke local tools without clear consent or policy review.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 82%
Finding: A description-behavior mismatch is security-relevant because users and reviewers may approve the skill for a narrow transcription purpose while it actually performs broader or different actions, such as container management or external service use. That breaks trust assumptions and can lead to unintended data exposure, broader execution privileges, and unsafe deployment decisions.

Description-Behavior Mismatch

Medium

Confidence: 78%
Finding: The documentation expands scope from Douyin-only processing to other platforms via yt-dlp, which introduces additional network retrieval and media-handling behavior outside the declared purpose. Scope drift matters because it can bypass user expectations, expand attack surface, and trigger legal or operational risks associated with broader scraping/downloading tools.

Context-Inappropriate Capability

Medium

Confidence: 82%
Finding: The skill can invoke Docker, ffmpeg, and ffprobe on the host, which expands it from pure transcription logic into host-tool orchestration. In an agent setting, that broader execution capability increases attack surface and may violate least-privilege expectations if users or operators do not realize the skill can start or rely on containerized services and parse attacker-supplied media.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The helper does more than access a local ASR endpoint: it provisions the service by starting or creating Docker containers automatically. In skill context, that materially broadens privilege and trust boundaries, allowing a transcription request to trigger software deployment and execution on the host.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: Executing Docker CLI commands to inspect, start, and create containers gives the skill host-management capability that exceeds its stated purpose of transcription. In an agent setting, this is dangerous because a seemingly harmless media-processing request can alter host state and run external software, increasing risk of misuse or unexpected persistence.
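A possible mitigation is to make container startup an explicit opt-in rather than an automatic side effect. The sketch below assumes a hypothetical `allow_start` flag supplied by the user or operator; the function name and the injectable `run` parameter are illustrative and do not come from the skill's code. Only the `docker start` invocation mirrors the command quoted in the findings above.

```python
import subprocess
from typing import Callable

CONTAINER_NAME = "whisper-asr"  # container name mentioned in the overview


def start_container_if_allowed(
    allow_start: bool,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Start the existing container only when the caller has opted in.

    `allow_start` is a hypothetical consent flag; `run` is injectable so the
    gate can be exercised without a Docker installation.
    """
    if not allow_start:
        # No consent: leave host state untouched and report that nothing ran.
        return False
    result = run(
        ["docker", "start", CONTAINER_NAME],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode == 0
```

With the flag defaulting to a refusal at the call site, a transcription request alone cannot change host state; the operator has to pass `allow_start=True` deliberately.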

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill tells the agent to download remote media and save outputs locally, but it does not warn users about file creation, storage location, or overwrite behavior. This can lead to unanticipated disk writes, accidental overwrites, or persistence of sensitive media/transcripts on the host system.
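The overwrite risk can be reduced by creating output files in exclusive mode, so an existing transcript is never replaced silently. This is a minimal sketch under assumed names: `write_transcript_no_overwrite`, the `.txt` extension, and the numeric suffix scheme are all hypothetical and not taken from the skill.

```python
from pathlib import Path


def write_transcript_no_overwrite(out_dir: Path, stem: str, text: str) -> Path:
    """Write `text` to a new file under `out_dir`, never replacing an existing one.

    Hypothetical helper: tries "<stem>.txt", then "<stem>-1.txt", and so on.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    attempt = 0
    while True:
        suffix = "" if attempt == 0 else f"-{attempt}"
        path = out_dir / f"{stem}{suffix}.txt"
        try:
            # Mode "x" fails with FileExistsError instead of truncating the file.
            with open(path, "x", encoding="utf-8") as handle:
                handle.write(text)
            return path
        except FileExistsError:
            attempt += 1
```

Returning the final path lets the caller report exactly where the transcript was stored, which addresses the storage-location part of the finding as well.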

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill sends audio to an HTTP ASR endpoint without a privacy warning, which is risky because spoken content may contain sensitive personal or business information. Even when the endpoint is localhost, users should be told that audio leaves the immediate transcription process and is transmitted to a service boundary for processing.
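A caller could decide how strongly to warn by first checking whether the configured ASR endpoint resolves to the local machine. The sketch below is an assumption-laden illustration: `is_loopback_endpoint` is a hypothetical helper, and it inspects only the literal host in the URL without performing DNS resolution.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True when the URL's host is "localhost" or a loopback IP literal.

    Hypothetical helper: no DNS lookup is done, so any other hostname is
    treated as non-local.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it cannot be confirmed as loopback here.
        return False
```

A non-loopback result would be the point at which to show a privacy notice before any audio is sent.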

Missing User Warnings

Medium

Confidence: 94%
Finding: These code paths send local audio content to third-party cloud transcription APIs whenever corresponding API keys are configured, but there is no explicit consent or warning at the call site. Because the input may contain sensitive voice data, uploading it off-device without a clear user notice creates a real privacy and data-governance risk.

Missing User Warnings

High

Confidence: 99%
Finding: The fallback logic automatically tries cloud providers after local transcription methods fail, which can silently move user audio off the local system. In a transcription skill handling potentially sensitive media, this context makes the issue more dangerous because a user asking for local transcription may unknowingly have their content uploaded to external services.
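A fallback chain can keep audio local by default if cloud backends are skipped unless the user has explicitly opted in. The sketch below shows that pattern under stated assumptions: the `Backend` type, the `allow_cloud` flag, and the convention that a backend returns `None` on failure are all hypothetical and do not describe the skill's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Backend:
    """Hypothetical description of one transcription backend."""
    name: str
    is_cloud: bool
    transcribe: Callable[[str], Optional[str]]  # returns None on failure


def transcribe_with_fallback(
    audio_path: str,
    backends: Sequence[Backend],
    allow_cloud: bool = False,
) -> Optional[str]:
    """Try backends in order, skipping cloud ones unless `allow_cloud` is set."""
    for backend in backends:
        if backend.is_cloud and not allow_cloud:
            # Without consent, audio is never handed to an off-device service.
            continue
        text = backend.transcribe(audio_path)
        if text is not None:
            return text
    return None
```

With this shape, a failed local transcription ends in a visible failure (`None`) rather than a silent upload, and the caller can then ask the user whether to retry with `allow_cloud=True`.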

VirusTotal

63/63 vendors flagged this skill as clean.
