Video Subtitle Extraction and Summarization (Bilibili + Douyin + YouTube + Xiaohongshu)

Security checks across malware telemetry and agentic risk

Overview

This skill’s video downloading, transcription, summary, frame extraction, and local subtitle storage behavior is disclosed and aligned with its stated purpose.

Install only if you are comfortable with a video-processing skill contacting supported video platforms, running local media tools, and keeping transcripts or extracted artifacts on disk. Think twice before running it on private or sensitive videos, review optional flags such as batch, frames, danmaku, and diarize before each run, and independently verify any credibility assessment it produces about authors or people mentioned in videos.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (8)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill invokes shell commands, accesses local files, writes outputs, and makes network requests, yet declares no permissions or user-facing guardrails. That mismatch can cause the agent to perform sensitive actions without explicit consent or enforcement, especially when handling arbitrary URLs and local paths.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 91%
Finding: The documented behavior extends well beyond simple subtitle extraction and summarization into collection crawling, screenshot extraction, subtitle database management, diarization, danmaku analysis, and artifact export. This broader operational scope increases the chance of unexpected network access, storage of user data, and unintended activation of powerful features the user did not request.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The skill instructs the agent to perform background investigation on authors and other people, which expands from content summarization into profiling and credibility assessment of individuals. This creates privacy, defamation, and hallucination risks, especially when the agent is told to infer background from partial identifiers or memory rather than verified sources.

Description-Behavior Mismatch

Medium

Confidence: 82%
Finding: The pipeline exposes batch collection crawling, keyframe extraction, danmaku analysis, and diarization beyond the skill's stated role of subtitle extraction and summarization. This scope expansion increases data collection and processing of user-supplied content, making the skill more dangerous because it can enumerate and persist substantially more information than a user may reasonably expect from a summarizer.

Context-Inappropriate Capability

Medium

Confidence: 84%
Finding: The skill orchestrates multiple helper scripts and external binaries, giving it generalized execution capability well beyond a narrow summarization task. In the context of an agent skill, that broader execution surface matters because compromised helpers or unsafe downstream behavior can turn a content-processing tool into a vehicle for unexpected code execution paths and data handling.
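
One way to narrow the execution surface described in this finding is to route every external binary through a single allowlisted wrapper. The sketch below is illustrative only: the tool names in ALLOWED_TOOLS and the run_tool helper are assumptions, not names taken from the skill, and the report does not say which binaries the skill actually calls.

```python
import shutil
import subprocess

# Hypothetical allowlist; the report does not name the binaries the skill uses.
ALLOWED_TOOLS = {"ffmpeg", "yt-dlp"}


def run_tool(name: str, args: list[str], timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run an allowlisted binary without a shell and with a timeout."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowlisted: {name}")
    exe = shutil.which(name)  # resolve to an absolute path on PATH
    if exe is None:
        raise FileNotFoundError(f"tool not installed: {name}")
    # Passing an argument list (no shell=True) keeps user-supplied values
    # from being interpreted as shell syntax.
    return subprocess.run(
        [exe, *args],
        check=True,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

With a wrapper like this, a request to run anything outside the allowlist fails before a process is spawned.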

Description-Behavior Mismatch

Medium

Confidence: 83%
Finding: The script accepts arbitrary paths for --json-file, --output, and --db, enabling reads from and writes to attacker-influenced filesystem locations. In an agent context, if untrusted input can reach these arguments, the skill could be abused as a generic file read/write primitive outside its stated purpose, which increases the risk of unauthorized local file access or data tampering.
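
A common mitigation for this class of issue is to resolve each user-supplied path and reject it unless it stays inside a designated working directory. The following is a minimal sketch under that assumption; resolve_inside and the idea of a single base directory are hypothetical and are not part of the skill's script.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return the resolved path, or raise if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check
    # below also rejects absolute paths outside the base directory.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"path escapes {base}: {user_path}")
    return candidate
```

Values passed to --json-file, --output, and --db could each go through a check like this before any file is opened.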

Vague Triggers

Medium

Confidence: 85%
Finding: The trigger list includes broad everyday phrases that may fire the skill in contexts where the user did not intend video downloading, transcription, or local-file processing. Over-broad activation is dangerous here because the skill can perform network fetches, invoke external tools, and persist outputs once triggered.
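
One possible way to tighten activation is to require that the message contain a URL from a supported platform before the skill runs at all, rather than matching on everyday phrases. The sketch below assumes a host list derived from the platforms named in the skill's title; the exact domains, SUPPORTED_HOSTS, and find_supported_urls are illustrative assumptions.

```python
import re
from urllib.parse import urlparse

# Assumed domains for the platforms named in the skill title.
SUPPORTED_HOSTS = (
    "bilibili.com",
    "douyin.com",
    "youtube.com",
    "youtu.be",
    "xiaohongshu.com",
)

URL_RE = re.compile(r"https?://[^\s<>\"']+")


def find_supported_urls(message: str) -> list[str]:
    """Return URLs in message whose host is a supported platform."""
    urls = []
    for match in URL_RE.findall(message):
        try:
            host = (urlparse(match).hostname or "").lower()
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        # Accept the bare domain or any subdomain, but not look-alike suffixes.
        if any(host == h or host.endswith("." + h) for h in SUPPORTED_HOSTS):
            urls.append(match)
    return urls
```

An empty result would mean the skill should not activate, however closely the wording matches a trigger phrase.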

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill writes subtitles, JSON results, exports, and a persistent subtitle database to local storage, but does not prominently warn users about retention, file locations, or sensitive-content handling. Users may unknowingly persist copyrighted, personal, or confidential audio-derived text on disk, increasing privacy and compliance risk.
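
A retention warning of the kind this finding asks for could be as simple as reporting what is stored and where after each run. The sketch below assumes all artifacts live under one storage directory; retention_notice and that directory layout are hypothetical, since the report does not give the skill's actual file locations.

```python
from pathlib import Path


def retention_notice(storage_dir: str) -> str:
    """Summarize how many files and bytes are retained under storage_dir."""
    root = Path(storage_dir)
    files = [p for p in root.rglob("*") if p.is_file()] if root.is_dir() else []
    total_bytes = sum(p.stat().st_size for p in files)
    return (
        f"{len(files)} file(s), {total_bytes} bytes retained under {root.resolve()}. "
        "Delete this directory to remove stored transcripts and exports."
    )
```

Showing this message at the end of a run would make the storage location and its size visible without changing what the skill stores.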

VirusTotal

64/64 vendors flagged this skill as clean.
