中文视频知识抽取器

Security checks across malware telemetry and agentic risk

Overview

This skill does what it advertises: it processes user-provided media into notes, with privacy considerations when optional LLM processing is enabled.

Use narrow, intentional inputs, especially for folders. Leave LLM_* environment variables unset for local-only processing; if you enable them, assume transcripts, source paths or URLs, and related metadata may be sent to the configured LLM provider. Install yt-dlp and whisper only from trusted sources.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (11)

Vague Triggers

Medium

Confidence: 82% confidence
Finding: The README tells users to hand inputs directly to the skill without narrowing acceptable sources, confirmation steps, or safety boundaries. In a skill that can process URLs, playlists, local files, and folders, overly broad invocation guidance can lead to unintended activation on sensitive local content or excessive processing scope.

Missing User Warnings

Medium

Confidence: 89% confidence
Finding: The README documents recursive local-folder scanning and creation of multiple output files but does not warn users about filesystem effects, search breadth, overwrite behavior, or storage location. This can cause accidental traversal of sensitive directories, unexpected disk writes, or bulk processing of private media without the user's informed consent.

Missing User Warnings

High

Confidence: 96% confidence
Finding: The README states that when LLM environment variables are set, the skill will automatically send content to an OpenAI-compatible API, but it provides no user-facing privacy or data-transfer warning. Because the skill handles video, audio, playlists, and local files, this can expose transcripts or derived content from sensitive media to external services without clear notice or consent.

Missing User Warnings

Medium

Confidence: 90% confidence
Finding: The usage description encourages processing of local files and folders and generating multiple output files, but it does not clearly warn users that local content will be read recursively and new files will be written. In a skill that handles arbitrary paths, this omission can lead to unintended exposure of sensitive local data or unexpected modification of user workspaces.

Vague Triggers

Medium

Confidence: 94% confidence
Finding: The skill enables implicit invocation with no trigger constraints, exclusions, or scoping, which increases the chance the agent will auto-select this skill for loosely related user input. Because this skill can process URLs, playlists, local file paths, and folders, unintended invocation could cause unnecessary access to user-provided local resources or external content and broaden the attack surface.

Vague Triggers

Medium

Confidence: 91% confidence
Finding: The example trigger phrases are extremely generic and overlap with ordinary user requests like summarizing a video or processing a local file/folder. In an agent ecosystem, this can cause the skill to activate unexpectedly on broad prompts, increasing the chance of unintentional local file access, recursive folder processing, or network retrieval without the user realizing the operational scope.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The documentation describes local folder input and says the script will automatically recurse through processable media files, but it does not warn users that this may traverse large directory trees and write generated artifacts to disk. That omission can lead to unintended processing of sensitive local media and unexpected creation of transcripts, summaries, and manifests containing private content.

Missing User Warnings

Medium

Confidence: 96% confidence
Finding: The file states that if LLM environment variables are configured, LLM post-processing is automatically enabled, but it does not disclose that transcript or derived content may be sent to a configured external API endpoint using the provided key. This creates a privacy and data-handling risk because users may unknowingly exfiltrate sensitive audio/video-derived content to third-party services.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The script sends transcripts, source URLs/paths, and related metadata to a configurable external LLM endpoint whenever LLM integration is enabled, with no consent prompt, warning, or redaction step. Because media transcripts may contain sensitive business, personal, or copyrighted information, this creates a real confidentiality and privacy risk rather than a purely informational issue.

Ssd 1

Medium

Confidence: 93% confidence
Finding: Untrusted transcript content from audio/video is inserted directly into LLM prompts, so spoken or embedded text can instruct the model to ignore prior directions, alter output structure, or exfiltrate/transcribe unintended content. In this skill, the risk is heightened because the whole purpose is to process adversarially controlled media, making prompt-injection through transcript text a realistic attack path.

Ssd 1

Medium

Confidence: 95% confidence
Finding: The full transcript is passed verbatim to the LLM for final note generation, enabling prompt injection at the most influential stage of processing. Since the model is expected to return structured JSON used for downstream outputs, injected transcript text can degrade integrity, cause policy bypass within the summarization step, or produce manipulated knowledge artifacts.

VirusTotal

VirusTotal findings are pending for this skill version.

View on VirusTotal