Video Skill

Security checks across malware telemetry and agentic risk

Overview

This video-processing skill is functionally coherent, but it can send sensitive video, transcript, and frame data to configured AI services, and its Docker setup for local models exposes unauthenticated services broadly by default.

Review before installing if you will process private, proprietary, or regulated videos. Prefer local or approved provider endpoints, avoid sensitive media unless authorized, and bind Docker services to localhost or add authentication/firewall controls before starting the model stack.
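As a rough illustration of the localhost-binding advice, the sketch below flags Docker Compose short-syntax port mappings that publish on every interface. It assumes Docker's default behavior of publishing on 0.0.0.0 when no host IP is given; the function name and the port numbers are hypothetical and are not taken from the skill's own Compose file.

```python
def binds_all_interfaces(port_mapping: str) -> bool:
    """Return True if a short-syntax port mapping publishes on all interfaces.

    Handles forms such as "8000", "8000:8000", and "127.0.0.1:8000:8000".
    """
    spec = port_mapping.split("/")[0]  # drop an optional "/tcp" or "/udp" suffix
    parts = spec.split(":")
    if len(parts) <= 2:
        # No host IP given: Docker publishes on all interfaces by default.
        return True
    # Everything before the last two fields is the host IP (may be bracketed IPv6).
    host_ip = ":".join(parts[:-2])
    return host_ip in ("", "0.0.0.0", "::", "[::]")


if __name__ == "__main__":
    for mapping in ("8000:8000", "127.0.0.1:8000:8000"):
        status = "exposed broadly" if binds_all_interfaces(mapping) else "restricted"
        print(f"{mapping}: {status}")
```

A mapping flagged here can usually be narrowed by prefixing the host IP, for example `127.0.0.1:8000:8000`, before starting the model stack.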

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (8)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill instructs the agent to invoke shell commands, read and write local files, access configuration and environment-dependent provider settings, and perform network calls, but it declares no corresponding permissions. This creates a capability/permission mismatch that can bypass user and platform expectations, increasing the risk of unintended command execution, filesystem modification, or outbound connectivity when the skill is used.

Missing User Warnings

Medium

Confidence: 90%
Finding: The README instructs users to configure and use external transcription, reasoning, and vision-language model endpoints to process video-derived content, but it does not clearly warn that transcripts, frames, and other extracted data may be transmitted to third-party or remote services. This can lead to unintended disclosure of sensitive audio, imagery, or metadata if users assume processing is local or do not understand which providers receive their data.

Missing User Warnings

Medium

Confidence: 94%
Finding: The transcribe command sends the source video to an external transcription provider, but the CLI does not present an explicit warning, consent gate, or privacy notice at the point of use. Users may unintentionally upload sensitive visual/audio content or embedded metadata off-host, which is a real data-exposure risk even if it is expected functionality.
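One possible shape for the missing consent gate is sketched below. It is not the skill's actual CLI code: `confirm_remote_upload`, its parameters, and the prompt wording are hypothetical, and the `assume_yes` flag stands in for whatever non-interactive override a real CLI might offer.

```python
from typing import Callable


def confirm_remote_upload(
    video_path: str,
    provider: str,
    ask: Callable[[str], str] = input,
    assume_yes: bool = False,
) -> bool:
    """Ask the user to approve sending a video off-host before any upload."""
    if assume_yes:
        # Explicit opt-in, e.g. for scripted runs where consent was given elsewhere.
        return True
    answer = ask(
        f"'{video_path}' will be uploaded to {provider} for transcription. "
        "Continue? [y/N] "
    )
    # Anything other than an explicit yes is treated as a refusal.
    return answer.strip().lower() in ("y", "yes")


if __name__ == "__main__":
    if confirm_remote_upload("demo.mp4", "the configured transcription provider"):
        print("Upload approved.")
    else:
        print("Upload cancelled; nothing left the machine.")
```

Defaulting to "no" keeps an accidental Enter keypress from sending media to a remote service.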

Missing User Warnings

Medium

Confidence: 93%
Finding: In ai mode, transcript chunks are sent to a configured reasoning provider without an explicit warning that potentially sensitive spoken content will leave the local environment. Transcript text can contain credentials, personal data, or proprietary instructions, so silent remote submission creates a meaningful confidentiality and compliance risk.

Missing User Warnings

Medium

Confidence: 95%
Finding: The ai and ai-direct enrichment modes can send step data and frame-derived content to external reasoning/VLM services, but the CLI does not clearly notify the user at execution time. Because frame data may reveal faces, screens, documents, or other sensitive visual information, this increases privacy, confidentiality, and regulatory exposure.

Missing User Warnings

Medium

Confidence: 88%
Finding: This code sends local image frame data to an external vision model by converting files to data URLs and passing them into `run_structured_with_images`, but there is no visible consent, disclosure, redaction, or policy gate at this layer. If the frames contain sensitive on-screen content, faces, credentials, private documents, or proprietary UI, the pipeline can exfiltrate that data to a third-party provider during normal operation.
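The sketch below shows what a policy gate at the frame-encoding layer might look like: a frame is converted to a data URL only when the caller has explicitly allowed remote vision processing. The helper name, the `allow_remote` flag, and the exception type are assumptions for illustration; the skill's real encoding code and the signature of `run_structured_with_images` are not shown in this report.

```python
import base64
import mimetypes
from pathlib import Path


class RemoteVisionNotAllowed(RuntimeError):
    """Raised when a frame would leave the host without explicit approval."""


def frame_to_data_url(path: str, allow_remote: bool = False) -> str:
    """Encode an image file as a data URL, but only if remote use is approved."""
    if not allow_remote:
        raise RemoteVisionNotAllowed(
            f"Refusing to encode '{path}' for a remote vision model without consent."
        )
    frame = Path(path)
    # Fall back to a generic type when the extension is unknown.
    mime = mimetypes.guess_type(frame.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(frame.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Placing the check in the encoder itself means every caller inherits the gate, rather than relying on each command to remember it.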

Missing User Warnings

Medium

Confidence: 84%
Finding: The code sends structured step content such as instruction text, intent, expected outcome, timestamps, and raw judgement context to external model providers without any visible notice or minimization in this component. Even if this is functionally intended, tutorial data may contain sensitive business process details, proprietary workflows, or personal information, creating a data exposure risk when processed by remote LLM services.

Missing User Warnings

Medium

Confidence: 90%
Finding: The code serializes raw transcript chunk text and sends it to an external AI provider via `run_structured`, creating a data-exposure boundary without any visible consent, minimization, or redaction controls in this file. Because transcripts may contain sensitive spoken content, credentials, personal data, or proprietary information, forwarding them to a third-party model can violate privacy expectations and compliance requirements if not explicitly disclosed and governed.
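A minimal redaction pass of the kind this finding calls for could look like the sketch below, applied to each transcript chunk before it is serialized for the provider. The patterns are illustrative assumptions only (email addresses and a few prefixed token shapes); they are far from exhaustive, and `redact_chunk` is a hypothetical helper, not part of the skill.

```python
import re

# Illustrative patterns only; a real deployment would need a broader, reviewed set.
_REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9_-]{8,}\b"), "[SECRET]"),
]


def redact_chunk(text: str) -> str:
    """Replace obvious emails and token-like strings with placeholders."""
    for pattern, placeholder in _REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact_chunk("Mail jane.doe@example.com and use token_abcdefgh1234 now"))
```

Pattern-based redaction reduces exposure but does not remove it, so it complements disclosure and consent rather than replacing them.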

VirusTotal

64/64 vendors flagged this skill as clean.
