Video Frames Free

Security checks across malware telemetry and agentic risk

Overview

This skill is presented as a frame extractor, but it actually gives an agent broad cloud video upload, editing, session, and MP4 export capabilities that users may not expect.

Install only if you are comfortable treating this as a general cloud video editor/export connector, not just a frame extractor. Do not upload sensitive videos unless you trust NemoVideo's handling of media, prompts, tokens, session state, and retention.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (7)

Description-Behavior Mismatch

Severity: High
Confidence: 96%
Finding
The skill is advertised as a simple frame-extraction utility, but the embedded instructions implement a substantially broader cloud video editing and rendering pipeline. This capability mismatch is dangerous because it can cause users or the host agent to upload media, create remote sessions, and trigger editing/export actions they did not reasonably expect from the declared skill scope.

Description-Behavior Mismatch

Severity: Medium
Confidence: 94%
Finding
The description promises extracted frame images, but the body repeatedly describes MP4 video render outputs instead. This inconsistency can mislead users into submitting content under false assumptions about processing and outputs, undermining informed consent and increasing the chance of unintended remote media handling.

Context-Inappropriate Capability

Severity: High
Confidence: 95%
Finding
The skill includes unrelated capabilities such as text overlays, audio/BGM handling, timeline editing, and generic editing intent routing, which exceed what is needed for frame extraction. Overbroad capabilities increase attack surface and make it easier for the skill to be invoked for unrelated actions, causing unauthorized or surprising processing of user media.

Context-Inappropriate Capability

Severity: Medium
Confidence: 90%
Finding
Automatically obtaining anonymous tokens and creating persistent remote sessions is broader than necessary for a simple file transformation skill. This is dangerous because it silently establishes backend identity and state, enabling network transmission and ongoing server-side processing before the user has clearly consented to such activity.

Intent-Code Divergence

Severity: Medium
Confidence: 91%
Finding
The title and examples frame the skill as image extraction, while the documented workflows describe generic editing and MP4 export behavior. Contradictory user-facing messaging increases the risk of deceptive or accidental use, especially when handling personal video files that may be transmitted to a remote service.

Vague Triggers

Severity: Medium
Confidence: 88%
Finding
The catch-all routing rule sends 'everything else' to the SSE editing pathway, which can unintentionally capture unrelated user prompts. In context, that means benign or ambiguous user requests may trigger remote backend interaction and broader actions than the user intended.
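
The routing concern can be illustrated with a minimal sketch that contrasts a fall-through router with one using an explicit allow-list. Every name here (the keywords, the handler labels, and both functions) is hypothetical and chosen only for illustration; none of it is taken from the skill's actual instructions.

```python
from typing import Optional

# Hypothetical illustration: keywords and handler names are assumptions,
# not taken from the skill under review.
EXPLICIT_ROUTES = {
    "upload": "upload_handler",
    "export": "export_handler",
}


def route_catch_all(prompt: str) -> str:
    """Fall-through routing: anything unmatched goes to the remote editor."""
    text = prompt.lower()
    for keyword, handler in EXPLICIT_ROUTES.items():
        if keyword in text:
            return handler
    # Unrelated or ambiguous prompts also end up here.
    return "sse_editing"


def route_allow_list(prompt: str) -> Optional[str]:
    """Allow-list routing: unmatched prompts are declined, not forwarded."""
    text = prompt.lower()
    for keyword, handler in EXPLICIT_ROUTES.items():
        if keyword in text:
            return handler
    # No match: the caller should ask the user rather than contact a backend.
    return None
```

With the fall-through version, a prompt such as "what's the weather today?" is routed to the remote editing path; with the allow-list version, it matches nothing and is declined.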

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding
The skill instructs the agent to automatically connect to the backend and acquire an anonymous token without a clear user warning that network transmission will occur. For a media-processing skill, this is particularly risky because uploaded videos and associated metadata may be sent off-device before the user meaningfully understands the privacy implications.
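
One way to picture the missing safeguard is a consent gate that blocks any backend contact until the user has seen a notice and agreed. The sketch below is an assumption-laden illustration, not the skill's or any vendor's implementation: `ConsentGate`, `start_session`, and the `fetch_token` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConsentGate:
    """Hypothetical gate that blocks network setup until the user agrees."""
    ask_user: Callable[[str], bool]  # shows a notice, returns True on consent
    granted: bool = False

    def require(self, notice: str) -> None:
        # Ask only once; remember a positive answer for later calls.
        if not self.granted:
            self.granted = self.ask_user(notice)
        if not self.granted:
            raise PermissionError("User declined remote processing.")


def start_session(gate: ConsentGate, fetch_token: Callable[[], str]) -> str:
    """Acquire a token only after the user has been warned and has agreed."""
    gate.require(
        "This will send your video and associated metadata to a remote "
        "service. Continue?"
    )
    # fetch_token stands in for whatever call would contact the backend.
    return fetch_token()
```

The point of the design is ordering: the warning comes before the first network call, so a declined prompt means no token is requested and nothing leaves the device.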

VirusTotal

66/66 vendors flagged this skill as clean.
