Image To Video Demo

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real cloud image-to-video skill, but it automatically connects to a third-party service and may use or create tokens without clearly asking the user first.

Install only if you are comfortable with NemoVideo receiving your selected images, prompts, generated media, and session state. Avoid sensitive media, watch for use of any existing NEMO_TOKEN, and prefer explicit confirmation before the skill connects, uploads files, or starts generation.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The skill is presented as a narrow image-to-video converter, but the documented backend supports broader editing, media handling, session state inspection, and export workflows. That mismatch increases the risk of overbroad data handling and user surprise, because the skill can invoke capabilities beyond what a user would reasonably infer from the manifest.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The skill can silently mint anonymous tokens and establish authenticated remote sessions, which is a privileged network capability not obvious from the stated purpose. This can lead to undisclosed third-party data transfer and account/session creation on behalf of the user without informed consent.
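The scan does not show the skill's actual token code. As a hypothetical illustration of the safer pattern, the sketch below asks the user before reusing an existing NEMO_TOKEN or minting an anonymous one. NEMO_TOKEN is the variable named in this report. The function names, prompt wording, and the `ask`/`mint_anonymous` callbacks are assumptions made for the example.

```python
import os
from typing import Callable, Optional

# Hypothetical consent gate: only NEMO_TOKEN comes from the scan report;
# the callbacks and prompt text are illustrative assumptions.
def resolve_token(ask: Callable[[str], bool],
                  mint_anonymous: Callable[[], str]) -> Optional[str]:
    """Return a token only after the user explicitly approves its use."""
    existing = os.environ.get("NEMO_TOKEN")
    if existing:
        # Never reuse an existing credential silently.
        if ask("An existing NEMO_TOKEN was found. Use it to connect to NemoVideo?"):
            return existing
        return None
    # No token present: creating a remote session is also opt-in.
    if ask("No token found. Create an anonymous NemoVideo session token?"):
        return mint_anonymous()
    return None  # user declined, so no token is used and no session is created
```

A `None` return signals that the caller should skip every network step. This keeps the decision with the user and avoids relying on the agent to stay quiet about it.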

Vague Triggers

Medium

Confidence: 87%
Finding: Routing 'everything else' to the SSE generate/edit action is overly broad and can cause ambiguous or unrelated user inputs to trigger remote processing. In a skill that sends prompts and files to a backend, this increases the chance of unintended actions, excess data disclosure, and misuse of external capabilities.
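For contrast with an "everything else" fallback, the hypothetical router below forwards a request to remote generation only on an explicit match. Unrecognized input falls through to a local clarification step. The keyword list and the `Action` names are illustrative assumptions. The skill's real routing table is not reproduced in the scan.

```python
from enum import Enum

class Action(Enum):
    GENERATE_VIDEO = "generate_video"  # would reach the remote backend
    CLARIFY = "clarify"                # handled locally, nothing is sent

# Hypothetical trigger phrases; a real skill would define its own narrow set.
GENERATE_KEYWORDS = ("animate", "image to video", "make a video")

def route(message: str, has_image: bool) -> Action:
    text = message.lower()
    # Require both an attached image and an explicit request before going remote.
    if has_image and any(k in text for k in GENERATE_KEYWORDS):
        return Action.GENERATE_VIDEO
    # No catch-all: ambiguous or unrelated input is clarified, never forwarded.
    return Action.CLARIFY
```

Keyword matching is crude. The design point is the default branch, which fails closed instead of sending data to the backend.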

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill explicitly instructs the agent to establish backend connections and keep the technical details out of the chat, suppressing disclosure before network requests and session/token handling occur. This undermines informed consent and can cause users to unknowingly transmit prompts, files, and metadata to a third-party service.

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill encourages users to upload images and request edits/exports, but it does not clearly warn that files, prompts, and timeline state are processed on remote cloud infrastructure. For a media skill handling potentially sensitive user content, omission of this disclosure materially increases privacy and compliance risk.
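The missing disclosure could take the form of a plain statement shown before any upload. The sketch below is hypothetical. `UploadPlan`, `upload_if_confirmed`, and the `confirm`/`upload` callbacks are illustrative names. Only the NemoVideo service name and the categories of data sent (files, prompts, session state) come from this report.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List

@dataclass
class UploadPlan:
    files: List[Path]
    prompt: str
    service: str = "NemoVideo"  # third-party backend named in the scan

    def disclosure(self) -> str:
        # List exactly what leaves the machine before asking for consent.
        names = ", ".join(p.name for p in self.files) or "(no files)"
        return (f"The following will be sent to {self.service} for cloud processing: "
                f"files [{names}], your prompt, and the resulting session state. Continue?")

def upload_if_confirmed(plan: UploadPlan,
                        confirm: Callable[[str], bool],
                        upload: Callable[[UploadPlan], None]) -> bool:
    """Upload only after the user accepts the disclosure. Returns True if sent."""
    if not confirm(plan.disclosure()):
        return False
    upload(plan)
    return True
```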

Natural-Language Policy Violations

Medium

Confidence: 77%
Finding: Hardcoding the initial session language to English can cause user prompts or system interactions to be processed in a language the user did not choose. This is primarily a transparency and usability issue, but it can also affect accuracy and create unintended data handling expectations for non-English users.

VirusTotal

66/66 vendors reported this skill as clean.
