Whisk Ai

Security checks across malware telemetry and agentic risk

Overview

This skill should be reviewed because it presents itself as a Google Whisk image remixer while routing media, prompts, sessions, state checks, and exports through a broader NemoVideo backend.

Before installing, treat this as a NemoVideo cloud media skill rather than a simple Google Whisk image tool. Only use media and prompts you are comfortable sending to that backend, protect NEMO_TOKEN as a credential, and look for clearer provider, privacy, export, and confirmation disclosures from the publisher.
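As one way to act on the advice to protect NEMO_TOKEN, the sketch below reads the token from the environment and redacts it from any text before it is logged. The helper names `load_token` and `redact` are hypothetical and are not part of the skill or of any NemoVideo client; only the variable name NEMO_TOKEN comes from the report.

```python
import os


def load_token(env=None, name="NEMO_TOKEN"):
    """Read the token from the environment instead of hard-coding it."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set")
    return token


def redact(text, secret):
    """Replace every occurrence of the secret before text reaches a log."""
    if not secret:
        return text
    return text.replace(secret, "[REDACTED]")


if __name__ == "__main__":
    # Demo with a fake value; a real token should never be printed.
    token = load_token({"NEMO_TOKEN": "example-token"})
    print(redact(f"Authorization: Bearer {token}", token))
```

Keeping the token out of prompts, transcripts, and log output limits the damage if a session record is shared or exported.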

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (5)

Description-Behavior Mismatch

High
Confidence: 96%
Finding
The skill is presented as an image remixing tool, but the instructions expose a substantially broader media-editing backend with session state, timeline manipulation, uploads, and export operations. This mismatch can cause users and host systems to grant permissions or send data under false expectations, enabling unintended backend actions and expanding the attack surface beyond the declared capability.

Description-Behavior Mismatch

Medium
Confidence: 93%
Finding
The skill claims to be based on Google's Whisk AI, but the setup directs the agent to authenticate against and operate through NemoVideo endpoints and nemo_agent sessions. This provenance mismatch undermines trust, obscures where user content and tokens are actually sent, and may lead users to consent to one provider while their data is processed by another.

Intent-Code Divergence

High
Confidence: 97%
Finding
The file advertises image remixing but instructs the agent to process video-centric concepts like tracks, draft timelines, BGM, session state, and MP4 export. That discrepancy materially increases risk because an agent may perform powerful media-management actions the user did not request or anticipate, including handling additional files and producing exports from backend state.

Vague Triggers

Medium
Confidence: 84%
Finding
The catch-all route sends essentially all remaining user input into the SSE generation/editing path, which risks activating backend actions for ambiguous or unrelated prompts. In a skill that already has mismatched scope and broad backend capabilities, this makes accidental invocation and prompt-triggered misuse more likely.
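A narrower trigger is one possible mitigation for the catch-all route. The sketch below forwards input to the generation path only when it states an explicit image-remix intent, and otherwise asks the user to clarify. The function name `route`, the route labels, and the keyword pattern are all assumptions made for illustration; they are not taken from the skill's actual routing table.

```python
import re

# Hypothetical intent pattern: a remix-style verb followed by an image noun.
REMIX_INTENT = re.compile(
    r"\b(remix|restyle|blend|combine)\b.*"
    r"\b(image|images|photo|photos|picture|pictures)\b",
    re.IGNORECASE,
)


def route(user_input: str) -> str:
    """Return 'generate' only for explicit remix requests, else 'clarify'."""
    if REMIX_INTENT.search(user_input):
        return "generate"
    # Ambiguous or unrelated input never reaches the backend by default.
    return "clarify"


if __name__ == "__main__":
    print(route("Remix these two photos into a watercolor"))
    print(route("What's the weather today?"))
```

Defaulting to a clarifying question rather than to the backend means an unrelated prompt cannot start an upload or an export by accident.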

Missing User Warnings

Medium
Confidence: 90%
Finding
The setup instructs the skill to connect to a cloud backend, upload user media, and create authenticated sessions, but it does not clearly warn users that their images and prompts will be transmitted to a third-party service. For an image-focused skill handling potentially sensitive photos, insufficient disclosure creates privacy and consent risks.
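The missing disclosure could be addressed with an explicit confirmation step before any upload. The sketch below is a minimal illustration, assuming a hypothetical `confirm_upload` helper and a caller-supplied backend host name; neither comes from the skill, and the placeholder host in the demo is not a real endpoint.

```python
def confirm_upload(file_names, backend_host, ask=input):
    """Show what will be sent and where, and proceed only on an explicit yes."""
    listing = ", ".join(file_names) if file_names else "(no files)"
    notice = (
        f"The following media and your prompt will be sent to {backend_host}, "
        f"a third-party cloud service: {listing}. Continue? [y/N] "
    )
    answer = ask(notice)
    # Anything other than an explicit yes is treated as a refusal.
    return answer.strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    # 'backend.example' is a placeholder host used only for this demo.
    if confirm_upload(["portrait.jpg"], "backend.example"):
        print("Upload approved by user.")
    else:
        print("Upload cancelled.")
```

Passing the prompt function as the `ask` parameter keeps the check testable and lets a host application substitute its own confirmation dialog.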

VirusTotal

63/63 vendors flagged this skill as clean.
