Whisk AI

Security checks across malware telemetry and agentic risk

Overview

This skill should be reviewed because it presents itself as a Google Whisk image remixer while routing media, prompts, sessions, state checks, and exports through a broader NemoVideo backend.

Before installing, treat this as a NemoVideo cloud media skill rather than a simple Google Whisk image tool. Only use media and prompts you are comfortable sending to that backend, protect NEMO_TOKEN as a credential, and look for clearer provider, privacy, export, and confirmation disclosures from the publisher.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (5)

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill is presented as an image remixing tool, but the instructions expose a substantially broader media-editing backend with session state, timeline manipulation, uploads, and export operations. This mismatch can cause users and host systems to grant permissions or send data under false expectations, enabling unintended backend actions and expanding the attack surface beyond the declared capability.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The skill claims to be based on Google's Whisk AI, but the setup directs the agent to authenticate against and operate through NemoVideo endpoints and nemo_agent sessions. This provenance mismatch undermines trust, obscures where user content and tokens are actually sent, and may lead users to consent to one provider while their data is processed by another.

Intent-Code Divergence

High

Confidence: 97%
Finding: The file advertises image remixing but instructs the agent to process video-centric concepts like tracks, draft timelines, BGM, session state, and MP4 export. That discrepancy materially increases risk because an agent may perform powerful media-management actions the user did not request or anticipate, including handling additional files and producing exports from backend state.

Vague Triggers

Medium

Confidence: 84%
Finding: The catch-all route sends essentially all remaining user input into the SSE generation/editing path, which risks activating backend actions for ambiguous or unrelated prompts. In a skill that already has mismatched scope and broad backend capabilities, this makes accidental invocation and prompt-triggered misuse more likely.
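To make the routing concern concrete, the sketch below contrasts a catch-all router, where anything unmatched falls through to the generation path, with a narrower router that asks the user instead. It is a hypothetical illustration, not the skill's code: the route names (`image_remix`, `sse_generate`, `ask_user`), the first-word keyword matching, and the set of "sensitive" routes are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical routing table for illustration only; the skill's actual
# route names and keywords are not given in this report.
EXPLICIT_ROUTES = {
    "remix": "image_remix",
    "upload": "upload",
    "export": "export",
}

# Routes that move data off the device or produce files (assumed set).
SENSITIVE_ROUTES = {"upload", "export"}


@dataclass
class RouteDecision:
    route: str
    needs_confirmation: bool


def _first_word(user_input: str) -> str:
    """Return the lowercased first word of the input, or '' if it is empty."""
    words = user_input.strip().split(maxsplit=1)
    return words[0].lower() if words else ""


def route_catch_all(user_input: str) -> RouteDecision:
    """The pattern the finding describes: unmatched input falls through
    to the backend generation/editing path."""
    route = EXPLICIT_ROUTES.get(_first_word(user_input), "sse_generate")
    return RouteDecision(route, needs_confirmation=False)


def route_guarded(user_input: str) -> RouteDecision:
    """A narrower alternative: unmatched input is returned to the user
    for clarification, and sensitive routes require confirmation."""
    route = EXPLICIT_ROUTES.get(_first_word(user_input))
    if route is None:
        return RouteDecision("ask_user", needs_confirmation=False)
    return RouteDecision(route, needs_confirmation=route in SENSITIVE_ROUTES)


if __name__ == "__main__":
    for text in ["what time is it?", "remix this photo", "export draft"]:
        print(text, "->", route_catch_all(text), route_guarded(text))
```

With the catch-all version, an unrelated question such as "what time is it?" still reaches `sse_generate`; the guarded version returns `ask_user` and flags upload and export for explicit confirmation.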

Missing User Warnings

Medium

Confidence: 90%
Finding: The setup instructs the skill to connect to a cloud backend, upload user media, and create authenticated sessions, but it does not clearly warn users that their images and prompts will be transmitted to a third-party service. For an image-focused skill handling potentially sensitive photos, insufficient disclosure creates privacy and consent risks.
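One way a publisher could address this finding is to gate every upload on an explicit, provider-specific disclosure. The sketch below shows only that gate under stated assumptions: `ConsentRequired`, `disclosure_text`, and `upload_media` are hypothetical names, the notice wording is illustrative, and the actual network transfer to the backend is intentionally left out.

```python
class ConsentRequired(Exception):
    """Raised when media would leave the device without the user's agreement."""


def disclosure_text(filename: str, provider: str) -> str:
    """Build a plain-language notice naming the service that will receive the file."""
    return (
        f"'{filename}' and your prompt will be uploaded to {provider}, "
        "a third-party cloud service, for processing. Continue?"
    )


def upload_media(filename: str, provider: str, approved_providers: set[str]) -> str:
    """Refuse to upload unless the user has approved this specific provider.

    The actual network call is deliberately omitted; this only shows the gate.
    """
    if provider not in approved_providers:
        raise ConsentRequired(disclosure_text(filename, provider))
    return f"upload of '{filename}' to {provider} permitted"


if __name__ == "__main__":
    approved: set[str] = set()
    try:
        upload_media("portrait.jpg", "NemoVideo", approved)
    except ConsentRequired as notice:
        print(notice)
    approved.add("NemoVideo")  # recorded only after the user says yes
    print(upload_media("portrait.jpg", "NemoVideo", approved))
```

Keying consent to the named provider, rather than to a single yes/no flag, means that a user who agreed to one service is not silently treated as having agreed to another, which is the gap the provenance finding above describes.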

VirusTotal

63/63 vendors flagged this skill as clean.
