Ai Music Generator

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent cloud media tool, but it connects automatically and has overly broad routing that can send ambiguous prompts to a third-party backend.

Review before installing. Use this only if you are comfortable sending videos, images, audio, prompts, and render/session state to NemoVideo's cloud service. Avoid confidential or rights-sensitive media unless you have verified the service's privacy, retention, and billing terms, and treat NEMO_TOKEN like a password.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The manifest presents the skill as a narrow AI music generator, but the documented behavior exposes a broader remote video-editing and media-processing pipeline including uploads, state inspection, rendering, and export. This capability mismatch can mislead users and host platforms about what data is sent off-device and what operations are performed, increasing the risk of overbroad authorization and unintended remote processing.

Vague Triggers

Medium

Confidence: 88%
Finding: The invocation phrase "generate my video or images" is broad enough to match ordinary conversation and could trigger the skill unintentionally. Because this skill automatically connects to a backend and is designed to upload/process user media, accidental activation can lead to unanticipated remote actions on sensitive files or content.

Vague Triggers

Medium

Confidence: 84%
Finding: The example phrase "generate upbeat background music that matches" is underspecified and may be interpreted in unrelated chat contexts, especially when combined with catch-all routing later in the file. In this skill, vague triggers are more dangerous because they can initiate remote session setup and media-processing workflows without a clearly scoped user request.

Vague Triggers

Medium

Confidence: 95%
Finding: The catch-all rule routes nearly all non-matched user input into the SSE editing/generation backend, creating an overbroad trigger surface. In a skill that can upload media, maintain session state, and execute remote edits/exports, this greatly increases the likelihood of unintended backend actions and data disclosure from normal conversation.
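
One way to narrow the trigger surface described above is to route only messages that match an explicit intent and to return nothing otherwise, instead of falling through to the backend. The following is a minimal sketch under stated assumptions: the intent names, the patterns, and the `route` function are hypothetical illustrations and are not taken from the skill's manifest.

```python
import re
from typing import Optional

# Hypothetical intent patterns for illustration only; the skill's real
# routing table is not shown in this review.
INTENT_PATTERNS = {
    "generate_music": re.compile(
        r"\b(generate|compose|create)\b.*\b(music|soundtrack|song)\b",
        re.IGNORECASE,
    ),
    "export": re.compile(
        r"\b(export|render)\b.*\b(video|audio|track)\b",
        re.IGNORECASE,
    ),
}


def route(message: str) -> Optional[str]:
    """Return the name of the first matching intent, or None.

    There is deliberately no catch-all branch: unmatched input is not
    forwarded to any remote backend.
    """
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(message):
            return intent
    return None
```

With this shape, ordinary conversation that matches no pattern produces `None`, and the caller can ask the user what they meant rather than opening a remote session.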

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill instructs automatic connection to a remote backend on first open but does not provide a clear upfront user warning in the description or consent step. Automatic background connection is risky here because the skill handles user media and can begin remote session establishment before the user fully understands that data and requests are being sent to a third-party service.
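
The missing consent step could look roughly like the sketch below, which refuses to open a remote session until the user has accepted an upfront notice. Everything here is an assumption for illustration: `ConsentGate`, its fields, and the `ask` and `open_session` callbacks are hypothetical names, not part of the skill or of any NemoVideo API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ConsentGate:
    """Hypothetical gate that blocks remote connection until consent is given."""

    service_name: str
    data_types: List[str]
    granted: bool = False

    def notice(self) -> str:
        # Build the upfront warning shown before any network activity.
        items = ", ".join(self.data_types)
        return (
            f"This skill will connect to {self.service_name} "
            f"and may send: {items}. Continue?"
        )

    def connect(
        self,
        ask: Callable[[str], bool],
        open_session: Callable[[], str],
    ) -> Optional[str]:
        # Ask only if consent has not already been granted.
        if not self.granted:
            self.granted = bool(ask(self.notice()))
        if not self.granted:
            return None  # declined: open_session is never called
        return open_session()
```

The point of the design is ordering: the notice is shown, and an answer collected, before the callback that touches the network is ever invoked.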

Missing User Warnings

Low

Confidence: 90%
Finding: The skill obtains authentication tokens, creates persistent session state, and stores the returned session identifier for subsequent requests, but this is not clearly disclosed to the user. Hidden token/session handling increases the risk that users do not understand the persistence, scope, and privacy implications of their interactions with the remote service.
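
A disclosure of the token and session handling described above could be as simple as a user-facing message that names what is held while keeping the secret itself masked. This is a sketch only: `mask_secret` and `session_disclosure` are hypothetical helpers, and the wording of the message is an assumption rather than anything the skill currently shows.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def session_disclosure(token: str, session_id: str) -> str:
    """Describe, without revealing the token, what the skill is holding."""
    return (
        "This skill holds an authentication token "
        f"({mask_secret(token)}) and a remote session ID ({session_id}) "
        "that it reuses for subsequent requests."
    )
```

In practice the token would come from wherever the host stores it (for example the NEMO_TOKEN environment variable), and only the masked form would ever be printed or logged.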

VirusTotal

66/66 vendors flagged this skill as clean.
