Music Craft

Security checks across malware telemetry and agentic risk

Overview

Music Craft is a disclosed music-generation workflow with privacy-sensitive enrichment and setup steps, but no evidence of hidden, destructive, or deceptive behavior.

Install only if you are comfortable with a music workflow that may use web search/fetch, image analysis, memory lookup, local model installs, and optional cloud APIs. For private lyrics or sensitive prompts, use a local backend and ask the agent not to use web, browser, memory, or cloud providers unless you explicitly approve them.
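The recommendation above amounts to a deny-by-default rule for the privacy-sensitive capabilities. A minimal sketch of such a gate follows. The capability names and the `SessionPolicy` class are hypothetical illustrations. They are not part of Music Craft or of any agent runtime.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels, taken from the recommendation above.
SENSITIVE_CAPABILITIES = {"web", "browser", "memory", "cloud"}


@dataclass
class SessionPolicy:
    """Tracks which sensitive capabilities the user has explicitly approved."""

    approved: set = field(default_factory=set)

    def approve(self, capability: str) -> None:
        # Only known sensitive capabilities can be approved.
        if capability not in SENSITIVE_CAPABILITIES:
            raise ValueError(f"unknown capability: {capability}")
        self.approved.add(capability)

    def is_allowed(self, capability: str) -> bool:
        # Anything outside the sensitive set (e.g. a local backend) is allowed;
        # sensitive capabilities stay denied until explicitly approved.
        if capability not in SENSITIVE_CAPABILITIES:
            return True
        return capability in self.approved
```

With this sketch, a fresh `SessionPolicy()` denies `"web"` and `"memory"` until `approve()` is called for each one, while a local backend needs no approval.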

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (10)

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The skill's declared purpose is music generation, but it instructs the agent to perform broad system inspection, package-manager detection, local installs, web browsing, and memory access. That scope expansion violates least privilege and increases the chance of unnecessary access to user data, system state, and network resources beyond what is needed to generate music.

Description-Behavior Mismatch

High

Confidence: 89%
Finding: The skill says cover/style-transfer workflows are out of scope, but later includes ACE-Step cover, repaint, extraction, and audio-understanding instructions. This inconsistency can cause the agent to invoke higher-risk audio-processing capabilities the user was told would not be used, defeating user expectations and safety boundaries.

Intent-Code Divergence

High

Confidence: 91%
Finding: The document explicitly says audio analysis and cover workflows are not covered, then later documents those exact capabilities through ACE-Step. Contradictory instructions are dangerous because they undermine policy enforcement and may route the agent into handling user audio files and analysis tasks without proper consent or the expected safeguards.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill directs the agent to inspect durable user memory for prior preferences even though that is not necessary for basic music generation. Accessing persistent memory without a clear need or explicit consent creates a privacy risk and can expose unrelated personal data to the workflow.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The guidance tells the skill to enrich requests from images or URLs, which expands behavior beyond a pure music-generation workflow into external content interpretation. That increases attack surface because URLs and linked content can introduce untrusted data handling, privacy leakage, or unintended downstream tool use that the skill description does not clearly scope or constrain.

Missing User Warnings

Medium

Confidence: 96%
Finding: The markdown instructs reading durable memory without an upfront privacy warning or explicit consent checkpoint. Even if used for convenience, this weakens transparency and can surprise users by pulling prior personal context into a new task.

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill encourages web fetching and browser use on user-provided links without a prominent upfront disclosure that external network access will occur. That can leak metadata, retrieve untrusted content, and expand the attack surface through remote pages or browser automation beyond user expectations.

Missing User Warnings

Low

Confidence: 79%
Finding: The workflow instructs the agent to fetch third-party lyrics URLs and process their content without clearly requiring user-facing notice that external tools will access the provided URL and potentially retrieve page contents. In an agent environment, this can expose user-provided links or associated content to external fetch/browser tooling unexpectedly, creating a privacy and data-handling risk even if the content is not highly sensitive.

Natural-Language Policy Violations

Medium

Confidence: 89%
Finding: The worked example hard-codes 'bright female vocal in English' for an ambiguous request ('Make me a song') without indicating that language or vocal characteristics should come from user preference. This can cause the agent to inject unrequested demographic and language defaults, leading to output misalignment and potentially inappropriate assumptions in downstream generation.

Natural-Language Policy Violations

Medium

Confidence: 83%
Finding: Several sub-genre examples prescribe specific vocal languages or locales such as Spanish, Portuguese, Jamaican, or mixed-language vocals as defaults. In a music-generation skill, these defaults can steer outputs toward unrequested cultural or linguistic assumptions, which is not a code-execution risk but is still a prompt-safety and user-intent integrity issue.
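One way to picture the remedy implied by the last two findings is a prompt helper that uses only the vocal language and style the user actually supplied, and returns nothing when neither was given. The sketch below is a hypothetical illustration. `build_vocal_hint` and its parameters are assumptions and do not come from the skill.

```python
from typing import Optional


def build_vocal_hint(
    language: Optional[str] = None,
    vocal_style: Optional[str] = None,
) -> Optional[str]:
    """Describe the vocals using only user-supplied preferences.

    Returns None when the user gave no preference, so the caller can ask
    or leave the field unspecified instead of injecting a default.
    """
    if not language and not vocal_style:
        return None
    style = vocal_style or "vocals"
    return f"{style} in {language}" if language else style
```

For an ambiguous request such as "Make me a song", this helper returns `None` and leaves the choice to the user. It does not fall back to a fixed voice or language.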

VirusTotal

64/64 vendors flagged this skill as clean.
