IMA AI Music Generator — Suno, DouBao

Security checks across malware telemetry and agentic risk

Overview

The skill appears to be a genuine IMA music-generation integration. However, its published voiceover/narration description conflicts with the actual music workflow, and its API-key handling is broader than its strongest disclosure claims suggest.

Review this carefully before installing. It is best understood as an AI music/song/BGM generator, not a voiceover or narration tool. Only use it if you are comfortable sending prompts and your IMA_API_KEY to IMA's API and potentially spending IMA credits; avoid using the --base-url option unless you fully trust the endpoint.
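One way to act on the --base-url advice is to refuse any endpoint that is not explicitly allowlisted before IMA_API_KEY is attached to a request. The sketch below is illustrative only: the host name `api.ima.example` and the helper `is_trusted_base_url` are hypothetical, since this review does not state IMA's real API host or how the skill's script parses the option.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the real IMA API host is not stated in this review.
TRUSTED_HOSTS = frozenset({"api.ima.example"})


def is_trusted_base_url(base_url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is explicitly allowlisted."""
    parsed = urlparse(base_url)
    # Compare the parsed hostname, not a substring of the raw URL, so that
    # tricks such as "https://api.ima.example@evil.test/" are rejected.
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts
```

A caller would run this check on the value passed to --base-url and stop before sending the key if it returns False.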

SkillSpector

By NVIDIA

Vulnerability Patterns

Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (11)

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The keywords file broadly advertises AI music generation, songwriting, jingles, beats, and lyrics while the manifest describes a narrower voiceover/narration capability. This scope mismatch is dangerous because it can cause the agent to activate for requests outside the declared skill boundary, misleading users and reviewers and potentially bypassing governance, consent, or policy checks tied to the stated purpose.
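A reviewer could approximate this kind of scope check by comparing the advertised keywords against the words in the manifest description. The following is a rough sketch, not the scanner's actual method: the function name `undeclared_keywords` and the sample strings are hypothetical, and exact word matching is a crude stand-in for real semantic comparison.

```python
import re


def _words(text: str) -> set:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def undeclared_keywords(manifest_description: str, keywords: list) -> list:
    """Return keywords that share no word with the manifest description."""
    declared = _words(manifest_description)
    return [kw for kw in keywords if not (declared & _words(kw))]


# Hypothetical inputs modelled on the mismatch described in this finding.
description = "Generate voiceovers, narration, and spoken audio."
advertised = ["AI music generation", "songwriting", "narration"]
flagged = undeclared_keywords(description, advertised)
```

Here `flagged` holds the two music-related keywords, while "narration" passes because it appears in the description.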

Intent-Code Divergence

High

Confidence: 99%
Finding: The text explicitly states that the skill is for music generation rather than TTS, directly contradicting the manifest's voiceover/narration-focused purpose. A direct contradiction like this is a strong indicator of deceptive or poorly governed scope, increasing the chance that the skill will be routed to unsupported or unreviewed behaviors.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill metadata promises voiceovers, narration, and spoken audio, but the file documents a different capability: text-to-music/song generation. This kind of capability mismatch is dangerous because users and downstream systems may authorize or invoke the skill under false assumptions, leading to unauthorized actions, incorrect data handling, billing surprises, or policy bypass based on misleading classification.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The documented models, task types, and API flow are explicitly for `text_to_music`, not voiceover or narration. In context, this is more dangerous because the skill is marketed as spoken-audio generation, so users may submit text or trust outputs under the wrong safety, copyright, and cost expectations while the system actually sends prompts to a music-generation backend.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: Labeling the document as 'IMA Voice AI' while it actually describes music generation is misleading and contributes to deceptive or unsafe operator behavior. Although the title alone is not code execution, it materially increases the risk of misuse and concealment when combined with the broader capability mismatch in the file.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The manifest says the skill generates spoken audio, while the operational instructions are for music generation. Security-wise, this is dangerous because users and auditors may authorize the skill under false assumptions about data handling, content generation, and third-party processing.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The manifest advertises voiceover, narration, and spoken-audio generation, but the rest of the file consistently describes a music-generation skill. This mismatch can mislead users and downstream systems about the skill’s capabilities, causing inappropriate installation, data sharing, or workflow use under false assumptions. In a security context, deceptive or inaccurate manifests reduce trust boundaries and can mask what a skill actually does.

Vague Triggers

Medium

Confidence: 89%
Finding: The invocation phrasing is so broad that ordinary requests about music, songs, BGM, or audio creation could trigger this skill without clear boundaries. Overbroad triggering can hijack unrelated user intents, invoke premium/API-backed actions unexpectedly, and make it harder to enforce least-privilege behavior for the skill.

Natural-Language Policy Violations

Medium

Confidence: 84%
Finding: Forcing `x_app_language: en` without user choice can cause the service to process requests under an unintended locale, which may affect output accuracy, transparency, and user expectations. While not a high-severity exploit primitive, it is a privacy/consent and policy concern because it overrides user context silently.
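A less surprising design would take the language from the user's context and fall back to `en` only when nothing is supplied. The sketch below assumes the service accepts values other than `en` for `x_app_language`, which this review does not confirm; the helper `language_header` is hypothetical.

```python
from typing import Optional

DEFAULT_LANGUAGE = "en"  # fallback only; the finding describes this value as forced


def language_header(user_language: Optional[str] = None) -> dict:
    """Build the language header from the user's preference when one is given."""
    lang = (user_language or "").strip() or DEFAULT_LANGUAGE
    return {"x_app_language": lang}
```

With this shape the default still exists, but it is an explicit fallback rather than a silent override of the user's locale.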

Unpinned Dependencies

Low

Category: Supply Chain
Content:
# Python dependencies for ima-voice-ai skill
# Install with: pip install -r requirements.txt
requests>=2.25.0
Confidence: 93%
Finding: requests>=2.25.0
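A check of this kind can be reproduced locally by flagging every requirement line that is not pinned with `==`. This is a minimal sketch under that assumption, not SkillSpector's implementation; `find_unpinned` is a hypothetical helper, and it ignores pip features such as hashes, environment markers, and `-r` includes.

```python
import re

# Matches "name==version" with optional extras; anything else counts as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+$")


def find_unpinned(requirements_text: str) -> list:
    """Return requirement lines that do not use an exact '==' pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = (
    "# Python dependencies for ima-voice-ai skill\n"
    "# Install with: pip install -r requirements.txt\n"
    "requests>=2.25.0\n"
)
result = find_unpinned(sample)
```

For the file quoted in this finding, `result` contains the single line `requests>=2.25.0`; an exact pin would make the list empty.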

Known Vulnerable Dependency: requests (10 advisories): CVE-2014-1830 (Exposure of Sensitive Information to an Unauthorized Actor in Requests); CVE-2024-47081 (Requests vulnerable to .netrc credentials leak via malicious URLs); CVE-2024-35195 (Requests `Session` object does not verify requests after making first request wi) +7 more

High

Category: Supply Chain
Confidence: 95%
Finding: requests

VirusTotal

64/64 vendors flagged this skill as clean.
