Ffmpeg Video To Mp3

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real NemoVideo cloud skill, but it is much broader than a simple video-to-MP3 converter and needs careful review before use.

Review before installing. Use only if you are comfortable sending videos, URLs, prompts, and session data to NemoVideo, and treat generated claim links or tokens as sensitive credentials. Avoid sensitive or proprietary media unless you trust the service and understand its retention and access controls.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (8)

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill is presented as a narrow video-to-MP3 converter, but its routing instructions send many user requests into broad NemoVideo editing, generation, upload, state, and export workflows. This capability mismatch can cause users and host agents to disclose files, prompts, and actions to a much more powerful remote service than the skill description implies, increasing the risk of unintended data exposure and misuse.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The API section exposes a full remote media-editing platform, including session management, SSE messaging, rendering, state queries, and generalized media handling, which is far broader than simple MP3 extraction. A host may grant this skill more trust than warranted based on its benign title, allowing covert use of remote editing features under the guise of conversion.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The skill advertises support for a short list of video containers, but the documented backend accepts images and standalone audio formats as well. This discrepancy widens the effective input surface and may lead agents to upload content types users did not expect to share with this skill.
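
A host agent could narrow this input surface itself by checking a file's extension against the advertised list before any upload. The sketch below is only an illustration: the function name and the example extension set are hypothetical, since this report does not enumerate the containers the skill advertises.

```python
from pathlib import Path

# Hypothetical allowlist for illustration; substitute the containers the skill
# actually advertises, which this report does not list.
HYPOTHETICAL_ADVERTISED = frozenset({".mp4", ".mov"})


def is_advertised_type(filename: str, advertised: frozenset[str]) -> bool:
    """Return True if the file extension is in the advertised set (case-insensitive)."""
    return Path(filename).suffix.lower() in advertised
```

An extension check says nothing about a file's real contents, so it limits accidental uploads rather than deliberate misuse.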

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill instructs the agent to provision anonymous tokens and maintain a persistent client identifier in local config, even though the described task is simple file conversion. This introduces unnecessary authentication handling and persistent tracking state, which expands privacy and abuse risk beyond the user’s likely expectations.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: Allowing URL-based ingestion means the skill can fetch remote media instead of only processing user-supplied local files. That creates unnecessary capability for a conversion skill and can be abused to pull third-party or sensitive content into the remote backend without clear user understanding.
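
One way a host might restrict the skill to user-supplied local files is to reject anything that looks like a URL before passing it on. This is a minimal sketch under that assumption; the function name is hypothetical and nothing here comes from the skill itself.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_local_file_input(value: str) -> bool:
    """Return True only for inputs that name an existing local file, not a URL."""
    parsed = urlparse(value)
    # A scheme plus a network location (http, https, ftp, ...) means remote content.
    if parsed.scheme and parsed.netloc:
        return False
    # Also reject scheme-style inputs without a host, such as file:///path.
    if "://" in value:
        return False
    return Path(value).is_file()
```

The check is deliberately conservative: a value is accepted only when it resolves to a regular file on disk, so remote fetching has to be requested explicitly through some other path.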

Intent-Code Divergence

High

Confidence: 98%
Finding: The workflow maps export requests to an export section, but the documented export operation produces MP4 video output rather than MP3 audio. This is a direct functional mismatch that can cause users to unknowingly trigger broader video rendering behavior and upload/edit pipelines inconsistent with the stated purpose.

Intent-Code Divergence

High

Confidence: 98%
Finding: The API narrative claims the backend extracts audio to MP3, but the concrete export endpoint is defined to render MP4 output. This inconsistency indicates the skill may be masking a generic media-rendering service behind an audio-conversion label, which raises the chance of deceptive behavior and unintended processing of user media.
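
A reviewer can test the MP3-versus-MP4 discrepancy directly by inspecting the first bytes of whatever file the export returns. The sketch below is a heuristic, not a full parser, and the function name is hypothetical. It relies on two well-known signatures: ISO base media files such as MP4 carry the box type `ftyp` at bytes 4 to 7, while MP3 files usually begin with an `ID3` tag or an MPEG audio frame sync.

```python
def sniff_container(header: bytes) -> str:
    """Classify the leading bytes of a file as 'mp4', 'mp3', or 'unknown'."""
    # ISO base media files (MP4, M4A, MOV) have the box type 'ftyp' at offset 4.
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "mp4"
    # Many MP3 files start with an ID3v2 metadata tag.
    if header[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG audio frame sync: 11 consecutive set bits.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

For example, passing the first 16 bytes of a downloaded export to `sniff_container` and getting `"mp4"` would be consistent with the finding above. The frame-sync test can also match other MPEG audio streams, so an `"mp3"` result is only suggestive.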

Missing User Warnings

Medium

Confidence: 94%
Finding: The setup instructions send tokens, messages, and uploaded media to a remote NemoVideo backend, but the skill description does not clearly warn users that their content is processed server-side. This undermines informed consent and may expose sensitive audio/video data to third-party infrastructure unexpectedly.

VirusTotal

All 64 vendors reported this skill as clean.
