Media Caption

Security checks across malware telemetry and agentic risk

Overview

The skill appears to be a real cloud subtitle/video rendering workflow, but its broad catch-all routing can send ambiguous user requests to an external backend.

Review before installing if you handle private, client, or unreleased media. Use it only when you intend to send video files and related editing instructions to the external NemoVideo backend, and avoid invoking it for unrelated brainstorming or general chat.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (4)

Description-Behavior Mismatch

Medium
Confidence: 88%
Finding
The skill manifest markets a narrow captioning capability, but the body documents a much broader remote video editing, upload, session, and rendering workflow. This mismatch can mislead users and host agents about the skill's actual privileges and data flows, increasing the chance that users submit more content than expected or that the skill is invoked in contexts beyond simple captioning.

Description-Behavior Mismatch

Low
Confidence: 80%
Finding
The manifest frames the skill as operating on common video clips, but the documented supported formats include many additional media types such as images and audio. This scope expansion is less severe than the broader workflow mismatch, but it still weakens user consent and expectation-setting around what data the skill may accept and process remotely.

Vague Triggers

Medium
Confidence: 92%
Finding
Routing 'Everything else' to the SSE action creates an overly broad catch-all trigger that can capture unrelated user requests. In an agent environment, this increases the risk of unintended invocation, unnecessary transmission of user text to a third-party backend, and accidental execution of editing or session actions outside the user's informed intent.
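
One way to narrow a catch-all route like the one described above is to forward a request only when it names a known intent, and to decline otherwise. The sketch below is illustrative only: the keywords, the action names (`add_captions`, `render_video`), and the `route_request` helper are hypothetical, since the skill's actual routing table is not shown in this report.

```python
from typing import Optional

# Hypothetical keywords and action names, for illustration only; the skill's
# real routing table is not reproduced in this report.
EXPLICIT_ROUTES = {
    "caption": "add_captions",
    "subtitle": "add_captions",
    "render": "render_video",
    "export": "render_video",
}


def route_request(text: str) -> Optional[str]:
    """Return an action name only when the request names a known intent.

    Returns None for anything else, so the caller can decline or ask the
    user to confirm instead of forwarding the text to a remote backend.
    """
    lowered = text.lower()
    for keyword, action in EXPLICIT_ROUTES.items():
        if keyword in lowered:
            return action
    return None  # no default branch: unmatched text is never forwarded
```

With this shape, a request such as "Add subtitles to my clip" maps to an action, while unrelated chat returns `None` and stays local. Substring matching is deliberately crude here; a real router would likely need a stricter intent check.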

Vague Triggers

Medium
Confidence: 90%
Finding
The prompt 'Or just tell me what you're thinking' is extremely broad and overlaps with ordinary conversation, making accidental activation more likely. Because this skill immediately connects to external APIs and may upload or process user-provided media, broad trigger language increases privacy and consent risk in a meaningful way.

VirusTotal

64 of 64 vendors reported this skill as clean.
