Video Caption

Security checks across malware telemetry and agentic risk

Overview

The skill is a real cloud video-captioning workflow, but its instructions give a remote media service broader editing and upload authority than the caption-focused description makes clear.

Install only if you intend to use NemoVideo cloud processing and are comfortable sending your videos, URLs, prompts, and edits to that service. Use a dedicated or revocable NEMO_TOKEN, avoid sensitive/private media unless you trust the provider, and treat ambiguous non-caption editing requests carefully because the skill's backend routing is broad.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (5)

Description-Behavior Mismatch

Severity: Medium
Confidence: 89%
Finding
The skill is presented as a narrow video-captioning tool, but the documented API surface and workflows support broader media editing, generation, export, and timeline manipulation. This scope expansion increases the chance the agent will route unrelated user requests into a powerful remote editing backend without clear user understanding or authorization boundaries.

Context-Inappropriate Capability

Severity: Medium
Confidence: 92%
Finding
Allowing uploads by arbitrary URL introduces server-side request forgery (SSRF) and data-exfiltration risk, because the backend may be induced to fetch attacker-chosen resources rather than user-supplied local media. In a captioning skill, remote URL ingestion is not clearly necessary, so this capability materially broadens the attack surface beyond the stated purpose.
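One common mitigation for URL ingestion is to accept only https URLs whose host resolves exclusively to public addresses. The following is a minimal sketch using Python's standard library; `is_safe_media_url` and its `resolve` hook are hypothetical names, not part of the skill or the NemoVideo API. It validates addresses at check time only, so a real fetcher would also need to connect to the validated address (to resist DNS rebinding) and re-check every redirect.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def _resolve_all(host: str) -> list[str]:
    """Return every address the host resolves to (IPv4 and IPv6)."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_media_url(
    url: str, resolve: Callable[[str], Iterable[str]] = _resolve_all
) -> bool:
    """Hypothetical pre-fetch check: allow only https URLs that resolve to public IPs."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "https" or not host:
        return False  # rejects http://, file://, and host-less URLs
    try:
        # IP literals (e.g. https://169.254.169.254/) are checked directly.
        addresses = [str(ipaddress.ip_address(host))]
    except ValueError:
        try:
            addresses = list(resolve(host))
        except (OSError, UnicodeError):
            return False  # unresolvable hosts are rejected
    if not addresses:
        return False
    for addr in addresses:
        # Drop any IPv6 zone id ("fe80::1%eth0") before parsing.
        ip = ipaddress.ip_address(addr.split("%", 1)[0])
        if not ip.is_global:
            return False  # loopback, private, link-local (cloud metadata), etc.
    return True
```

Every resolved address must be public, so a hostname that returns one public and one internal record is still rejected.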

Vague Triggers

Severity: Medium
Confidence: 84%
Finding
The invocation guidance includes broad phrases like 'describe what you're after,' which can cause over-triggering on generic creative or editing requests. In an agent ecosystem, overly loose matching can activate this skill outside its intended scope and send user content to the remote service unexpectedly.

Vague Triggers

Severity: Medium
Confidence: 95%
Finding
Routing 'Everything else' to the SSE action creates an effectively unbounded fallback path into a remote backend. This is dangerous because ambiguous or unrelated prompts may still be processed as editing/generation commands, expanding data exposure and enabling unintended backend actions without clear scope checks.
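A deny-by-default router is one way to close an open-ended fallback like this. The sketch assumes a simple keyword-based router; the skill's actual routing logic is not shown in this report, and `Action`, `route`, and `CAPTION_TERMS` are hypothetical names. Requests outside caption scope stay local and prompt the user instead of reaching the backend.

```python
import re
from enum import Enum


class Action(Enum):
    CAPTION = "caption"    # hypothetical caption-only backend action
    ASK_USER = "ask_user"  # stay local and ask the user to clarify


# Hypothetical scope terms; a real skill would derive these from its stated purpose.
CAPTION_TERMS = frozenset(
    {"caption", "captions", "subtitle", "subtitles", "transcribe", "transcript", "srt"}
)


def route(prompt: str) -> Action:
    """Forward only clearly caption-related requests; everything else is held back."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & CAPTION_TERMS:
        return Action.CAPTION
    # Deny by default: unmatched requests are never sent to the remote service.
    return Action.ASK_USER


print(route("Please add subtitles to intro.mp4"))       # Action.CAPTION
print(route("Make the whole video look more cinematic"))  # Action.ASK_USER
```

Keyword matching is crude, but the design choice that matters is the default branch: an unrecognized request produces a clarifying question rather than a remote call.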

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding
The skill text states that rendering happens server-side but does not clearly warn that uploaded media, prompts, and session data are sent to third-party remote processing endpoints. For a media skill handling potentially sensitive videos, incomplete disclosure undermines informed consent and increases privacy and compliance risk.
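A disclosure step before any upload is one way to address this consent gap. The sketch below is hypothetical: `TransmissionPlan`, `disclosure_text`, `confirm_transmission`, and the example endpoint are illustrative names, not part of the skill, and the real NemoVideo endpoints are not listed in this report. It states what will leave the machine and treats anything other than an explicit yes as a refusal.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TransmissionPlan:
    endpoint: str           # hypothetical; the real service endpoint is not given here
    items: tuple[str, ...]  # what leaves the machine: media, prompts, session data


def disclosure_text(plan: TransmissionPlan) -> str:
    """Build a plain-language notice listing everything that will be sent."""
    lines = [f"The following will be sent to a third-party service at {plan.endpoint}:"]
    lines += [f"  - {item}" for item in plan.items]
    lines.append("Processing and rendering happen on that service, not locally. Continue? [y/N]")
    return "\n".join(lines)


def confirm_transmission(plan: TransmissionPlan, ask: Callable[[str], str] = input) -> bool:
    """Default to no: only an explicit 'y' or 'yes' allows the upload."""
    return ask(disclosure_text(plan)).strip().lower() in {"y", "yes"}
```

Passing `ask` as a parameter keeps the gate testable and lets an agent route the question through its own user-confirmation channel instead of `input`.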

VirusTotal

64/64 vendors rated this skill as clean.
