Caption Generator For Video

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real cloud video-captioning tool, but it needs review because it can automatically create remote sessions and send files, URLs, and broad editing prompts to a third-party backend with limited user-facing disclosure.

Install only if you are comfortable sending videos, media URLs, prompts, and generated outputs to NemoVideo's cloud service. Avoid confidential, regulated, or third-party-sensitive media unless the provider's privacy and retention terms are acceptable, and prefer an intentionally provided NEMO_TOKEN over silent anonymous token creation.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (5)

Description-Behavior Mismatch

Severity: Medium
Confidence: 91%
The skill is presented as a captioning tool, but the documentation expands its scope into broader video editing, media manipulation, session state inspection, and export workflows. This mismatch can cause the agent to route unrelated media-editing requests to a remote backend and perform actions the user did not reasonably expect from a captioning skill, increasing the risk of over-collection and unintended remote processing.

Context-Inappropriate Capability

Severity: Medium
Confidence: 94%
Because the skill accepts media URLs, the backend can be directed to fetch content from arbitrary locations, which captioning of user-supplied files does not require. This can be abused to make the backend retrieve attacker-controlled or internal URLs, creating SSRF-style risk, unexpected third-party data transfer, or ingestion of content the user did not directly upload.
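To make "internal URLs" concrete, the Python sketch below treats a URL as an SSRF-style target when it uses a non-HTTP scheme or its host resolves to a loopback, private, link-local, or other non-global address (for example, the cloud metadata address 169.254.169.254). It is a hypothetical client-side screen, assuming the agent can inspect a URL before forwarding it; `is_internal_url` is an illustrative name, not part of the skill. Because the backend performs the actual fetch, and DNS answers can change between this check and that fetch, a screen like this reduces the risk but does not eliminate it.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_internal_url(url: str) -> bool:
    """Hypothetical screen: True if the URL looks like an SSRF-style target.

    Non-HTTP(S) schemes, missing hosts, unresolvable hosts, and hosts that
    resolve to any non-global address (loopback, private, link-local,
    reserved) are all treated as unsafe.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return True  # e.g. file:// URLs or malformed input

    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return True  # refuse rather than guess when the host cannot be resolved

    # Every resolved address must be publicly routable.
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if not addr.is_global:
            return True
    return False
```

Under these assumptions, `is_internal_url("http://169.254.169.254/latest/meta-data/")` returns `True`, while a URL on a public address such as `http://8.8.8.8/clip.mp4` returns `False`.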

Vague Triggers

Severity: Medium
Confidence: 92%
The catch-all routing rule sends nearly any unmatched request to the SSE action, effectively broadening the skill beyond clearly defined intents. In practice, this can make the agent hand off arbitrary user prompts to the backend, increasing the chance of unintended remote actions, data exposure, or abuse of backend capabilities unrelated to captioning.
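The difference between a catch-all rule and explicitly scoped intents can be illustrated with a small Python sketch. The keyword table, action names, and `"sse_backend"` label are hypothetical stand-ins, since the skill's actual routing rules are not reproduced here. In `route_catch_all`, any prompt that matches no captioning keyword still reaches the remote backend; in `route_explicit`, it is not forwarded at all.

```python
from typing import Optional

# Hypothetical keyword-to-action table for a captioning-only skill.
CAPTION_KEYWORDS = {
    "caption": "generate_captions",
    "subtitle": "generate_captions",
    "transcribe": "generate_captions",
    "srt": "export_captions",
}


def _match(prompt: str) -> Optional[str]:
    """Return the first action whose keyword appears in the prompt, if any."""
    text = prompt.lower()
    for keyword, action in CAPTION_KEYWORDS.items():
        if keyword in text:
            return action
    return None


def route_catch_all(prompt: str) -> str:
    """The pattern described in the finding: unmatched requests go to the backend."""
    return _match(prompt) or "sse_backend"


def route_explicit(prompt: str) -> Optional[str]:
    """A narrower alternative: unmatched requests are declined (None), not forwarded."""
    return _match(prompt)
```

With these hypothetical rules, "Remove the background from my clip" is sent to `"sse_backend"` by the catch-all router, while the explicit router returns `None` and leaves the agent to decline or ask the user.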

Missing User Warnings

Severity: Medium
Confidence: 96%
The skill instructs users to drop video files into chat and states that processing occurs on cloud GPUs, but it does not provide a clear, user-facing consent step explaining that files will be uploaded to and processed by a third-party remote service. This creates a privacy and data-handling risk, especially for sensitive or proprietary video content.

Missing User Warnings

Severity: Medium
Confidence: 97%
The skill automatically acquires tokens and creates backend sessions, including anonymous account-like access, without a clear notice or consent flow. Hidden network calls and silent remote session creation can surprise users, leak metadata, and create external accounts or identifiers on their behalf without informed agreement.
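The consent flow that both "Missing User Warnings" findings call for, together with the overview's advice to prefer an intentionally provided NEMO_TOKEN, can be sketched in Python. Only the `NEMO_TOKEN` variable name comes from this report; `get_token` and `confirm_upload` are hypothetical helpers, and how the skill actually obtains tokens or uploads files is an assumption left out of scope. The sketch uses a token only if the user has set one and asks before any file is sent.

```python
import os
from collections.abc import Mapping
from typing import Callable, Optional


def get_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: use only an intentionally provided NEMO_TOKEN.

    Returns None instead of silently creating an anonymous token or session.
    """
    token = env.get("NEMO_TOKEN", "").strip()
    return token or None


def confirm_upload(path: str, ask: Callable[[str], str] = input) -> bool:
    """Hypothetical consent gate: only an explicit 'y' or 'yes' allows the upload."""
    answer = ask(
        f"{path} will be uploaded to NemoVideo's cloud service for processing. "
        "Continue? [yes/no] "
    )
    return answer.strip().lower() in ("y", "yes")
```

Passing `ask` as a parameter keeps the prompt testable; an empty or ambiguous answer defaults to not uploading.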

VirusTotal

All 66 of 66 vendors reported this skill as clean.
