Caption Generator For Video

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real cloud video-captioning tool, but it needs review because it can automatically create remote sessions and send files, URLs, and broad editing prompts to a third-party backend with limited user-facing disclosure.

Install only if you are comfortable sending videos, media URLs, prompts, and generated outputs to NemoVideo's cloud service. Avoid confidential, regulated, or third-party-sensitive media unless the provider's privacy and retention terms are acceptable, and prefer an intentionally provided NEMO_TOKEN over silent anonymous token creation.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (5)

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The skill is presented as a captioning tool, but the documentation expands its scope into broader video editing, media manipulation, session state inspection, and export workflows. This mismatch can cause the agent to route unrelated media-editing requests to a remote backend and perform actions the user did not reasonably expect from a captioning skill, increasing the risk of over-collection and unintended remote processing.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: Supporting URL-based ingestion allows the skill to fetch remote media from arbitrary locations, which is not necessary for basic user-supplied captioning. This can be abused to cause the backend to retrieve attacker-controlled or internal URLs, creating SSRF-style risk, unexpected third-party data transfer, or ingestion of content the user did not directly upload.
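One way to narrow this risk is to validate media URLs before they are forwarded. The following is a minimal sketch, not the skill's actual behavior: the function name `is_url_allowed` and the HTTPS-only policy are assumptions made for illustration. It rejects non-HTTPS schemes, `localhost`, and literal IP addresses that are not globally routable (loopback, private, link-local).

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: only HTTPS media URLs are acceptable for this illustration.
ALLOWED_SCHEMES = {"https"}


def is_url_allowed(url: str) -> bool:
    """Return True if the URL passes a basic SSRF-oriented pre-check."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False

    host = parsed.hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address. This sketch accepts hostnames as-is;
        # a stricter check would also resolve the name and verify that
        # every resolved address is globally routable.
        return True

    # Reject loopback, private, link-local, and other non-global ranges.
    return addr.is_global
```

A pre-check like this does not replace server-side protections, since the backend performs the actual fetch and a hostname can resolve to an internal address.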

Vague Triggers

Medium

Confidence: 92%
Finding: The catch-all routing rule sends nearly any unmatched request to the SSE action, effectively broadening the skill beyond clearly defined intents. In practice, this can make the agent hand off arbitrary user prompts to the backend, increasing the chance of unintended remote actions, data exposure, or abuse of backend capabilities unrelated to captioning.
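A narrower alternative to a catch-all rule is an explicit intent allow-list that declines anything it does not recognize. The sketch below is hypothetical: the keyword list, the function name `route_request`, and the `"sse"` return label are assumptions chosen to mirror the finding, not the skill's real routing table.

```python
from typing import Optional

# Assumption: illustrative keywords for captioning intents only.
CAPTION_KEYWORDS = ("caption", "subtitle", "transcribe")


def route_request(prompt: str) -> Optional[str]:
    """Route only recognizable captioning requests to the remote action.

    Returns the action label for a match, or None so the agent can
    handle (or refuse) the request locally instead of forwarding it.
    """
    text = prompt.lower()
    if any(keyword in text for keyword in CAPTION_KEYWORDS):
        return "sse"
    return None
```

With this shape, unmatched prompts never reach the backend by default, which is the opposite of the catch-all behavior described in the finding.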

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill instructs users to drop video files into chat and states that processing occurs on cloud GPUs, but it does not provide a clear, user-facing consent step explaining that files will be uploaded to and processed by a third-party remote service. This creates a privacy and data-handling risk, especially for sensitive or proprietary video content.
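A consent step of the kind this finding asks for could look roughly like the sketch below. Everything here is assumed for illustration: `upload_with_consent`, the notice wording, and the `confirm` and `upload` callbacks are hypothetical stand-ins for whatever prompt and transfer mechanism the host agent provides.

```python
from typing import Callable

# Assumption: example wording only; a real notice should name the provider
# and link to its privacy and retention terms.
CONSENT_NOTICE = (
    "This file will be uploaded to a third-party cloud service for "
    "processing. Continue?"
)


def upload_with_consent(
    path: str,
    confirm: Callable[[str], bool],
    upload: Callable[[str], None],
) -> bool:
    """Upload `path` only after the user explicitly approves the notice.

    Returns True if the upload was performed, False if the user declined.
    """
    if not confirm(f"{path}: {CONSENT_NOTICE}"):
        return False
    upload(path)
    return True
```

Passing the prompt and the transfer as callbacks keeps the gate independent of any particular chat interface or HTTP client.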

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill automatically acquires tokens and creates backend sessions, including anonymous account-like access, without a clear notice or consent flow. Hidden network calls and silent remote session creation can surprise users, leak metadata, and create external accounts or identifiers on their behalf without informed agreement.
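The recommendation to prefer an intentionally provided NEMO_TOKEN over silent anonymous token creation can be sketched as follows. The function `resolve_token` and its `allow_anonymous` flag are hypothetical; only the `NEMO_TOKEN` variable name comes from this report.

```python
from typing import Mapping, Optional


def resolve_token(
    env: Mapping[str, str], allow_anonymous: bool = False
) -> Optional[str]:
    """Return the user-supplied token, or fail unless anonymous use is approved.

    A return value of None means no token was supplied and the caller has
    been explicitly allowed to request an anonymous one.
    """
    token = env.get("NEMO_TOKEN", "").strip()
    if token:
        return token
    if not allow_anonymous:
        raise PermissionError(
            "NEMO_TOKEN is not set and anonymous session creation "
            "was not approved by the user"
        )
    return None
```

In practice the mapping would be `os.environ`, and `allow_anonymous` would be set only after the user has been told that a remote session and identifier will be created on their behalf.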

VirusTotal

66/66 vendors flagged this skill as clean.
