Easy Auto Caption

Security checks across malware telemetry and agentic risk

Overview

This is a cloud video-captioning skill that sends media and prompts to NemoVideo for processing. Its main privacy risks are disclosed clearly enough to keep it out of Review.

Install only if you are comfortable sending videos, audio, images, URLs, prompts, and session metadata to NemoVideo’s cloud service. Avoid confidential recordings, private/internal URLs, sensitive business media, or copyrighted content unless you have permission and accept remote processing.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (5)

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill is presented as a simple auto-caption tool, but the documented behavior enables a much broader remote video-editing and rendering workflow. This scope expansion can mislead users and host agents about what data and operations are actually being sent to a third-party backend, increasing the chance of unintended remote processing and policy bypass.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The documented upload path accepts many non-video file types and remote URL ingestion even though the skill is marketed around local video subtitle generation. This creates hidden input channels that could cause unexpected third-party fetching, broader data exfiltration, or handling of content outside the user's reasonable expectations.
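One way a host agent could narrow these input channels is to check inputs before anything is uploaded. The sketch below is illustrative only: the extension allowlist, the function names, and the hostname suffixes are assumptions, not part of the skill or of any NemoVideo API. It does not resolve DNS, so it only catches URLs that are obviously local or private.

```python
import ipaddress
from pathlib import Path
from urllib.parse import urlparse

# Assumed allowlist for a captioning-only workflow (hypothetical values).
ALLOWED_VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_allowed_local_video(path: str) -> bool:
    """Return True only when the file extension is on the video allowlist."""
    return Path(path).suffix.lower() in ALLOWED_VIDEO_EXTENSIONS


def is_obviously_private_url(url: str) -> bool:
    """Return True for URLs that clearly point at local or internal hosts.

    Hostnames are not resolved, so a public-looking DNS name that maps to
    an internal address is not detected here.
    """
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable input is treated as unsafe
    if host == "localhost" or host.endswith((".local", ".internal")):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name, not an IP literal
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

A caller would reject the request, or ask the user to confirm, whenever `is_allowed_local_video` returns False or `is_obviously_private_url` returns True.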

Vague Triggers

Medium

Confidence: 91%
Finding: The catch-all routing rule sends nearly any non-explicit request into the SSE editing path, which can cause the skill to activate on vague or unrelated prompts. Over-broad invocation increases the risk of accidental data transmission to the backend and unintended execution of remote editing actions based on ambiguous user intent.
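The alternative to a catch-all rule is to route only on explicit captioning intent and to ask the user about everything else. The following is a minimal sketch of that idea; the keyword list, the route names, and `route_request` are hypothetical and do not reflect the skill's actual routing code.

```python
import re

# Assumed intent keywords; a real skill would tune this list.
CAPTION_INTENT = re.compile(
    r"\b(captions?|subtitles?|transcribe)\b", re.IGNORECASE
)


def route_request(prompt: str) -> str:
    """Route to captioning only on explicit intent; otherwise ask the user."""
    if CAPTION_INTENT.search(prompt):
        return "caption"
    # No silent fallback into a remote editing path.
    return "ask_user"
```

With this rule, a prompt such as "Add subtitles to my clip" routes to captioning, while an ambiguous prompt such as "make it better" results in a clarifying question instead of a backend call.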

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill instructs the agent to connect to the backend automatically on first open, but does not require a clear upfront notice that network requests and authentication will occur. Silent initialization can expose metadata or create remote sessions before the user has meaningfully consented to cloud processing.
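A simple mitigation is to gate every network action behind explicit consent, so that no session is created until the user has agreed to cloud processing. The sketch below assumes a hypothetical `ConsentGate` wrapper; it is not taken from the skill, and the wrapped action stands in for whatever connection call the agent would otherwise make automatically.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConsentGate:
    """Blocks remote actions until the user has explicitly consented."""

    consented: bool = False

    def grant(self) -> None:
        # Call this only after showing the user a clear cloud-processing notice.
        self.consented = True

    def run(self, action: Callable[[], str]) -> str:
        """Run a network action, or raise if consent has not been given."""
        if not self.consented:
            raise PermissionError("User has not consented to cloud processing")
        return action()
```

The agent would construct one gate per conversation, show the notice on first use, call `grant()` after the user agrees, and pass each backend call through `run()`.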

Missing User Warnings

Medium

Confidence: 97%
Finding: The description says the tool handles subtitles on cloud GPUs, but it does not prominently warn that uploaded videos are transferred to and processed by a remote service. For a media tool handling potentially sensitive recordings, this missing disclosure materially increases privacy and confidentiality risk because users may assume local-only processing.

VirusTotal

65/65 vendors flagged this skill as clean.
