Ai Auto Caption

Security checks across malware telemetry and agentic risk

Overview

This looks like a legitimate cloud captioning/video-editing skill, but it sends selected media and instructions to NemoVideo for processing.

Use this skill only for media you are comfortable sending to NemoVideo's cloud service. Review the provider's privacy/retention terms if the footage is private, regulated, or business-sensitive, and monitor token/credit usage.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (5)

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill is marketed as a narrow auto-captioning tool, but the documentation exposes a much broader remote video-editing surface including uploads, state inspection, generic edit routing, credits queries, and cloud rendering. That mismatch can mislead users and reviewers about what data and actions the skill can perform, increasing the chance of overbroad access to user media and unintended remote processing.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: Allowing uploads of images and audio files goes beyond the stated purpose of automatic video captioning and expands the amount and type of user content sent to the remote backend. This broader ingestion surface can cause users to disclose unrelated sensitive media under the assumption the skill only handles captioning for videos.
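One way to narrow the ingestion surface described in this finding is to filter candidate files against a video-only allowlist before anything is uploaded. The sketch below is illustrative only: the extension set and the function names `is_caption_input` and `filter_uploads` are assumptions, not part of the skill or of NemoVideo's API, and the formats the skill actually accepts are not stated in this report.

```python
from pathlib import Path

# Hypothetical allowlist (assumption): the skill's real accepted formats
# are not documented in this report.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_caption_input(path: str) -> bool:
    """Return True only if the file extension is in the video allowlist."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def filter_uploads(paths):
    """Split candidate uploads into accepted video files and rejected others."""
    accepted = [p for p in paths if is_caption_input(p)]
    rejected = [p for p in paths if not is_caption_input(p)]
    return accepted, rejected
```

With a filter like this, image and audio files would be held back and could be reported to the user instead of being sent along with the video.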

Vague Triggers

Medium

Confidence: 84%
Finding: The example prompt "Share your video files and I'll get started" is broad enough that ordinary conversation or file-sharing behavior could invoke the skill unintentionally. Because this skill uploads media to a third-party backend, accidental triggering could lead to unexpected transfer of potentially sensitive user videos.

Vague Triggers

Medium

Confidence: 87%
Finding: Phrases like "add my video files" and "export 1080p MP4" are too generic and can overlap with normal user requests unrelated to this specific skill. In a skill that performs remote upload, editing, and rendering, vague triggers raise the risk of the wrong tool being activated and sending user content off-device without sufficiently clear intent.
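The overlap described in this finding can be illustrated with a small matcher. This is a minimal sketch, assuming triggers are matched as case-insensitive substrings; the report does not say how the host actually evaluates triggers. The two generic phrases come from the finding above, while `CAPTION_TERMS` and both function names are hypothetical.

```python
# Generic phrases quoted in the finding above.
GENERIC_TRIGGERS = ("add my video files", "export 1080p mp4")

# Hypothetical narrower terms (assumption) tied to the skill's stated purpose.
CAPTION_TERMS = ("caption", "subtitle")


def matches_generic_trigger(message: str) -> bool:
    """True if the message contains one of the generic trigger phrases."""
    text = message.lower()
    return any(phrase in text for phrase in GENERIC_TRIGGERS)


def matches_caption_intent(message: str) -> bool:
    """True only if the message mentions captioning or subtitles."""
    text = message.lower()
    return any(term in text for term in CAPTION_TERMS)
```

A request such as "Export 1080p MP4 from my local editor" matches the generic phrase but carries no captioning intent, which is the kind of accidental activation the finding warns about.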

Missing User Warnings

Medium

Confidence: 98%
Finding: The skill instructs the agent to automatically connect to a remote backend, obtain or mint a token, create sessions, and process uploaded media server-side, but it does not provide an explicit upfront warning that user files and requests will be transmitted to an external service. This is a meaningful transparency and privacy failure because users may share videos expecting local assistance while the skill silently sends them to a third-party cloud processor.
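The missing warning could take the form of an explicit confirmation step that runs before any token is minted or file is uploaded. The following is a sketch under stated assumptions, not the skill's actual behavior: `build_upload_notice` and `confirm_upload` are hypothetical helpers, and the notice wording is illustrative.

```python
def build_upload_notice(filenames, provider="NemoVideo"):
    """Build a plain-language notice naming the files and the remote provider."""
    listing = ", ".join(filenames)
    return (
        f"The following files will be uploaded to {provider}'s cloud service "
        f"for processing: {listing}."
    )


def confirm_upload(filenames, ask=input):
    """Show the notice and return True only on an explicit 'y' or 'yes'.

    `ask` is injectable so the prompt can be replaced by an agent UI or a test.
    """
    notice = build_upload_notice(filenames)
    answer = ask(notice + "\nProceed? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on anything other than an explicit yes keeps an ambiguous or empty reply from being treated as consent.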

VirusTotal

VirusTotal findings are pending for this skill version.
