Ai Caption Generator Free

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed cloud video-captioning and editing integration, with privacy and scope caveats but no evidence of malicious behavior.

Install only if you are comfortable sending videos, media URLs, prompts, and generated project state to NemoVideo’s cloud service. Keep NEMO_TOKEN private, avoid confidential media unless the service is appropriate for it, and use the skill for intentional video captioning or editing tasks rather than generic media requests.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The skill is presented as a simple caption generator, but the documented API surface and workflows support broad video editing, rendering, media manipulation, and stateful project operations. This scope mismatch can mislead users and calling agents into authorizing capabilities beyond what they reasonably expect, increasing the chance of unintended remote processing or content modification.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The upload flow permits URL-based ingestion of remote media even though the skill is marketed around user-supplied video captioning. This expands the trust boundary and could cause the system to fetch third-party or sensitive internal URLs if upstream components ever pass untrusted links, creating data exposure or SSRF-like risk in the backend service.
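
One way a caller could narrow this risk is to screen media URLs before handing them to the upload flow. The following is a minimal sketch, not part of the skill: the function name is_public_media_url is hypothetical, and it assumes that rejecting non-HTTP schemes and hosts that resolve to non-public addresses is an acceptable policy for the deployment.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_media_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host resolves to public addresses.

    Hypothetical helper for illustration; not part of the skill's API.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except (OSError, UnicodeError):
        return False
    if not infos:
        return False
    for info in infos:
        # info[4][0] is the resolved address; drop any IPv6 zone suffix.
        raw = info[4][0].split("%")[0]
        if not ipaddress.ip_address(raw).is_global:
            return False
    return True
```

A check like this only covers the caller's side. The backend performs its own fetch, so DNS answers can change between the check and the fetch, and the service would still need its own protections.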

Vague Triggers

Medium

Confidence: 82%
Finding: The invocation description is broad enough that ordinary editing or media-related requests could activate the skill outside a clearly understood captioning context. Over-broad triggering increases the risk of accidental routing of user content to this remote service without sufficiently specific intent or informed consent.

Vague Triggers

Medium

Confidence: 91%
Finding: The catch-all rule routes 'everything else' to SSE processing, allowing the skill to absorb a wide range of unrelated prompts. In context, that means ambiguous user requests may be sent to a remote backend with broad editing semantics, which is dangerous because the user may not have intended to invoke this service at all.
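
A narrower alternative to a catch-all rule is to route to the remote service only on an explicit captioning intent and to ask the user in every other case. The sketch below illustrates that idea under stated assumptions: the keyword list, the route labels, and the function route_request are all hypothetical and do not describe the skill's actual routing logic.

```python
import re

# Hypothetical keyword set; a real deployment would tune this list.
CAPTION_INTENT = re.compile(
    r"\b(captions?|subtitles?|transcribe)\b", re.IGNORECASE
)


def route_request(prompt: str, has_video: bool) -> str:
    """Send a request to the remote service only on explicit captioning intent."""
    if has_video and CAPTION_INTENT.search(prompt):
        return "remote_captioning"
    # No catch-all: anything ambiguous goes back to the user for confirmation.
    return "ask_user"
```

With this shape, a prompt such as "make it look nicer" falls through to "ask_user" instead of being forwarded to the backend.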

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill encourages users to drop video files into chat but does not clearly warn that those files and related instructions are sent to a third-party backend for processing. This lack of disclosure undermines informed consent and can expose sensitive media, metadata, or embedded personal information to an external service unexpectedly.

Natural-Language Policy Violations

Medium

Confidence: 76%
Finding: The session creation flow hard-codes the language to English without asking the user, even though the skill advertises multilingual captioning. This can cause user instructions or generated captions to be processed under the wrong language context, leading to inaccurate output and inadvertent mishandling of multilingual or non-English content.
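
A session could instead take its language from the user and refuse to proceed when none is given. The sketch below shows that pattern only in outline: the payload field name "language", the SUPPORTED_LANGUAGES set, and the function build_session_payload are assumptions made for illustration, since the report does not document the real session API.

```python
from typing import Optional

# Hypothetical list of supported codes; the real service's list is not documented here.
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "ja"}


def build_session_payload(language: Optional[str]) -> dict:
    """Build a session payload from a user-chosen language instead of a fixed default."""
    if language is None:
        # Do not silently fall back to English; the caller should ask the user.
        raise ValueError("language not specified; ask the user before creating a session")
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return {"language": code}
```

Raising on a missing language forces the calling agent to ask the user, which avoids processing non-English content under an English context.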

VirusTotal

60/60 vendors flagged this skill as clean.
