Ai Caption Generator Free

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed cloud video-captioning and editing integration, with privacy and scope caveats but no evidence of malicious behavior.

Install only if you are comfortable sending videos, media URLs, prompts, and generated project state to NemoVideo’s cloud service. Keep NEMO_TOKEN private, avoid confidential media unless the service is appropriate for it, and use the skill for intentional video captioning or editing tasks rather than generic media requests.
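
As one illustration of keeping NEMO_TOKEN private, the sketch below reads the token from the environment and exposes only a redacted form for logs. The helper names `load_token` and `redact` are hypothetical and not part of the skill; only the variable name NEMO_TOKEN comes from this report.

```python
import os


def load_token(env=os.environ) -> str:
    """Read NEMO_TOKEN from the environment; fail loudly if it is missing."""
    token = env.get("NEMO_TOKEN", "")
    if not token:
        raise RuntimeError("NEMO_TOKEN is not set")
    return token


def redact(token: str) -> str:
    """Return a log-safe form that reveals at most the last four characters."""
    if len(token) <= 4:
        return "***"
    return "***" + token[-4:]
```

Logging `redact(load_token())` instead of the raw value keeps the secret out of transcripts and shell history.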

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Description-Behavior Mismatch

Severity: Medium
Confidence: 90%
Finding
The skill is presented as a simple caption generator, but the documented API surface and workflows support broad video editing, rendering, media manipulation, and stateful project operations. This scope mismatch can mislead users and calling agents into authorizing capabilities beyond what they reasonably expect, increasing the chance of unintended remote processing or content modification.

Context-Inappropriate Capability

Severity: Medium
Confidence: 88%
Finding
The upload flow permits URL-based ingestion of remote media even though the skill is marketed around user-supplied video captioning. This expands the trust boundary and could cause the system to fetch third-party or sensitive internal URLs if upstream components ever pass untrusted links, creating data exposure or SSRF-like risk in the backend service.
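
A minimal sketch of the kind of check that would narrow this risk, assuming the caller validates a media URL before passing it on. The function name `is_public_media_url`, the HTTPS-only policy, and the injectable `resolve` parameter are assumptions for illustration; nothing here describes how the NemoVideo backend actually handles URLs.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https"}  # assumption: only HTTPS media links are accepted


def is_public_media_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only if every address the host resolves to is globally routable."""
    try:
        parsed = urlparse(url)
        host, port = parsed.hostname, parsed.port or 443
    except ValueError:
        return False  # malformed URL or port
    if parsed.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        infos = resolve(host, port)
    except OSError:
        return False  # host does not resolve
    for info in infos:
        # info[4][0] is the address string; drop any IPv6 scope suffix
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        if not address.is_global:
            return False  # loopback, private, link-local, metadata ranges, etc.
    return bool(infos)
```

A check like this does not remove the risk on its own: the fetch should use the same resolved address that was validated, since a second DNS lookup could return a different answer.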

Vague Triggers

Severity: Medium
Confidence: 82%
Finding
The invocation description is broad enough that ordinary editing or media-related requests could activate the skill outside a clearly understood captioning context. Over-broad triggering increases the risk of accidental routing of user content to this remote service without sufficiently specific intent or informed consent.

Vague Triggers

Severity: Medium
Confidence: 91%
Finding
The catch-all rule routes 'everything else' to SSE processing, allowing the skill to absorb a wide range of unrelated prompts. In context, that means ambiguous user requests may be sent to a remote backend with broad editing semantics, which is dangerous because the user may not have intended to invoke this service at all.
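
By contrast, a narrower trigger could look like the sketch below: prompts without explicit captioning intent are declined rather than forwarded, and matching prompts still wait for user confirmation. The keyword list and the `route` function are hypothetical and are not taken from the skill's actual routing rules.

```python
# Hypothetical intent keywords; a real skill would define its own list.
CAPTION_KEYWORDS = ("caption", "subtitle", "transcribe")


def route(prompt: str, user_confirmed: bool = False) -> str:
    """Decide what to do with a prompt instead of sending 'everything else' upstream.

    Returns "decline" when no captioning intent is found, "ask_consent" when
    intent is present but the user has not agreed to remote processing, and
    "send" only when both conditions hold.
    """
    text = prompt.lower()
    if not any(keyword in text for keyword in CAPTION_KEYWORDS):
        return "decline"
    return "send" if user_confirmed else "ask_consent"
```

For example, `route("Add captions to my clip")` yields `"ask_consent"`, while an unrelated prompt such as `route("What's the weather?")` yields `"decline"`.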

Missing User Warnings

Severity: Medium
Confidence: 94%
Finding
The skill encourages users to drop video files into chat but does not clearly warn that those files and related instructions are sent to a third-party backend for processing. This lack of disclosure undermines informed consent and can expose sensitive media, metadata, or embedded personal information to an external service unexpectedly.

Natural-Language Policy Violations

Severity: Medium
Confidence: 76%
Finding
The session creation flow hard-codes the language to English without asking the user, even though the skill advertises multilingual captioning. This can cause user instructions or generated captions to be processed under the wrong language context, leading to inaccurate output and inadvertent mishandling of multilingual or non-English content.
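
One way to avoid a silent English default is to require an explicit language choice before a session is created. The sketch below is illustrative only: `build_session_payload` and the `"language"` field are assumed names, not the service's documented API, and the pattern is a loose shape check for tags such as `es` or `pt-BR` rather than full BCP 47 validation.

```python
import re
from typing import Optional

# Loose shape check for language tags such as "es" or "pt-BR".
_LANG_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def build_session_payload(user_language: Optional[str]) -> dict:
    """Build a session payload from a language the user explicitly chose."""
    if not user_language or not user_language.strip():
        # No silent fallback: the caller should ask the user instead.
        raise ValueError("language not specified; ask the user before creating a session")
    tag = user_language.strip()
    if not _LANG_TAG.match(tag):
        raise ValueError(f"unrecognized language tag: {tag!r}")
    return {"language": tag}
```

Raising instead of defaulting pushes the question back to the user, which is the behavior the finding says is missing.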

VirusTotal

60/60 vendors flagged this skill as clean.
