Caption Generator Bangla

Security checks across malware telemetry and agentic risk

Overview

This skill is a cloud Bangla video-captioning workflow that sends selected media to NemoVideo for processing, with some broader editing features users should understand before use.

Install only if you are comfortable sending selected videos or provided media URLs to NemoVideo for remote processing under a NEMO_TOKEN or generated anonymous token. Avoid confidential or private videos unless you trust that service's data handling, and use clear video-captioning or editing prompts so the broad SSE route is used only for intended media work.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (5)

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The skill is presented as a narrow Bangla caption generator, but the body defines a much broader remote video-editing and rendering control surface, including session management, SSE editing, state inspection, export, and general media manipulation. This mismatch can mislead users and host agents into granting broader capabilities than expected, increasing the risk of unintended data transfer or misuse of remote editing features.

Context-Inappropriate Capability

Low

Confidence: 80%
Finding: The skill exposes credits, session state, and account-related flows that are not necessary for a simple captioning task. While not directly exploitative on their own, these capabilities expand the accessible API surface and can reveal billing/session information or encourage actions outside the user's expected task.

Context-Inappropriate Capability

Low

Confidence: 89%
Finding: Allowing upload by arbitrary URL broadens the input model from user-provided local media to remotely fetched content, which can enable unexpected third-party data ingestion and conceal the true origin of processed media. This increases the risk of abuse, privacy issues, and unintended access patterns beyond the declared upload workflow.
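One way to narrow the URL-upload surface described in this finding is to accept only HTTPS URLs whose host is on an explicit allowlist. The sketch below is illustrative only: the function name `is_allowed_media_url`, the `allowed_hosts` parameter, and the example hostnames are assumptions, not part of the skill or the NemoVideo API.

```python
from urllib.parse import urlsplit


def is_allowed_media_url(url: str, allowed_hosts: set[str]) -> bool:
    """Return True only for HTTPS URLs whose host is explicitly allowlisted.

    Hypothetical helper: the skill itself defines no such check.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    return host in allowed_hosts


if __name__ == "__main__":
    hosts = {"media.example.com"}  # placeholder allowlist
    print(is_allowed_media_url("https://media.example.com/clip.mp4", hosts))
    print(is_allowed_media_url("https://unknown.example.org/clip.mp4", hosts))
```

An allowlist of this kind does not verify what the remote file contains; it only makes the origin of fetched media explicit and reviewable.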

Vague Triggers

Medium

Confidence: 86%
Finding: The catch-all routing rule sends 'everything else' into the SSE editing path, making activation far broader than a narrowly scoped captioning skill should permit. Overbroad triggers can cause the agent to process unrelated requests through a powerful remote backend, leading to scope creep and accidental data handling.
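A narrower alternative to a catch-all rule is to route only requests that match explicit captioning or editing terms and to decline everything else. The following is a minimal sketch under that assumption; the term lists, route names, and the `route_request` function are hypothetical and do not reflect the skill's actual routing logic.

```python
import re

# Hypothetical keyword sets; a real skill would tune these to its scope.
CAPTION_TERMS = {"caption", "captions", "subtitle", "subtitles", "transcribe"}
EDIT_TERMS = {"trim", "cut", "crop", "export", "render"}


def route_request(prompt: str) -> str:
    """Pick a route from explicit terms; fall through to 'decline'."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & CAPTION_TERMS:
        return "caption"
    if words & EDIT_TERMS:
        return "edit"
    # No catch-all: unrelated requests never reach the remote backend.
    return "decline"


if __name__ == "__main__":
    for text in ("Add Bangla captions to this clip",
                 "Trim the first five seconds",
                 "What is the weather today?"):
        print(text, "->", route_request(text))
```

The design point is the final branch: defaulting to a refusal rather than to the SSE editing path keeps unrelated prompts from being sent to the remote service.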

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill states that rendering happens server-side, but it does not provide a clear upfront warning that uploaded videos are transmitted to a third-party remote processing service. For user media, this omission is significant because videos may contain sensitive personal, biometric, or proprietary content, and informed consent depends on transparent disclosure.
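The missing disclosure could be addressed with an explicit confirmation step before any upload. The sketch below assumes a hypothetical `confirm_upload` helper and notice wording; neither is part of the skill, and the `ask` parameter exists only so the prompt can be replaced in non-interactive settings.

```python
from typing import Callable

# Assumed notice text; the skill does not currently show any such warning.
NOTICE = (
    "This will upload '{name}' to a remote third-party service "
    "(NemoVideo) for processing. Continue? [y/N] "
)


def confirm_upload(name: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly answers yes."""
    answer = ask(NOTICE.format(name=name))
    return answer.strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    if confirm_upload("clip.mp4"):
        print("Proceeding with upload.")
    else:
        print("Upload cancelled.")
```

Treating an empty or unrecognized answer as a refusal keeps the default on the side of not transmitting media.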

VirusTotal

64/64 vendors reported this skill as clean.
