Caption Generator Bangla

Security checks across malware telemetry and agentic risk

Overview

This skill is a cloud-based Bangla video-captioning workflow that sends selected media to NemoVideo for processing. It also exposes broader editing features that users should understand before use.

Install only if you are comfortable sending selected videos or provided media URLs to NemoVideo for remote processing under a NEMO_TOKEN or generated anonymous token. Avoid confidential or private videos unless you trust that service's data handling, and use clear video-captioning or editing prompts so the broad SSE route is used only for intended media work.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (5)

Description-Behavior Mismatch

Severity: Medium
Confidence: 91%
Finding
The skill is presented as a narrow Bangla caption generator, but the body defines a much broader remote video-editing and rendering control surface, including session management, SSE editing, state inspection, export, and general media manipulation. This mismatch can mislead users and host agents into granting broader capabilities than expected, increasing the risk of unintended data transfer or misuse of remote editing features.

Context-Inappropriate Capability

Severity: Low
Confidence: 80%
Finding
The skill exposes credits, session state, and account-related flows that are not necessary for a simple captioning task. While not directly exploitative on their own, these capabilities expand the accessible API surface and can reveal billing/session information or encourage actions outside the user's expected task.

Context-Inappropriate Capability

Severity: Low
Confidence: 89%
Finding
Allowing upload by arbitrary URL broadens the input model from user-provided local media to remotely fetched content, which can enable unexpected third-party data ingestion and conceal the true origin of processed media. This increases the risk of abuse, privacy issues, and unintended access patterns beyond the declared upload workflow.
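
One way to narrow the URL input model described in this finding is to check each URL against an allowlist before it is forwarded for remote fetching. The following is a minimal sketch, not part of the skill: the function name is_acceptable_media_url and the host media.example.com are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the scanned skill does not define one.
ALLOWED_HOSTS = {"media.example.com"}


def is_acceptable_media_url(url: str) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # Reject file://, http://, and any other non-HTTPS scheme.
    if parsed.scheme != "https":
        return False
    # hostname is None for URLs without a network location.
    host = (parsed.hostname or "").lower()
    return host in ALLOWED_HOSTS
```

A host agent could call this check before passing a URL to the upload step and ask the user to supply a local file when it returns False.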

Vague Triggers

Severity: Medium
Confidence: 86%
Finding
The catch-all routing rule sends 'everything else' into the SSE editing path, making activation far broader than a narrowly scoped captioning skill should permit. Overbroad triggers can cause the agent to process unrelated requests through a powerful remote backend, leading to scope creep and accidental data handling.
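
The difference between a catch-all rule and an explicit one can be sketched as follows. This is an illustrative assumption about how a narrower router might look, not the skill's actual routing logic: the keyword list, the route names "sse_edit" and "decline", and the function route_request are all hypothetical.

```python
# Hypothetical keywords that indicate a captioning request.
CAPTION_KEYWORDS = ("caption", "subtitle", "transcribe")


def route_request(prompt: str) -> str:
    """Route only caption-related prompts to the remote editing path.

    Anything that does not match is declined rather than falling
    through to the remote backend by default.
    """
    text = prompt.lower()
    if any(keyword in text for keyword in CAPTION_KEYWORDS):
        return "sse_edit"
    return "decline"
```

With this shape, an unrelated request such as "Summarize this PDF" is declined, whereas a catch-all rule would have sent it to the SSE editing path. Simple keyword matching is itself coarse, so a real router would likely need a stricter intent check.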

Missing User Warnings

Severity: Medium
Confidence: 94%
Finding
The skill states that rendering happens server-side, but it does not provide a clear upfront warning that uploaded videos are transmitted to a third-party remote processing service. For user media, this omission is significant because videos may contain sensitive personal, biometric, or proprietary content, and informed consent depends on transparent disclosure.
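
An upfront disclosure of the kind this finding calls for could be implemented as a confirmation step before any upload. The sketch below is an assumption about one possible design, not behavior the skill provides: build_upload_notice and confirm_upload are hypothetical helpers, and the notice wording is illustrative.

```python
from typing import Callable


def build_upload_notice(filename: str, service: str = "NemoVideo") -> str:
    """Compose a plain-language notice naming the remote service."""
    return (
        f"'{filename}' will be uploaded to {service}, a third-party "
        "remote processing service. Videos may contain personal, "
        "biometric, or proprietary content."
    )


def confirm_upload(filename: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly answers yes.

    `ask` is injectable so the prompt can be replaced in non-interactive
    contexts; an empty or unrecognized answer counts as refusal.
    """
    answer = ask(build_upload_notice(filename) + " Proceed? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to refusal means that a missing or ambiguous answer never results in media leaving the user's machine.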

VirusTotal

64 of 64 vendors reported this skill as clean.
