Text To Video Making

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only connector to a cloud text-to-video service, with privacy and scope caveats but no evidence of malicious behavior.

Install only if you are comfortable sending prompts and uploaded files to NemoVideo's cloud service. Avoid confidential documents or private media unless you trust that service, and note that the operational instructions support media uploads beyond the text formats highlighted in the description.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (6)

Description-Behavior Mismatch

Severity: Medium
Confidence: 95%
Finding
The skill description and examples frame the capability as text-to-video conversion from TXT/DOCX/PDF/SRT, but the implementation instructions allow uploading and processing a much broader set of media types including video, audio, and images. This mismatch can cause users or calling platforms to grant the skill broader access than expected, increasing the risk of unintended data exfiltration or misuse of local media files.
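
One way a calling platform could narrow this gap is to enforce the advertised formats itself before any upload takes place. The following is a minimal sketch, not part of the skill: the helper name `is_declared_upload` and the allowlist constant are hypothetical, and the allowlist simply mirrors the four formats named in the description.

```python
from pathlib import Path

# Formats named in the skill description (TXT/DOCX/PDF/SRT).
# Hypothetical allowlist; the skill itself does not define one.
DECLARED_EXTENSIONS = {".txt", ".docx", ".pdf", ".srt"}


def is_declared_upload(path: str) -> bool:
    """Return True only if the file extension is one the description advertises."""
    return Path(path).suffix.lower() in DECLARED_EXTENSIONS
```

A caller would reject or ask for confirmation on anything this check returns False for, such as video, audio, or image files.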

Vague Triggers

Severity: Medium
Confidence: 91%
Finding
The invocation language is very broad and includes generic phrases like sending written prompts or describing what the user wants. This can cause the skill to activate on ordinary conversation and route user content to a remote backend unexpectedly, creating privacy and consent issues.

Vague Triggers

Severity: Medium
Confidence: 93%
Finding
The catch-all rule routes 'Everything else' to the SSE generation path, which is an overly permissive default for a networked skill. Ambiguous or unrelated requests may be sent to the backend, leading to unintended processing of user data and surprising behavior.
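
A default-deny router is the usual alternative to a catch-all rule. The sketch below is illustrative only: the report does not show the skill's real routing table, so the intent pattern, the function name `route_request`, and the route labels are all assumptions.

```python
import re

# Hypothetical explicit intent; the skill's real routing table is not shown in the report.
VIDEO_INTENT = re.compile(r"\b(make|create|generate|render)\b.*\bvideo\b", re.IGNORECASE)


def route_request(message: str) -> str:
    """Return "generate" only on explicit video intent; otherwise stay local."""
    if VIDEO_INTENT.search(message):
        return "generate"
    # Default deny: unmatched requests are never forwarded to the backend.
    return "no_action"
```

With this shape, an unrelated or ambiguous message falls through to `"no_action"` instead of being sent to the generation path.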

Missing User Warnings

Severity: Medium
Confidence: 97%
Finding
The skill instructs the agent to use an environment token automatically or obtain an anonymous token and create a backend session before handling user requests, while explicitly hiding technical details from the user. This creates silent authentication and third-party data transfer without informed consent, and it encourages use of sensitive credentials without clear disclosure or scoped authorization warnings.
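
A consent gate placed before any credential is touched would address the silent-authentication concern. The following sketch rests on stated assumptions: the environment variable name `SKILL_TOKEN`, the notice wording, and the function `choose_auth_path` are hypothetical, since the report names none of them. It only decides which path to take and makes no network calls.

```python
from typing import Callable, Mapping

TOKEN_ENV_VAR = "SKILL_TOKEN"  # hypothetical name; the report does not name the variable

NOTICE = (
    "This will authenticate with a third-party cloud service and send "
    "your prompt and files to it. Continue?"
)


def choose_auth_path(env: Mapping[str, str], confirm: Callable[[str], bool]) -> str:
    """Pick an authentication path only after the user approves the disclosed action."""
    if not confirm(NOTICE):
        return "denied"  # no token is read and no session is created
    # Prefer the environment token when present, else fall back to an anonymous one.
    return "env_token" if env.get(TOKEN_ENV_VAR) else "anonymous_token"
```

Passing `os.environ` as `env` and a real prompt function as `confirm` would wire this into an agent; both are injected here so the decision logic stays testable.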

Missing User Warnings

Severity: Medium
Confidence: 96%
Finding
The skill states that uploaded content is sent through a cloud rendering pipeline and downloadable results are returned, but it does not clearly warn users that their files and prompts leave the local environment for remote processing. Because uploads may include documents and media, the privacy impact is significant, especially for sensitive or proprietary material.

Natural-Language Policy Violations

Severity: Medium
Confidence: 83%
Finding
The session creation hard-codes `"language":"en"`, which can override user preference and send content under an inaccurate language setting. This is primarily a quality, consent, and transparency issue rather than a direct security flaw, but it can still lead to mishandling of user input and unexpected backend behavior.
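
Deriving the language from the user's preference, with `"en"` only as a fallback, would avoid the hard-coded value. This is a minimal sketch under assumptions: only the `"language"` field is confirmed by the finding, while the function name `build_session_payload` and the idea that the payload is a plain dictionary are hypothetical.

```python
from typing import Optional


def build_session_payload(user_language: Optional[str] = None) -> dict:
    """Build a session payload that honors the user's language, defaulting to "en"."""
    # Normalize the preference; an empty or missing value falls back to "en".
    language = (user_language or "").strip().lower() or "en"
    return {"language": language}
```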

VirusTotal

64 of 64 vendors rated this skill as clean.
