Text Video Generator

Security checks across malware telemetry and agentic risk

Overview

This skill is not clearly malicious, but it needs review because it can automatically create a cloud session and send broad prompts or uploaded files to NemoVideo with limited runtime disclosure.

Install only if you are comfortable sending prompts, documents, media files, session metadata, and platform attribution to NemoVideo's cloud backend. Avoid confidential content unless the provider's retention and deletion practices are acceptable, and require explicit confirmation before token creation, uploads, or render requests.
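
The explicit confirmation recommended above could be enforced by a small gate in front of each network-facing step. This is a minimal sketch, not the skill's actual code: `SENSITIVE_ACTIONS`, `require_confirmation`, and the action names are hypothetical and are not part of NemoVideo's API.

```python
from typing import Callable

# Hypothetical names for the three steps the review singles out:
# token creation, uploads, and render requests.
SENSITIVE_ACTIONS = {"create_token", "upload_file", "request_render"}


def require_confirmation(action: str, detail: str,
                         ask: Callable[[str], str] = input) -> bool:
    """Return True if the action is not sensitive, or if the user typed 'yes'."""
    if action not in SENSITIVE_ACTIONS:
        return True
    answer = ask(
        f"About to {action} ({detail}); data will be sent to a remote backend. "
        "Type 'yes' to continue: "
    )
    # Anything other than an explicit 'yes' is treated as a refusal.
    return answer.strip().lower() == "yes"
```

An agent wrapper would call `require_confirmation("upload_file", "report.pdf")` before sending anything and skip the request on `False`. Defaulting to refusal keeps an empty or ambiguous reply from being read as consent.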

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (5)

Description-Behavior Mismatch

Medium
Confidence: 87%
Finding
The manifest markets the skill as text-to-video for TXT/DOCX/PDF/copied text, but the body documents a much broader media upload and editing pipeline that accepts video, image, and audio assets. This mismatch can mislead users and reviewers about the actual data flows and permissions, increasing the chance that users upload sensitive non-text media without realizing the broader cloud-processing scope.
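
A reviewer or wrapper could make this mismatch visible by checking uploads against the file types the manifest advertises. The sketch below assumes the declared scope is exactly the TXT, DOCX, and PDF types named in the finding; `DECLARED_EXTENSIONS` and `outside_declared_scope` are hypothetical names, not part of the skill.

```python
from pathlib import Path

# File types the manifest advertises, per the finding above.
DECLARED_EXTENSIONS = {".txt", ".docx", ".pdf"}


def outside_declared_scope(filenames):
    """Return the files whose extension is not a manifest-declared text type."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in DECLARED_EXTENSIONS
    ]
```

Any file returned by this check (for example a video or audio clip) falls under the broader media pipeline rather than the advertised text-to-video scope, so it is a reasonable point at which to warn the user before upload.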

Context-Inappropriate Capability

Low
Confidence: 81%
Finding
Requiring `X-Skill-Platform` derived from install path performs unnecessary platform fingerprinting unrelated to core text-to-video generation. Even if low sensitivity on its own, it creates extra metadata leakage about the user's environment and can support tracking, segmentation, or backend policy differences without clear user benefit.

Vague Triggers

Medium
Confidence: 78%
Finding
The invocation text is broad enough that ordinary conversational requests about generating videos or describing desired output could activate the skill unexpectedly. Over-broad triggering is dangerous because it can route user content into this skill's remote backend and session workflow without sufficiently explicit user intent.

Vague Triggers

Medium
Confidence: 84%
Finding
The catch-all rule routes 'Everything else' to SSE, making the default trigger scope ambiguous and overly permissive. In practice, this can cause unrelated prompts to be sent to the backend, increasing the risk of unintended data disclosure and confusing skill activation boundaries.
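
One alternative to a catch-all rule is an explicit allow-list, so that only recognized intents reach the backend and everything else stays local. This is a sketch under stated assumptions: the intent names, `REMOTE_INTENTS`, and the route labels are hypothetical and do not describe how the skill actually routes requests.

```python
# Hypothetical intents that are permitted to reach the remote backend.
REMOTE_INTENTS = {"generate_video", "edit_video", "export_video"}


def route(intent: str) -> str:
    """Send only allow-listed intents to the backend; default to local handling."""
    return "backend_sse" if intent in REMOTE_INTENTS else "local_only"
```

With the default inverted this way, an unrelated prompt falls through to `local_only` instead of being forwarded over SSE.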

Missing User Warnings

Medium
Confidence: 92%
Finding
The skill encourages users to drop prompts and files into chat and says it will handle cloud GPU processing, but it does not provide a clear, front-loaded warning that uploaded content is sent to a third-party cloud backend. This omission undermines informed consent and may expose sensitive text or documents to remote services unexpectedly.

VirusTotal

66/66 vendors flagged this skill as clean.
