hidream-model-gen

Security checks across malware telemetry and agentic risk

Overview

This skill broadly matches its Vivago media-generation purpose, but it ships sensitive person-image templates without enough user-facing warnings or guardrails.

Review the template catalog before installing. Use this skill only if you are comfortable sending selected prompts and images to Vivago. Supply the API token through the HIDREAM_AUTHORIZATION environment variable rather than as a CLI argument, and avoid using sensitive personal photos unless there is explicit consent and a clear reason. Sensitive templates involving adult content, deceased people, endorsements, or protected-attribute defaults should be removed, gated, or manually approved before general agent use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (16)

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The skill metadata says it supports text-to-image, image-to-image, image-to-video, and keyframe-to-video, but this file also exposes a full text_to_video capability. That mismatch creates hidden functionality: downstream orchestration, policy gating, or user-consent logic may not apply the right safeguards because the manifest understates what the skill can actually do.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The manifest presents the skill as a general Vivago image/video generator, but the configuration contains a very large template_to_video catalog with many specialized transformations. Undisclosed feature surface is risky because policy reviewers and calling agents may treat the skill as narrower than it is, while the templates enable behavior with very different safety, privacy, and reputational implications.
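A reviewer could catch both mismatches above mechanically by diffing the generation modes the manifest declares against the modes the configuration actually exposes. The sketch below assumes both are available as lists of mode names; the `undeclared_capabilities` helper and the exact spellings are hypothetical, not taken from the skill.

```python
def undeclared_capabilities(declared, exposed):
    """Return modes present in the config but missing from the manifest.

    Names are normalized (lowercase, hyphens to underscores) so that
    'text-to-image' and 'text_to_image' compare equal.
    """
    def norm(name):
        return name.strip().lower().replace("-", "_")

    declared_set = {norm(n) for n in declared}
    return sorted({norm(n) for n in exposed} - declared_set)


# Hypothetical inputs modeled on the two findings above.
declared = ["text-to-image", "image-to-image", "image-to-video", "keyframe-to-video"]
exposed = ["text_to_image", "image_to_image", "image_to_video",
           "keyframe_to_video", "text_to_video", "template_to_video"]

print(undeclared_capabilities(declared, exposed))
# ['template_to_video', 'text_to_video']
```

Any non-empty result would mean the manifest understates the skill and should be updated before policy gating relies on it.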

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: This section includes sensitive relationship and deceased-person transformations such as 'With Deceased' and related novelty effects that go beyond generic media generation. In this skill context, that is more dangerous because the tool operates on user-supplied personal images, so hidden access to grief-related or identity-transforming templates can enable manipulative, privacy-invasive, or emotionally harmful outputs without clear disclosure or consent controls.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: This template embeds explicit sexualized motion and an `audio_prompt` of 'asmr, female moaning', expanding the skill from image/video generation into sexualized audio synthesis without clear disclosure or scope controls. In an agent skill, that creates policy and safety risk because benign user photo requests could be routed into sexually explicit output modes unexpectedly.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: This second template similarly includes sexualized body-motion language and an `audio_prompt` of 'asmr, male moaning', indicating hidden adult-content generation capability outside the stated skill purpose. Such latent capability increases the chance of surprise NSFW output, misuse with user-supplied images, and policy violations in otherwise general media-generation workflows.
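A pre-install check could scan every template's text fields for sexual-content markers such as the moaning cues quoted in these two findings. This is a minimal sketch: it assumes templates are dicts with `name`, `prompt`, and `audio_prompt` keys (only `audio_prompt` is named in the findings), and the term list is illustrative rather than complete.

```python
import re

# Illustrative markers only; a real review would need a maintained term list or a classifier.
ADULT_MARKERS = re.compile(r"\b(moan(?:s|ing)?|sensual|nsfw)\b", re.IGNORECASE)


def flag_adult_templates(templates):
    """Return names of templates whose prompt or audio_prompt matches ADULT_MARKERS."""
    flagged = []
    for t in templates:
        # Join both text fields; missing keys are treated as empty strings.
        text = " ".join(str(t.get(k, "")) for k in ("prompt", "audio_prompt"))
        if ADULT_MARKERS.search(text):
            flagged.append(t.get("name", "<unnamed>"))
    return flagged


# Hypothetical catalog entries; only the audio_prompt strings come from the findings.
catalog = [
    {"name": "template_a", "prompt": "placeholder motion prompt", "audio_prompt": "asmr, female moaning"},
    {"name": "template_b", "prompt": "placeholder motion prompt", "audio_prompt": "asmr, male moaning"},
    {"name": "template_c", "prompt": "a cat waves hello", "audio_prompt": ""},
]
print(flag_adult_templates(catalog))
```

"asmr" is deliberately left out of the pattern because it is not adult content on its own and would produce false positives.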

Description-Behavior Mismatch

Medium

Confidence: 84%
Finding: The template set includes commercial endorsement/product-promotion generation that goes beyond the declared scope of general image/video/style-transfer creation. In an agent setting, undeclared advertising functionality can be abused to generate deceptive endorsements or promotional content using a user's likeness without clear consent boundaries.

Missing User Warnings

Medium

Confidence: 93%
Finding: This skill processes local images and sends them to a third-party AI service, then stores generated outputs locally and may share media URLs to external platforms such as Feishu. Without a prominent upfront warning, users may provide sensitive or private images without realizing they will leave the local environment and persist in external and local storage.
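One way to surface this warning is to require an explicit confirmation before any local file is uploaded. The sketch below is hypothetical: `confirm_upload` and its wording are not part of the skill, and the `confirm` callback stands in for whatever prompt the host agent provides.

```python
from pathlib import Path

WARNING = (
    "The following files will be sent to Vivago, a third-party service. "
    "Generated media will be saved locally and its URLs may be shared "
    "to external platforms such as Feishu:\n"
)


def confirm_upload(paths, confirm):
    """Show a data-egress warning and return True only if `confirm` approves.

    `confirm` receives the full warning text and must return a truthy value
    to approve, so the same function works with input(), a chat prompt, or a test stub.
    """
    listing = "\n".join(f"  - {Path(p).name}" for p in paths)
    return bool(confirm(WARNING + listing))


# Example: a stub that declines, so no upload should proceed.
approved = confirm_upload(["photos/family.jpg"], confirm=lambda text: False)
print(approved)
```

Passing the callback in, rather than calling `input()` directly, keeps the warning testable and lets an agent route it to the user instead of answering it itself.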

Missing User Warnings

Medium

Confidence: 94%
Finding: The presence of a deceased-person transformation without any adjacent warning or safeguard text is a real safety issue. These features can produce emotionally charged or privacy-sensitive content from personal photos, and absent warnings make it easier for an agent or user to invoke them without appreciating the risk.

Missing User Warnings

Medium

Confidence: 94%
Finding: This is another deceased-person-related template exposed without a warning, showing the issue is systemic rather than isolated. Repeated omission of sensitivity labeling increases the chance that higher-level tooling will surface these options as ordinary effects, which is inappropriate for grief-related or memorial-style content.
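Because the omission is systemic, sensitivity labels could be applied to the whole catalog in one pass rather than template by template. A minimal sketch, assuming templates are identified by display names; the categories, keyword rules, and the name "Flying" are hypothetical, and keyword matching would still need manual review.

```python
# Hypothetical keyword rules mapping a sensitivity label to name fragments.
SENSITIVITY_RULES = {
    "deceased": ("deceased", "memorial"),
    "endorsement": ("endorse", "product display", "promotion"),
}


def label_templates(names):
    """Map each template name to the sorted list of sensitivity labels it triggers."""
    labels = {}
    for name in names:
        lowered = name.lower()
        labels[name] = sorted(
            label
            for label, fragments in SENSITIVITY_RULES.items()
            if any(fragment in lowered for fragment in fragments)
        )
    return labels


print(label_templates(["With Deceased", "Hold Deceased", "Flying"]))
```

The labels only mark templates for attention; deciding what to do with a labeled template is a separate policy step.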

Natural-Language Policy Violations

Medium

Confidence: 78%
Finding: Several templates such as 'With Deceased', 'Hold Deceased', and similar relationship-oriented effects suggest memorial or intimate synthetic imagery involving deceased persons without any stated policy guardrails, consent checks, or contextual restrictions. In a generative media skill, this increases the risk of harmful non-consensual deepfake-style memorial content, emotional abuse, harassment, and reputational harm, especially because the skill is explicitly designed to transform user-supplied images.
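The guardrails this finding calls for could take the form of a gate that refuses sensitive templates unless a human has approved the specific request. This is a sketch under assumptions: labels are supplied by some earlier review step, and `GateDecision`, `gate_template`, and the `approved_by` parameter are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical label set that requires manual approval before use.
GATED_LABELS = {"deceased", "adult", "endorsement"}


@dataclass
class GateDecision:
    allowed: bool
    reason: str


def gate_template(name: str, labels: Iterable[str], approved_by: Optional[str] = None) -> GateDecision:
    """Allow unlabeled templates; require a named human approver for gated labels."""
    sensitive = sorted(set(labels) & GATED_LABELS)
    if not sensitive:
        return GateDecision(True, "no sensitive labels")
    if approved_by:
        return GateDecision(True, f"{', '.join(sensitive)} approved by {approved_by}")
    return GateDecision(False, f"{name} needs manual approval: {', '.join(sensitive)}")


print(gate_template("With Deceased", ["deceased"]))
```

Requiring a named approver, rather than a boolean flag, leaves an audit trail and prevents an agent from approving its own request.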

Vague Triggers

Medium

Confidence: 80%
Finding: Broad template names and unconstrained prompts make safe routing difficult because the agent may select an inappropriate effect for an unrelated request. In this file, vague triggers coexist with highly sensitive transformations, increasing the risk of accidental violent, sexual, or otherwise unsafe output selection.

Vague Triggers

Medium

Confidence: 82%
Finding: Single-word or trigger-like template labels provide almost no semantic constraint, which is unsafe when the template library includes many risky or surprising transformations. This ambiguity can cause the agent to misapply an effect to normal user input, undermining predictability and consent.

Vague Triggers

Low

Confidence: 76%
Finding: A generic effect name used as the sole prompt is weakly constrained and can be misrouted, though by itself it is less severe than explicit harmful content. The risk is mostly selection ambiguity: the system may invoke an effect the user did not request because there is no meaningful scope definition.
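A common mitigation for this kind of selection ambiguity is to resolve templates only by exact identifier and to refuse, rather than guess, when the request does not name one. The sketch below assumes a hypothetical catalog keyed by ID (the IDs and the "Explode" name are invented); it also flags single-word labels as too weak to serve as triggers on their own.

```python
def resolve_template(requested_id, catalog):
    """Return the template for an exact ID match, or None instead of guessing."""
    return catalog.get(requested_id.strip())


def weak_labels(catalog):
    """Return IDs whose display name is a single word (or empty), i.e. weakly constrained."""
    return sorted(tid for tid, t in catalog.items() if len(t["name"].split()) <= 1)


catalog = {  # hypothetical entries
    "hold_deceased_v1": {"name": "Hold Deceased"},
    "explode_v1": {"name": "Explode"},
}
print(resolve_template("explode", catalog))  # None: no fuzzy match
print(weak_labels(catalog))                  # ['explode_v1']
```

Returning `None` forces the caller to ask the user which effect they meant, which keeps selection explicit.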

Natural-Language Policy Violations

High

Confidence: 96%
Finding: This template forces a race-specific subject description ('beautiful white woman') without any user choice or documented justification. That creates discriminatory output behavior and can overwrite user identity or inject protected-attribute constraints into generated content, which is especially risky in an identity-preserving transformation skill.

Natural-Language Policy Violations

High

Confidence: 96%
Finding: This product-display template also hardcodes race for the generated model, introducing unjustified protected-attribute targeting into content generation. In a skill that can use uploaded likenesses, this may distort identity, produce exclusionary outputs, or create discriminatory model-selection behavior at scale.
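Hardcoded protected attributes like these can be caught with a lint pass over template prompts before release. A minimal sketch: the term lists are illustrative and far from complete, and apart from the phrase "beautiful white woman" the template names and prompts are hypothetical.

```python
import re

# Illustrative pattern: a race/ethnicity term directly modifying a person noun.
RACE_TERMS = r"(white|black|asian|latina|latino|caucasian|african)"
PERSON_NOUNS = r"(woman|women|man|men|girl|boy|model|person)"
HARDCODED_RACE = re.compile(rf"\b{RACE_TERMS}\s+{PERSON_NOUNS}\b", re.IGNORECASE)


def lint_protected_attributes(prompts):
    """Return (template_name, matched_text) pairs for prompts that fix a subject's race."""
    hits = []
    for name, prompt in prompts.items():
        for match in HARDCODED_RACE.finditer(prompt):
            hits.append((name, match.group(0)))
    return hits


prompts = {
    "portrait_template": "a beautiful white woman smiling at the camera",
    "product_display": "a model holds the product toward the camera",
}
print(lint_protected_attributes(prompts))
```

A hit does not fix the template by itself; the usual remedy is to drop the attribute or expose it as an explicit user choice.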

Missing User Warnings

Medium

Confidence: 94%
Finding: The script allows an API token to be passed directly via the --token command-line argument, which can expose the credential through shell history, process listings, job logs, or system monitoring tools. In an agent or multi-user environment, this increases the chance of accidental credential disclosure even if the code does not explicitly log the token.
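The remediation recommended in the overview is to read the credential from the HIDREAM_AUTHORIZATION environment variable. Below is a minimal argparse sketch of that pattern, assuming the script parses its arguments with argparse; apart from the variable name and the `--token` flag, everything here (including `get_token`) is hypothetical. The flag is kept only so that old invocations fail loudly instead of quietly continuing to pass the token on the command line.

```python
import argparse
import os
import sys


def get_token(argv=None, environ=None):
    """Read the API token from HIDREAM_AUTHORIZATION, never from the command line."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Accepted only so it can be refused explicitly; its value is never used.
    parser.add_argument("--token", help=argparse.SUPPRESS)
    args = parser.parse_args(argv)
    if args.token is not None:
        sys.exit("Refusing --token: command-line arguments can leak via shell "
                 "history and process listings. Set HIDREAM_AUTHORIZATION instead.")
    token = environ.get("HIDREAM_AUTHORIZATION")
    if not token:
        sys.exit("HIDREAM_AUTHORIZATION is not set.")
    return token
```

Taking `environ` as a parameter makes the lookup testable without mutating the real process environment.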

VirusTotal

65 of 65 vendors rated this skill as clean.
