hidream-model-gen


Generate images and videos using the Vivago AI (智小象) platform. Supports text-to-image, image-to-image, text-to-video, image-to-video, keyframe-to-video, and template-based video generation. Use when the user wants to create AI-generated images or videos, transform existing images, or apply style transfer through the Vivago AI API.

Install

openclaw skills install hidream-model-gen

Vivago AI Skill

Integration with the Vivago AI (智小象) platform for AI-powered image and video generation.

Supported Features

Image Generation

Text to Image (txt2img): Generate images from text descriptions
Image to Image (img2img): Transform existing images based on prompts, including style transfer, image editing, and multi-image fusion

Video Generation

Text to Video (txt2vid): Generate videos from text descriptions
Image to Video (img2vid): Generate videos from static images
Keyframe to Video (keyframe_to_video): Generate transition videos from start and end keyframes
Video Templates (template_to_video): 181 pre-defined video effects
Supports multiple model versions (v3Pro, v3L, kling-video-o1)

Additional Features

Image upload to Vivago storage
Batch generation (up to 4 images)
Multiple aspect ratios (1:1, 4:3, 3:4, 16:9, 9:16)
Automatic retry with polling

Architecture

Core Modules

scripts/
├── vivago_client.py       # Main API client
├── template_manager.py    # Template management
├── config_loader.py       # Configuration loading
├── enums.py               # Type enums (TaskStatus, AspectRatio, etc.)
├── exceptions.py          # Structured exceptions
└── config/                # Modular configuration files

Code Quality

Type Safety: Complete type annotations and enums
Exception Handling: Structured exception hierarchy
CI/CD: GitHub Actions for automated testing
Modular Config: Split configuration files for maintainability

Setup

Prerequisites

Before using this skill, you need to obtain a Vivago.ai API Token:

Step 1: Log in to Vivago.ai

Visit https://vivago.ai/ and log in to your account
Check your remaining credits and consider subscribing to a suitable plan if needed

Step 2: Obtain Your Token

After logging in, visit https://vivago.ai/prod-api/user/token
The page will return your API Token (in JWT format)
Copy this Token for configuration

Security Note: The Token is your credential for accessing the API. Please keep it secure and do not share it with others.

Environment Variables

Security Note: For secure deployments and AI agents, the token must be passed only through the HIDREAM_AUTHORIZATION environment variable.

Export it in your current shell session:

export HIDREAM_AUTHORIZATION="your_vivago_api_token"

Note: STORAGE_AK and STORAGE_SK have been deprecated and removed. Image uploads now use secure pre-signed URLs provided by the Vivago API.
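A minimal sketch of reading the token as described above, assuming nothing beyond the HIDREAM_AUTHORIZATION variable name. The `load_token` and `looks_like_jwt` helpers and the local `MissingCredentialError` class are hypothetical illustrations, not the skill's actual code. The JWT check only tests for the three dot-separated segments every JWT has; it does not validate the signature.

```python
import os


class MissingCredentialError(Exception):
    """Hypothetical local stand-in for scripts.exceptions.MissingCredentialError."""


def load_token(env_var: str = "HIDREAM_AUTHORIZATION") -> str:
    """Return the token from the environment; raise if it is missing or blank."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise MissingCredentialError(f"{env_var} is not set; export your Vivago API token")
    return token


def looks_like_jwt(token: str) -> bool:
    """Cheap sanity check: a JWT is three non-empty, dot-separated segments."""
    parts = token.split(".")
    return len(parts) == 3 and all(parts)
```

Failing early with a clear message is friendlier than letting the first API call return an authorization error.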

File Output Configuration

Important: By default, all generated resources (JSON results, downloaded images, and videos) are written to the assets/ directory in the current working directory. Make sure this directory exists or that the process has permission to create it.
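If you save your own results to the same default location, a small helper like the hypothetical `save_results` below (not part of the skill) creates the assets/ directory on demand instead of failing when it is missing.

```python
import json
from pathlib import Path
from typing import Any


def save_results(results: Any, filename: str, base_dir: str = "assets") -> Path:
    """Write results as JSON under base_dir, creating the directory if needed."""
    out_dir = Path(base_dir)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    # ensure_ascii=False keeps non-ASCII prompts (e.g. Chinese text) readable in the file.
    out_path.write_text(json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8")
    return out_path
```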

Installation

pip install -r requirements.txt

Usage

Python API

from scripts import create_client, VivagoClient
from scripts.enums import AspectRatio, PortName, TaskStatus
from scripts.exceptions import TaskFailedError, TaskTimeoutError

# Create client
client = create_client()

# Text to image
results = client.text_to_image(
    prompt="a beautiful sunset over mountains",
    port=PortName.KLING_IMAGE,  # or PortName.NANO_BANANA
    wh_ratio=AspectRatio.RATIO_16_9,
    batch_size=2
)

# Image to video (using local image)
results = client.image_to_video(
    prompt="camera slowly zooming out",
    image_uuid=client.upload_image("/path/to/image.jpg"),
    port=PortName.V3PRO,
    wh_ratio=AspectRatio.RATIO_16_9,
    duration=5
)

# Keyframe to video (using start and end images)
results = client.keyframe_to_video(
    prompt="smooth transition from start to end",
    start_image_uuid=client.upload_image("/path/to/start.jpg"),
    end_image_uuid=client.upload_image("/path/to/end.jpg"),
    port=PortName.V3PRO,
    wh_ratio=AspectRatio.RATIO_16_9,
    duration=5
)

# Video Templates - use pre-defined effects
results = client.template_to_video(
    image_uuid=client.upload_image("/path/to/image.jpg"),
    template="ghibli",  # See available templates below
    wh_ratio=AspectRatio.RATIO_9_16
)

Error Handling

from scripts.exceptions import (
    TaskFailedError,
    TaskRejectedError,
    TaskTimeoutError,
    InvalidPortError
)

try:
    results = client.image_to_video(...)
except TaskFailedError as e:
    print(f"Task failed: {e.task_id}")
except TaskRejectedError as e:
    print(f"Content rejected: {e.reason}")
except TaskTimeoutError as e:
    print(f"Timeout after {e.timeout_seconds}s")
except InvalidPortError as e:
    print(f"Invalid port: {e.port}, available: {e.available}")

Command Line (Best for AI Agents)

For AI Agents: The easiest way to use this skill is through the provided CLI scripts. They automatically handle API communication, polling, and result parsing. By default, they use HiDream's native models.

Text to Image:

python3 scripts/txt2img.py \
  --prompt "a futuristic city" \
  --wh-ratio 16:9 \
  --batch-size 2 \
  --output ./assets/results.json

Note: This defaults to the hidream-txt2img model.

Text to Video:

python3 scripts/txt2vid.py \
  --prompt "a cybernetic dragon flying over a futuristic city" \
  --wh-ratio 16:9 \
  --duration 5 \
  --output ./assets/video_results.json

Note: This defaults to the v3Pro model.

Image to Video:

python3 scripts/img2video.py \
  --prompt "slow motion falling leaves" \
  --image ./assets/source_image.jpg \
  --duration 5 \
  --output ./assets/video.json
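When an agent drives these scripts from Python, passing an argument list to `subprocess.run` avoids shell-quoting problems with prompts that contain quotes. The sketch below uses only the txt2img.py flags shown above. The helper names, the default values, and the 1-4 batch-size check (taken from the batch limit stated elsewhere in this file) are illustrative assumptions.

```python
import subprocess
import sys
from typing import List


def build_txt2img_command(prompt: str, wh_ratio: str = "16:9", batch_size: int = 1,
                          output: str = "./assets/results.json") -> List[str]:
    """Build an argv list for scripts/txt2img.py using the documented flags (hypothetical helper)."""
    if not 1 <= batch_size <= 4:
        raise ValueError("batch_size must be between 1 and 4")
    return [
        sys.executable, "scripts/txt2img.py",
        "--prompt", prompt,  # passed as one argv item, so no shell escaping is needed
        "--wh-ratio", wh_ratio,
        "--batch-size", str(batch_size),
        "--output", output,
    ]


def run_txt2img(**kwargs) -> subprocess.CompletedProcess:
    """Run the script; check=True raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_txt2img_command(**kwargs), check=True,
                          capture_output=True, text=True)
```

The results can then be read from the JSON file given in `--output`.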

API Reference

Enums

from scripts.enums import (
    TaskStatus,      # PENDING, COMPLETED, PROCESSING, FAILED, REJECTED
    AspectRatio,     # RATIO_1_1, RATIO_4_3, RATIO_16_9, etc.
    PortCategory,    # TEXT_TO_IMAGE, IMAGE_TO_VIDEO, etc.
    PortName         # KLING_IMAGE, V3PRO, NANO_BANANA, etc.
)

Models

Feature	Available Versions	Default
Text to Image	v3L (HiDream), kling-image-o1	v3L (via port `hidream-txt2img`)
Image to Video	v3Pro, v3L, kling-video-o1	v3Pro
Keyframe to Video	v3Pro, v3L	v3Pro

Note for AI Agents: By default, all CLI tools (txt2img.py, txt2vid.py) are pre-configured to use HiDream's native models (hidream-txt2img for images, v3Pro for videos). You don't need to specify the model unless explicitly requested by the user.

Aspect Ratios

1:1 - Square
4:3 - Standard
3:4 - Portrait
16:9 - Widescreen
9:16 - Mobile/Vertical
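The CLI takes ratios as strings (for example 16:9), while the Python API takes AspectRatio members. Assuming every member follows the RATIO_W_H naming shown for RATIO_16_9, a hypothetical converter could look like this:

```python
SUPPORTED_RATIOS = ("1:1", "4:3", "3:4", "16:9", "9:16")


def ratio_to_enum_name(ratio: str) -> str:
    """Map a CLI-style ratio such as '16:9' to an enum member name like 'RATIO_16_9'."""
    ratio = ratio.strip()
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported ratio {ratio!r}; expected one of {SUPPORTED_RATIOS}")
    width, height = ratio.split(":")
    return f"RATIO_{width}_{height}"
```

With the real enum, `getattr(AspectRatio, ratio_to_enum_name("9:16"))` would then return the member, provided the naming assumption holds.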

Task Status Codes

from scripts.enums import TaskStatus

TaskStatus.PENDING     # 0 - Pending
TaskStatus.COMPLETED   # 1 - Completed
TaskStatus.PROCESSING  # 2 - Processing
TaskStatus.FAILED      # 3 - Failed
TaskStatus.REJECTED    # 4 - Rejected (content review)
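The client polls for you, but the logic follows directly from the codes above. It keeps waiting while a task is PENDING or PROCESSING, stops on COMPLETED, and treats FAILED or REJECTED as terminal errors. This is a self-contained sketch. `fetch_status` is a hypothetical callable standing in for whatever status request the client makes, and the built-in `RuntimeError` and `TimeoutError` replace the skill's TaskFailedError, TaskRejectedError, and TaskTimeoutError.

```python
import time
from enum import IntEnum
from typing import Callable


class TaskStatus(IntEnum):
    """Local mirror of the documented status codes (0-4)."""
    PENDING = 0
    COMPLETED = 1
    PROCESSING = 2
    FAILED = 3
    REJECTED = 4


def wait_for_task(fetch_status: Callable[[], int], interval: float = 2.0,
                  timeout: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> TaskStatus:
    """Poll fetch_status() until the task finishes, fails, or the timeout elapses."""
    waited = 0.0
    while True:
        status = TaskStatus(fetch_status())
        if status is TaskStatus.COMPLETED:
            return status
        if status in (TaskStatus.FAILED, TaskStatus.REJECTED):
            raise RuntimeError(f"task ended with status {status.name}")
        # Still PENDING or PROCESSING: give up once the time budget is spent.
        if waited >= timeout:
            raise TimeoutError(f"task still {status.name} after {timeout}s")
        sleep(interval)
        waited += interval
```

Elapsed time is counted as the sum of sleep intervals, and `sleep` is injectable so the loop can be tested without actually waiting.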

File Structure

vivago-ai-skill/
├── scripts/
│   ├── __init__.py         # Package exports
│   ├── vivago_client.py    # Core API client
│   ├── template_manager.py # Template management
│   ├── config_loader.py    # Configuration loader
│   ├── enums.py            # Type enums
│   ├── exceptions.py       # Exception classes
│   ├── logging_config.py   # Logging configuration
│   └── config/             # Modular config files
│       ├── base.json
│       ├── text_to_image.json
│       ├── image_to_video.json
│       └── ...
├── tests/
│   ├── conftest.py         # Pytest configuration
│   ├── archive/            # Archived tests
│   └── ...
├── docs/                   # Documentation
├── .github/workflows/      # CI configuration
├── requirements.txt
├── README.md
└── SKILL.md                # This file

Important Notes

Feishu Channel Messaging Guidelines

When sending generated content through Feishu (飞书) channel:

Content Type	Send Method	Example
Images	✅ Direct file upload	Attach image file directly
Videos	❌ Must send as link	`https://media.vivago.ai/{video_uuid}`

⚠️ Critical: Videos CANNOT be sent as file attachments in Feishu. Always construct and send the direct media URL:

https://media.vivago.ai/b1268f08-ac32-4b83-863f-a419797d768e.mp4

Why: Feishu does not support playable video attachments. Sending video files directly will result in delivery failure or unplayable content.
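A minimal helper for building the link to send. The table shows `{video_uuid}`, while the example link ends in .mp4. This sketch therefore assumes (unverified) that the extension should be appended when the stored ID lacks it. Adjust it if your results already contain full file names. `video_share_url` is a hypothetical name.

```python
MEDIA_BASE = "https://media.vivago.ai/"


def video_share_url(video_id: str) -> str:
    """Build the direct media link for a generated video.

    Assumption: video_id is either a bare UUID or already ends in '.mp4'.
    """
    video_id = video_id.strip().lstrip("/")
    if not video_id:
        raise ValueError("empty video id")
    if not video_id.endswith(".mp4"):
        video_id += ".mp4"
    return MEDIA_BASE + video_id
```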

Image Download

Images can be downloaded from the following URL pattern:

https://storage.vivago.ai/image/{image_name}.jpg

Example:

from scripts import create_client
import requests

client = create_client()

# Generate image
results = client.text_to_image(prompt="a cute cat")
image_name = results[0].get('image', '')
if not image_name:
    raise RuntimeError(f"No image name in result: {results[0]}")

# Download image
image_url = f"https://storage.vivago.ai/image/{image_name}.jpg"
response = requests.get(image_url, timeout=60)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
with open("output.jpg", "wb") as f:
    f.write(response.content)

Sending via Feishu:

# Download (or reuse response.content from above) and send through Feishu
image_data = requests.get(image_url, timeout=60).content
# Then send image_data as a file attachment via the Feishu API

Asynchronous Processing

API calls are asynchronous with automatic polling
Images are automatically resized before upload so that the longest side is at most 1024 px
Video generation supports 5 or 10 second durations
Batch size for images: 1-4, for videos: 1
All API calls include automatic retry logic
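The resize rule above amounts to scaling both sides by 1024 divided by the longest side, while leaving smaller images untouched. The hypothetical `fit_within` below shows only the arithmetic. How the client actually rounds or resamples is not documented here.

```python
from typing import Tuple

MAX_SIDE = 1024


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    # Multiply before dividing to limit floating-point error; max(1, ...) avoids a zero side.
    return (max(1, round(width * max_side / longest)),
            max(1, round(height * max_side / longest)))
```

For example, a 4000×3000 photo becomes 1024×768, and a 1080×1920 portrait becomes 576×1024.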

Error Handling

The client handles common errors:

Network timeouts (with retry)
Rate limiting (with exponential backoff)
Invalid parameters (validation before API call)
Task failures (structured exceptions)
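Exponential backoff, as mentioned for rate limiting, doubles the wait after each failed attempt up to a cap. This generic sketch is not the client's internal implementation. The `retry_with_backoff` name, the choice of retryable exceptions (here `TimeoutError` and `ConnectionError`), and the delay values are assumptions. An HTTP 429 response would first need to be converted into an exception for this helper to retry it.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(func: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                       max_delay: float = 30.0,
                       retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
                       sleep: Callable[[float], None] = time.sleep,
                       jitter: bool = True) -> T:
    """Call func(); on a retryable error wait base_delay * 2**attempt (capped) and retry.

    Makes at most retries + 1 calls in total.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            if jitter:
                delay = random.uniform(0, delay)  # "full jitter" spreads out concurrent retries
            sleep(delay)
    raise AssertionError("unreachable")
```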

Exception Hierarchy

VivagoError (base)
├── VivagoAPIError
├── MissingCredentialError
├── InvalidPortError
├── ImageUploadError
├── TemplateNotFoundError
└── TaskError
    ├── TaskFailedError
    ├── TaskRejectedError
    └── TaskTimeoutError
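Because the hierarchy is nested, `except TaskError` catches all three task outcomes, while `except VivagoError` catches everything the skill raises. The classes below are a self-contained mirror of the tree above, for illustration only. In real code, import them from scripts.exceptions, whose constructors and attributes may differ. The `describe` function is hypothetical.

```python
class VivagoError(Exception):
    """Base class: catching this handles every skill-specific error."""


class VivagoAPIError(VivagoError): ...
class MissingCredentialError(VivagoError): ...
class InvalidPortError(VivagoError): ...
class ImageUploadError(VivagoError): ...
class TemplateNotFoundError(VivagoError): ...


class TaskError(VivagoError):
    """Base for errors tied to a submitted generation task."""


class TaskFailedError(TaskError): ...
class TaskRejectedError(TaskError): ...
class TaskTimeoutError(TaskError): ...


def describe(exc: VivagoError) -> str:
    """Classify an error; more specific classes must be checked before their bases."""
    if isinstance(exc, TaskRejectedError):
        return "rejected by content review"
    if isinstance(exc, TaskError):
        return f"task problem ({type(exc).__name__})"
    return "client/API problem"
```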

Video Templates Reference

181 video templates are available via template_to_video(). The most commonly used ones are summarized below:

Quick Categories

Category	Count	Example Templates
Style Transfer	20+	ghibli, 1930s-2000s vintage styles
Harry Potter	4	magic_reveal_ravenclaw, gryffindor, hufflepuff, slytherin
Wings/Fantasy	10+	angel_wings, phoenix_wings, crystal_wings, fire_wings
Superheroes	5+	iron_man, cat_woman, ghost_rider
Dance	10+	apt, dadada, dance, limbo_dance
Effects	15+	ash_out, metallic_liquid, flash_flood
Thanksgiving	10+	turkey_chasing, autumn_feast, gratitude_photo
Comics/Cartoon	8+	gta_star, anime_figure, bring_comics_to_life
Products	8+	glasses_display, music_box, food_product_display
Scenes	20+	romantic_kiss, graduation, starship_chef

Popular Templates

Template ID	Description
`ghibli` / `ghibli2`	Studio Ghibli animation style
`magic_reveal_ravenclaw`	Harry Potter Ravenclaw transformation
`magic_reveal_gryffindor`	Harry Potter Gryffindor transformation
`magic_reveal_hufflepuff`	Harry Potter Hufflepuff transformation
`magic_reveal_slytherin`	Harry Potter Slytherin transformation
`iron_man`	Iron Man armor assembly
`angel_wings` / `phoenix_wings` / `crystal_wings` / `fire_wings`	Wing transformations
`cat_woman`	Cat Woman style
`ghost_rider`	Ghost Rider flaming skull
`joker`	Joker villain style
`mermaid`	Mermaid underwater scene
`snow_white`	Snow White princess
`barbie`	Barbie princess transformation
`me_in_hand`	Miniature figure in hand
`music_box`	Rotating figure on music box
`anime_figure`	Transform into anime figure
`gta_star`	GTA game style transformation
`apt` / `dadada` / `dance`	Dance templates
`ash_out`	Disintegrate into ashes
`eye_of_the_storm`	Thunder god awakening
`metallic_liquid`	Metal mask transformation
`flash_flood`	Water/flood effect
`turkey_chasing` / `turkey_away` / `turkey_giant`	Thanksgiving turkey scenes
`autumn_feast` / `autumn_stroll`	Autumn scenes
`renovation_of_old_photos`	Colorize B&W photos
`graduation`	Graduation ceremony
`glasses` / `glasses_display`	Glasses/eyewear showcase
`bikini` / `sexy_man` / `sexy_pants`	Fashion/beach
`romantic_kiss` / `boyfriends_rose` / `girlfriends_rose`	Romantic scenes
`ai_archaeologist` / `starship_chef` / `cyber_cooker`	Sci-fi characters
`jungle_reign` / `panther_queen` / `roar_of_the_dustlands` / `tiger_snuggle`	Animal companions
`instant_sadness` / `headphone_vibe` / `relax`	Emotion/reaction
`frost_alert`	Cold/freeze effect
`bald_me`	Bald transformation
`boom_hair` / `curl_pop` / `long_hair` (deprecated)	Hair transformations
`muscles`	Muscle transformation
`face_punch` / `gun_point`	Action effects
`static_shot` / `tracking_shot` / `orbit_shot` / `push_in` / `zoom_out` / `handheld_shot`	Camera movements
`earth_zoom_in` / `earth_zoom_out`	Earth zoom effects

View All Templates

from scripts.template_manager import get_template_manager

manager = get_template_manager()
templates = manager.list_templates()

print(f"Total templates: {len(templates)}")
for tid, name in sorted(templates.items()):
    print(f"  {tid}: {name}")
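Since an unknown template ID raises TemplateNotFoundError, it can help to check the ID locally first and suggest near misses. This hypothetical helper uses the standard library's `difflib.get_close_matches` on a dict shaped like the one `list_templates()` appears to return (IDs mapped to names, judging by the `.items()` loop above).

```python
import difflib
from typing import Dict, List


def check_template(template_id: str, templates: Dict[str, str], n: int = 3) -> List[str]:
    """Return [] if template_id exists, else up to n similar IDs for a helpful message."""
    if template_id in templates:
        return []
    # cutoff=0.6 is difflib's default similarity threshold (0.0-1.0).
    return difflib.get_close_matches(template_id, list(templates), n=n, cutoff=0.6)
```

For example, `check_template("ghilbi", templates)` would suggest `ghibli` when that ID is present.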

Usage Example

from scripts import create_client

client = create_client()

# Upload image
image_uuid = client.upload_image("/path/to/photo.jpg")

# Apply Ghibli style template
results = client.template_to_video(
    image_uuid=image_uuid,
    template="ghibli",
    wh_ratio="9:16"
)

# Harry Potter transformation
results = client.template_to_video(
    image_uuid=image_uuid,
    template="magic_reveal_ravenclaw",
    wh_ratio="9:16"
)

Changelog

v0.9.0 (2026-03-09)

✅ Code review complete (P0-P3)
✅ Added GitHub Actions CI
✅ Added type safety module (enums.py)
✅ Added structured exceptions (exceptions.py)
✅ Split configuration into modular files
✅ Archived redundant code and tests
✅ Pinned dependency versions

v0.8.2 (2026-03-08)

✅ Template testing: 44 templates, 40 passed (90.9%)
✅ Fixed metallic_liquid naming issue
✅ Marked long_hair as deprecated

v0.8.0 (2026-03-07)

✅ Completed Tier 1-4 testing
✅ Established smart test optimization system