ComfyUI Image & Video Generation

Other

Generate images and videos via ComfyUI on local GPU. Supports Flux text-to-image, Wan2.1 text-to-video, and image-to-video.

Install

openclaw skills install comfyui-image-video

ComfyUI — Image & Video Generation

Use to generate images (Flux schnell) and videos (Wan2.1 T2V/I2V) on the local RTX 5080 GPU.

Environment

ComfyUI: ~/ComfyUI (systemd user service: comfyui.service)
Python venv: ~/comfyui-venv
API: http://127.0.0.1:8188
Output: ~/ComfyUI/output/

Script

{baseDir}/scripts/generate.py <mode> [options]

Mode: `image` — Text-to-Image (Flux schnell)

{baseDir}/scripts/generate.py image \
  --prompt "A cat on the moon" \
  --output /tmp/output.png

Option	Default	Description
`--prompt`	(required)	Text prompt
`--negative`	""	Negative prompt
`--width`	1024	Image width
`--height`	1024	Image height
`--steps`	4	Sampling steps (schnell optimized)
`--seed`	random	Reproducible seed
`--output`	ComfyUI output dir	Copy output here
`--model`	flux1-schnell.safetensors	UNET filename
`--weight-dtype`	fp8_e4m3fn	Weight quantization
`--wait`	120	Max wait seconds

Recommended Flux schnell params: steps=4, cfg=1.0, sampler=euler, scheduler=simple

Mode: `t2v` — Text-to-Video (Wan2.1 T2V-1.3B)

{baseDir}/scripts/generate.py t2v \
  --prompt "A red sports car driving on a mountain road at sunset" \
  --length 49 \
  --output /tmp/video_frames/

Option	Default	Description
`--prompt`	(required)	Text prompt
`--negative`	""	Negative prompt
`--width`	832	Frame width
`--height`	480	Frame height
`--length`	49	Number of frames (≈3s at 16fps)
`--steps`	20	Sampling steps
`--seed`	random	Reproducible seed
`--output`	ComfyUI output dir	Copy frames here
`--wait`	300	Max wait seconds

Recommended Wan2.1 T2V params: steps=20, cfg=5.0, sampler=uni_pc_bh2, scheduler=simple

Mode: `i2v` — Image-to-Video (Wan2.1 I2V using T2V-1.3B)

{baseDir}/scripts/generate.py i2v \
  --prompt "gentle wave motion, water flowing" \
  --image /path/to/input.png \
  --output /tmp/video_frames/

Option	Default	Description
`--prompt`	(required)	Motion description
`--image`	(required)	Path to input image
`--length`	49	Number of frames
`--steps`	20	Sampling steps
`--seed`	random	Reproducible seed
`--output`	ComfyUI output dir	Copy frames here
`--wait`	300	Max wait seconds

Server Management

# Start (systemd user service)
systemctl --user start comfyui.service

# Check status
systemctl --user status comfyui.service

# Check API
curl -s http://127.0.0.1:8188/system_stats | python3 -m json.tool

# Manual start (if systemd not available)
cd ~/ComfyUI && LD_LIBRARY_PATH=~/comfyui-venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188

Installed Models

Image (Flux)

File	Location	Size
flux1-schnell.safetensors	models/unet/	23.8GB
ae.safetensors	models/vae/	335MB
clip_l.safetensors	models/clip/	250MB
t5xxl_fp16.safetensors	models/clip/	9.8GB

Video (Wan2.1)

File	Location	Size
wan2.1_t2v_1.3B_bf16.safetensors	models/diffusion_models/	5.3GB
wan2.1_vae.pth	models/vae/	485MB
umt5_xxl_fp8_e4m3fn_scaled.safetensors	models/text_encoders/	6.1GB
open_clip_xlm_roberta_large_vit_huge_14.pth	models/clip/	4.5GB (for I2V)

Workflow

Check ComfyUI status (curl http://127.0.0.1:8188/system_stats).
Start if needed (systemctl --user start comfyui.service).
Call generate.py with appropriate mode and options.
Return output image/frames to user; offer xdg-open to view.
For video: frames are individual PNGs; optionally combine into MP4 with imageio.

Troubleshooting

libcudart.so not found: set LD_LIBRARY_PATH with nvidia/cuda_runtime/lib.
OOM on 16GB VRAM: reduce resolution or use lower --length for video.
Video generation slow: T2V-1.3B 49 frames ≈ 2-3 minutes on RTX 5080.
Server won't start: pkill -f "main.py" for stale processes.
All models downloaded from ModelScope (domestic) — HuggingFace inaccessible.