ComfyUI Image & Video Generation

Other

Generate images and videos via ComfyUI on local GPU. Supports Flux text-to-image, Wan2.1 text-to-video, and image-to-video.

Install

openclaw skills install comfyui-image-video

ComfyUI — Image & Video Generation

Use to generate images (Flux schnell) and videos (Wan2.1 T2V/I2V) on the local RTX 5080 GPU.

Environment

  • ComfyUI: ~/ComfyUI (systemd user service: comfyui.service)
  • Python venv: ~/comfyui-venv
  • API: http://127.0.0.1:8188
  • Output: ~/ComfyUI/output/

Script

{baseDir}/scripts/generate.py <mode> [options]

Mode: image — Text-to-Image (Flux schnell)

{baseDir}/scripts/generate.py image \
  --prompt "A cat on the moon" \
  --output /tmp/output.png
OptionDefaultDescription
--prompt(required)Text prompt
--negative""Negative prompt
--width1024Image width
--height1024Image height
--steps4Sampling steps (schnell optimized)
--seedrandomReproducible seed
--outputComfyUI output dirCopy output here
--modelflux1-schnell.safetensorsUNET filename
--weight-dtypefp8_e4m3fnWeight quantization
--wait120Max wait seconds

Recommended Flux schnell params: steps=4, cfg=1.0, sampler=euler, scheduler=simple

Mode: t2v — Text-to-Video (Wan2.1 T2V-1.3B)

{baseDir}/scripts/generate.py t2v \
  --prompt "A red sports car driving on a mountain road at sunset" \
  --length 49 \
  --output /tmp/video_frames/
OptionDefaultDescription
--prompt(required)Text prompt
--negative""Negative prompt
--width832Frame width
--height480Frame height
--length49Number of frames (≈3s at 16fps)
--steps20Sampling steps
--seedrandomReproducible seed
--outputComfyUI output dirCopy frames here
--wait300Max wait seconds

Recommended Wan2.1 T2V params: steps=20, cfg=5.0, sampler=uni_pc_bh2, scheduler=simple

Mode: i2v — Image-to-Video (Wan2.1 I2V using T2V-1.3B)

{baseDir}/scripts/generate.py i2v \
  --prompt "gentle wave motion, water flowing" \
  --image /path/to/input.png \
  --output /tmp/video_frames/
OptionDefaultDescription
--prompt(required)Motion description
--image(required)Path to input image
--length49Number of frames
--steps20Sampling steps
--seedrandomReproducible seed
--outputComfyUI output dirCopy frames here
--wait300Max wait seconds

Server Management

# Start (systemd user service)
systemctl --user start comfyui.service

# Check status
systemctl --user status comfyui.service

# Check API
curl -s http://127.0.0.1:8188/system_stats | python3 -m json.tool

# Manual start (if systemd not available)
cd ~/ComfyUI && LD_LIBRARY_PATH=~/comfyui-venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188

Installed Models

Image (Flux)

FileLocationSize
flux1-schnell.safetensorsmodels/unet/23.8GB
ae.safetensorsmodels/vae/335MB
clip_l.safetensorsmodels/clip/250MB
t5xxl_fp16.safetensorsmodels/clip/9.8GB

Video (Wan2.1)

FileLocationSize
wan2.1_t2v_1.3B_bf16.safetensorsmodels/diffusion_models/5.3GB
wan2.1_vae.pthmodels/vae/485MB
umt5_xxl_fp8_e4m3fn_scaled.safetensorsmodels/text_encoders/6.1GB
open_clip_xlm_roberta_large_vit_huge_14.pthmodels/clip/4.5GB (for I2V)

Workflow

  1. Check ComfyUI status (curl http://127.0.0.1:8188/system_stats).
  2. Start if needed (systemctl --user start comfyui.service).
  3. Call generate.py with appropriate mode and options.
  4. Return output image/frames to user; offer xdg-open to view.
  5. For video: frames are individual PNGs; optionally combine into MP4 with imageio.

Troubleshooting

  • libcudart.so not found: set LD_LIBRARY_PATH with nvidia/cuda_runtime/lib.
  • OOM on 16GB VRAM: reduce resolution or use lower --length for video.
  • Video generation slow: T2V-1.3B 49 frames ≈ 2-3 minutes on RTX 5080.
  • Server won't start: pkill -f "main.py" for stale processes.
  • All models downloaded from ModelScope (domestic) — HuggingFace inaccessible.