Install
`openclaw skills install image-with-comfyui`

Call a local ComfyUI instance for text-to-image (T2I), image-to-image/edit (I2I), and image-to-video (I2V) generation. Supports Z-Image, SD3.5 Medium, Qwen Image Edit, and Wan2.2 models with automatic prompt formatting and VRAM purge after run.

Call a local ComfyUI server to generate or edit images and videos. Four modes:
Detection rules:
Action:
- `image_with_comfyui.py` `i2i` or `wan2.2` accordingly
- `--image` input
- `--prompt`

Context tracking:
Examples:
[image: a photo of a dog] → Agent: (wait)
[text: change the background to a beach] → Agent: calls i2i --image <path> --prompt "change the background to a beach"

[image: a cat sitting on a chair] → Agent: (wait)
[text: make it stand up and walk] → Agent: calls wan2.2 --image <path> --prompt "the cat stands up and walks"

Read config.json relative to this SKILL's directory. All values can be overridden by environment variables:
| Env Variable | Overrides | Default |
|---|---|---|
| COMFYUI_URL | comfyui_url | http://localhost:8188 |
| COMFYUI_TIMEOUT | timeout_seconds | 120 |
| COMFYUI_POLL_INTERVAL | poll_interval_seconds | 3 |
| COMFYUI_OUTPUT_DIR | output_dir | /tmp/comfyui_output |
| OPENCLAW_WORKSPACE | workspace_root | OpenClaw workspace dir |

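
The override order described above can be sketched as follows. This is a minimal illustration, not the skill's actual loader: the function name `load_config` and the rule that empty variables are ignored are assumptions, while the keys, variable names, and defaults come from the table.

```python
import json
import os
from pathlib import Path

# Defaults and env-variable names copied from the table above.
DEFAULTS = {
    "comfyui_url": "http://localhost:8188",
    "timeout_seconds": 120,
    "poll_interval_seconds": 3,
    "output_dir": "/tmp/comfyui_output",
}
ENV_OVERRIDES = {
    "COMFYUI_URL": "comfyui_url",
    "COMFYUI_TIMEOUT": "timeout_seconds",
    "COMFYUI_POLL_INTERVAL": "poll_interval_seconds",
    "COMFYUI_OUTPUT_DIR": "output_dir",
    "OPENCLAW_WORKSPACE": "workspace_root",
}
NUMERIC_KEYS = {"timeout_seconds", "poll_interval_seconds"}


def load_config(config_path, env=None):
    """Hypothetical helper: defaults < config.json < environment variables."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    path = Path(config_path)
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    for env_name, key in ENV_OVERRIDES.items():
        value = env.get(env_name)
        if value:  # assumption: unset or empty variables do not override
            config[key] = float(value) if key in NUMERIC_KEYS else value
    return config
```

Passing `env` explicitly keeps the helper easy to test; in normal use it falls back to `os.environ`.
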
| Mode | Workflow | Location |
|---|---|---|
| T2I (Z-Image) | Z-Image T2I | workflows/z-image_t2i_api.json |
| T2I (SD3.5) | SD3.5 Medium T2I | workflows/sd3.5-med_t2i_api.json |
| I2I | Qwen Image Edit | workflows/qwen_image-edit_api.json |
| I2V | Wan2.2 Image-to-Video | workflows/wan2.2_i2v_api.json |
The system automatically handles two types of errors:
When a workflow references a custom node that isn't installed, the system detects it and reports:
Example: If ImpactKSamplerBasicPipe is missing:
⚠️ Missing node: `ImpactKSamplerBasicPipe`
📦 Package: ComfyUI-Impact-Pack
🔗 GitHub: https://github.com/ltdrdata/ComfyUI-Impact-Pack
ℹ️ Install manually: cd ComfyUI/custom_nodes && git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack
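
One way such a check could work is sketched below, assuming the workflow is in ComfyUI's API format (a mapping from node id to a dict with `class_type` and `inputs`). The set of installed classes is passed in; it might come from the keys of ComfyUI's `/object_info` response, but that is an assumption about how the skill obtains it. `KNOWN_PACKAGES`, `find_missing_nodes`, and `format_missing_node_report` are hypothetical names, and only the Impact Pack entry is taken from the example above.

```python
# Hypothetical lookup table; only the entry from the example above is known.
KNOWN_PACKAGES = {
    "ImpactKSamplerBasicPipe": (
        "ComfyUI-Impact-Pack",
        "https://github.com/ltdrdata/ComfyUI-Impact-Pack",
    ),
}


def find_missing_nodes(workflow, available_classes):
    """Return class_type names used by the workflow but not installed."""
    used = {node["class_type"] for node in workflow.values()}
    return sorted(used - set(available_classes))


def format_missing_node_report(class_type):
    """Build the report text; package details are added only when known."""
    lines = [f"⚠️ Missing node: `{class_type}`"]
    package = KNOWN_PACKAGES.get(class_type)
    if package:
        name, url = package
        lines.append(f"📦 Package: {name}")
        lines.append(f"🔗 GitHub: {url}")
        lines.append(
            f"ℹ️ Install manually: cd ComfyUI/custom_nodes && git clone {url}"
        )
    return "\n".join(lines)
```
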
When a workflow references a model file that doesn't exist, the system attempts to find a compatible substitute:
| Requested Model | Substitute |
|---|---|
| sd3.5_medium variants | sd3.5_large.safetensors |
| WAN High → Low or vice versa | Swap between variants |
| Other unknown models | No substitution (error returned) |
Example: If my_custom_sd3_medium_v2.safetensors is missing:
⚠️ Model missing: `my_custom_sd3_medium_v2.safetensors`
🔄 Substituted: `sd3.5_large.safetensors`
📦 Loader: CheckpointLoaderSimple.ckpt_name
After substitution, the workflow is retried automatically with the substitute model.
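
The substitution table could be expressed roughly as below. The matching heuristics (substring checks for `sd3` and `medium`, swapping `high` and `low` in WAN file names) are assumptions chosen to agree with the example above, and `find_substitute` is a hypothetical name. A real implementation would presumably also confirm that the substitute file exists before retrying.

```python
import re


def find_substitute(model_name):
    """Return a substitute file name, or None when no rule applies."""
    lowered = model_name.lower()

    # Rule 1: anything that looks like an SD3.x Medium checkpoint.
    if "sd3" in lowered and "medium" in lowered:
        return "sd3.5_large.safetensors"

    # Rule 2: WAN high/low variants swap with each other.
    if "wan" in lowered:
        def swap(match):
            word = match.group(0)
            other = "low" if word.lower() == "high" else "high"
            # Preserve the capitalisation style of the original token.
            if word.isupper():
                return other.upper()
            if word[0].isupper():
                return other.capitalize()
            return other

        swapped, count = re.subn(r"high|low", swap, model_name, flags=re.IGNORECASE)
        if count:
            return swapped

    # Rule 3: unknown models get no substitute; the caller reports an error.
    return None
```
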
When the workflow references UnloadAllModels (a memory cleanup node) which isn't available, the system automatically bypasses it by rerouting the signal path:
- `UnloadAllModels` node

Example:
⚠️ Workflow missing node: `UnloadAllModels` (memory cleanup, non-critical)
🔄 Auto-bypassed — generation continues
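
Rerouting around a missing node might look like the sketch below. It assumes the ComfyUI API workflow format, where a linked input is a two-element list `[source_node_id, output_index]`, and it assumes that the first linked input of the bypassed node is the pass-through signal. `bypass_node` is a hypothetical name, not the skill's function.

```python
def bypass_node(workflow, class_type="UnloadAllModels"):
    """Remove nodes of class_type and reconnect their consumers upstream."""
    # Copy each node and its inputs so the caller's workflow is not mutated.
    patched = {
        node_id: {**node, "inputs": dict(node.get("inputs", {}))}
        for node_id, node in workflow.items()
    }
    targets = [nid for nid, n in patched.items() if n["class_type"] == class_type]
    for node_id in targets:
        inputs = patched[node_id]["inputs"]
        # Assumption: the first link-valued input carries the pass-through signal.
        upstream = next((v for v in inputs.values() if isinstance(v, list)), None)
        del patched[node_id]
        for node in patched.values():
            for name, value in node["inputs"].items():
                if isinstance(value, list) and str(value[0]) == str(node_id):
                    if upstream is None:
                        raise ValueError(f"cannot bypass node {node_id}: no upstream link")
                    node["inputs"][name] = list(upstream)
    return patched
```

For example, with a chain `VAEDecode → UnloadAllModels → SaveImage`, the returned workflow connects `SaveImage` directly to `VAEDecode`.
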
message tool using media or filePath.

Z-Image works best with structured natural language prompts, not keyword spam.
6-part formula:
Subject + Scene + Composition + Lighting + Style + Constraints
Rules:
- `(word:1.2)`

Example:
A young woman with long wavy blonde hair sits at a wooden café table,
steam rising from a ceramic cup. Shot from a 3/4 angle, close-up framing.
Soft morning light filters through sheer curtains, casting warm golden tones.
Cinematic photography, shallow depth of field, Kodak Portra 400 aesthetic.
No text, no logos, photorealistic skin texture.
Aspect ratios: 1:1, 4:3, 3:4, 16:9, 9:16, 3:2, 2:3
SD3.5 Medium uses natural language prompts with optional negative prompts.
Prompt formula:
[Composition/Angle] + [Subject] + [Scene/Environment] + [Lighting/Color] + [Style/Texture] + [Details]
Rules:
- `--negative` for elements to exclude
- `--aspect` to change
- `--seed` for reproducibility
- (beautiful, amazing, 4k)
- `(word:1.2)`: SD3.5 doesn't recognize them

Parameter recommendations:
Common negative prompt words:
blurry, low quality, pixelated, grainy,
overexposed, underexposed, flat lighting,
text, watermark, logo, signature, caption,
poorly drawn face, deformed, mutated, disfigured, extra limbs,
cartoonish (when realism is wanted)
Chinese example:
上海魔都春日花海 — 黄浦江畔,大片郁金香、樱花、油菜花盛开,繁花似锦,
春日和煦阳光,远景陆家嘴三件套天际线,湿润的滨江步道倒映花影,
低饱和胶片色调,文艺清新,广角视野
English example:
Cinematic photography, wide-angle shot of a bustling Tokyo street at night,
neon signs reflecting on wet pavement, people with transparent umbrellas,
moody atmospheric lighting, deep blues and vibrant reds, street photography,
shallow depth of field with bokeh background
I2I prompts must be concise and direct. Keep the user's original language.
Rules:
Prompt routing fix (2026-04-22):
TextEncodeQwenImageEditPlus nodes:
- `115:110`: empty negative prompt node
- `115:111`: positive prompt node (contains default text like "the girl")
- `prepare_i2i_workflow()` function auto-detects by scanning for existing default text

Wan2.2 generates short videos (~5 seconds) from a static image + motion description.
Rules:
✅ Prompt describes actions/movement (not scene description)
✅ Write motion description in English for best results
✅ Focus on "who does what" and "how the camera moves"
❌ Don't describe static scene elements in motion prompt
Default: 81 frames (~5s @ 16fps), 4 steps, CFG 4.5
Base resolution: 560×720 (3:4, fast and OK quality)
Auto-detect input image aspect ratio and select reference resolution:
| Aspect | Fast & OK | User Fav | WAN 2.2 Native |
|---|---|---|---|
| 3:4 | 560×720 | 720×912 | 848×1088 |
| 2:3 | 528×768 | 656×960 | 784×1136 |
| 9:16 | 480×848 | 608×1072 | 720×1264 |
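
Selecting a row from this table by input aspect ratio could be sketched as follows. The tier keys (`fast`, `fav`, `native`) and the nearest-ratio rule are assumptions; the source does not say how landscape or square inputs are handled, so this sketch simply picks the closest of the three portrait ratios.

```python
# Reference resolutions copied from the table above, as (width, height).
RESOLUTIONS = {
    "3:4": {"fast": (560, 720), "fav": (720, 912), "native": (848, 1088)},
    "2:3": {"fast": (528, 768), "fav": (656, 960), "native": (784, 1136)},
    "9:16": {"fast": (480, 848), "fav": (608, 1072), "native": (720, 1264)},
}


def pick_resolution(width, height, tier="fast"):
    """Pick the table row whose aspect ratio is closest to the input image."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    ratio = width / height

    def distance(aspect):
        w, h = aspect.split(":")
        return abs(ratio - int(w) / int(h))

    aspect = min(RESOLUTIONS, key=distance)
    return aspect, RESOLUTIONS[aspect][tier]
```

A 1080×1920 phone photo maps to the 9:16 row, so the default tier gives 480×848.
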
Other available resolutions:
Examples:
prompt: "the cat walks forward and looks at the camera, tail wagging"
prompt: "the girl smiles and turns her head, wind blowing her hair"
prompt: "the person stands in a busy street, camera pans left and slowly zooms in, cars driving, red flag fluttering"
# Z-Image (default model)
python3 image_with_comfyui.py t2i \
--prompt "Your detailed image description" \
--aspect 16:9 \
--steps 9
# SD3.5 Medium
python3 image_with_comfyui.py sd35 \
--prompt "A beautiful sunset over mountains" \
--aspect 16:9 \
--negative "text, watermark, blurry" \
--steps 20 \
--cfg 5.5
# I2I (Qwen Image Edit)
python3 image_with_comfyui.py i2i \
--prompt "Change background to a beach" \
--image /path/to/source.jpg \
--steps 4
# I2V (Wan2.2)
python3 image_with_comfyui.py wan2.2 \
--prompt "the person walks forward and smiles" \
--image /path/to/source.jpg \
--length 81 --steps 4
python3 image_with_comfyui.py test
Send the media attachment directly. Be minimal.
⚠️ Absolutely forbidden: replying with only a text description instead of actually sending the file!
✅ Correct approach: send the MEDIA path with the prefix appropriate for the current channel
| Channel | Format | Example |
|---|---|---|
| WhatsApp | MEDIA:./image.jpg | MEDIA:./angel_video.mp4 |
| Telegram | MEDIA: or filePath: | Varies by implementation |
| Discord | Direct attachment | Varies by implementation |
Rules:
- `MEDIA:` prefix + relative path
- `./media/outbound/` directory to ensure accessibility
- `cp` the file to `~/.openclaw/media/outbound/`, then send via `MEDIA:`
- The `MEDIA:` line must be the sole content of the message, with no [[reply_to_current]], text, or anything else before it. Otherwise WhatsApp splits the text and attachment into two separate messages, making it look like "sent twice". If caption text is needed, use the `MEDIA:./file.ext caption=description` format.

WhatsApp:
MEDIA:./output_image.png
Universal:
[📎 Image attachment via MEDIA prefix]
Never replace actual file sending with text descriptions!
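
The copy-then-send rule can be sketched as a small helper. `stage_and_format` is a hypothetical name, and the sketch assumes that the relative path in the `MEDIA:` line is resolved against the outbound directory, which the rules above suggest but do not state outright.

```python
import shutil
from pathlib import Path


def stage_and_format(src, outbound_dir, caption=None):
    """Copy src into outbound_dir and return the MEDIA: line to send."""
    src = Path(src)
    outbound = Path(outbound_dir).expanduser()
    outbound.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, outbound / src.name)
    # Assumption: the relative path is resolved against the outbound directory.
    line = f"MEDIA:./{src.name}"
    if caption:
        line += f" caption={caption}"
    return line
```

The returned string is meant to be the entire message body, with nothing before it.
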
| Model | Timeout |
|---|---|
| T2I (Z-Image) | 100s |
| SD3.5 Medium | 100s |
| I2I (Qwen) | 600s |
| I2V (Wan2.2) | 1000s |
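
A polling loop that applies these per-model timeouts might look like the sketch below. The mode keys mirror the CLI subcommands; `wait_for_result` and the injected `fetch` callable are hypothetical. In practice `fetch` might query the ComfyUI server for the finished job, but how the skill does that, and how this table relates to `COMFYUI_TIMEOUT`, is not specified in the source.

```python
import time

# Per-mode timeouts in seconds, copied from the table above.
MODE_TIMEOUTS = {"t2i": 100, "sd35": 100, "i2i": 600, "wan2.2": 1000}


def wait_for_result(fetch, mode, poll_interval=3,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns a non-None result or the timeout expires."""
    deadline = clock() + MODE_TIMEOUTS[mode]
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(
                f"{mode} did not finish within {MODE_TIMEOUTS[mode]}s"
            )
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.
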