Install
openclaw skills install gemini-image
Generate or edit images using the Gemini API with multiple reference image support. Use for image generation, style transfer, combining references, UI mockups, or any visual creation task.
Generate images with support for multiple reference images (up to 14).
Required: GEMINI_API_KEY environment variable
Get your API key at: https://aistudio.google.com/apikey
Set it:
export GEMINI_API_KEY="your-key-here"
Or in your OpenClaw config (~/.openclaw/openclaw.json):
{
"skills": {
"entries": {
"gemini-image": {
"env": { "GEMINI_API_KEY": "your-key-here" }
}
}
}
}
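To illustrate how the two configuration routes above relate, here is a minimal Python sketch that looks for the key in the environment first and then in the OpenClaw config file. The `resolve_api_key` helper and the environment-first precedence are assumptions made for illustration; this document does not show how the skill itself resolves the key.

```python
import json
import os
from pathlib import Path


def resolve_api_key(env=None, config_path=None):
    """Return GEMINI_API_KEY from the environment, else from the config file.

    Hypothetical helper: the environment-first precedence is an assumption.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY")
    if key:
        return key

    path = Path(config_path) if config_path else Path.home() / ".openclaw" / "openclaw.json"
    if not path.is_file():
        return None

    config = json.loads(path.read_text(encoding="utf-8"))
    # Walk skills -> entries -> gemini-image -> env, tolerating missing levels.
    node = config
    for part in ("skills", "entries", "gemini-image", "env"):
        node = node.get(part, {}) if isinstance(node, dict) else {}
    return node.get("GEMINI_API_KEY") if isinstance(node, dict) else None
```

A `None` result means neither location supplied a key, which is a reasonable point to stop and ask the user to set one.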
# Generate from prompt
uv run {baseDir}/scripts/generate.py -p "a sunset over mountains" -o sunset.png
# Edit/transform one image
uv run {baseDir}/scripts/generate.py -p "make this pop art style" -i photo.png -o popart.png
# Combine multiple references (up to 14)
uv run {baseDir}/scripts/generate.py -p "combine style of first with content of second" \
-i style-ref.png -i content.png -o combined.png
# With aspect ratio
uv run {baseDir}/scripts/generate.py -p "wide landscape" -a "16:9" -o wide.png
# Higher resolution
uv run {baseDir}/scripts/generate.py -p "detailed portrait" -r 2K -o portrait.png
| Flag | Description |
|---|---|
| -p, --prompt | Image prompt (required) |
| -o, --output | Output filename (required) |
| -i, --input-image | Reference image (repeatable, up to 14) |
| -m, --model | pro (default), flash2, flash, or exp |
| -r, --resolution | 1K (default), 512, 2K, 4K (pro/flash2) |
| -a, --aspect-ratio | See ratios below (pro/flash2) |
| -t, --thinking | minimal (default), high, dynamic (pro/flash2) |
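When the command is assembled programmatically, the flags above map naturally onto an argument list. The sketch below is one possible way to do that; `build_generate_args` is a hypothetical helper, not part of the skill, and it only enforces the rules stated in the table (required prompt and output, at most 14 reference images).

```python
import shlex

MAX_REFERENCE_IMAGES = 14  # limit stated in the flag table


def build_generate_args(prompt, output, input_images=(), model=None,
                        resolution=None, aspect_ratio=None, thinking=None):
    """Assemble the argument list for scripts/generate.py (hypothetical helper)."""
    if not prompt or not output:
        raise ValueError("prompt and output are both required")
    if len(input_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")

    args = ["-p", prompt, "-o", output]
    for image in input_images:
        args += ["-i", image]  # -i is repeatable, one flag per image
    # Optional flags are only emitted when set, so the script's defaults apply.
    for flag, value in (("-m", model), ("-r", resolution),
                        ("-a", aspect_ratio), ("-t", thinking)):
        if value is not None:
            args += [flag, value]
    return args


args = build_generate_args("wide landscape", "wide.png", aspect_ratio="16:9")
print(shlex.join(args))  # shell-quoted form, ready to append to the uv run command
```

Keeping the arguments as a list rather than a single string avoids quoting problems when prompts contain spaces or quotes.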
pro (gemini-3-pro-image-preview): Highest quality, thinking mode, up to 14 refs, aspect ratio + resolution
flash2 (gemini-3.1-flash-image-preview): Nano Banana 2 (new). Best price/perf, thinking levels, great text rendering, localization, aspect ratio + resolution + 512px
flash (gemini-2.5-flash-image): Older Flash, simpler config, no aspect ratio/resolution control
exp (gemini-2.0-flash-exp): Experimental, good for edits
When to use flash2 vs pro:
1:1 · 2:3 · 3:2 · 3:4 · 4:3 · 4:5 · 5:4 · 9:16 · 16:9 · 21:9 · 4:1 · 1:4 · 8:1 · 1:8
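If the target dimensions are known but do not match one of the listed ratios exactly, a small helper can pick the closest supported value for the `-a` flag. This is a sketch only: `nearest_aspect_ratio` is a hypothetical name, and comparing ratios on a logarithmic scale is a design choice made here, not something the skill prescribes.

```python
import math

# Ratios copied from the list above.
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4",
                    "9:16", "16:9", "21:9", "4:1", "1:4", "8:1", "1:8"]


def nearest_aspect_ratio(width, height):
    """Return the supported ratio string closest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio):
        w, h = ratio.split(":")
        # Log scale treats 2:1 and 1:2 as equally far from 1:1.
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_RATIOS, key=distance)
```

For example, a 1920x1080 target maps to `16:9`, and a 1080x1920 target maps to `9:16`.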
Control how much the model reasons before generating:
# Complex scene with high thinking
uv run {baseDir}/scripts/generate.py -p "detailed city scene with specific text overlay" -m flash2 -t high -o city.png
| Model | Max Images | Notes |
|---|---|---|
| pro | 14 total | 5 high-fidelity, 6 objects, 5 humans |
| flash | 3 | Best with ≤3 inputs |
| exp | varies | Experimental |
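The limits in this table can be checked before calling the script, so that an oversized request fails early with a clear message. The following is a minimal sketch; `check_reference_count` is a hypothetical helper, and models without a fixed limit in the table (exp, and flash2, which the table does not list) are deliberately left unchecked rather than guessed.

```python
# Fixed limits taken from the table; "exp" varies, so it has no entry here.
MAX_REFERENCE_IMAGES = {"pro": 14, "flash": 3}


def check_reference_count(model, image_paths):
    """Raise ValueError if the model's documented image limit is exceeded."""
    limit = MAX_REFERENCE_IMAGES.get(model)
    count = len(image_paths)
    if limit is not None and count > limit:
        raise ValueError(f"{model} accepts at most {limit} reference images, got {count}")
    return count
```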
For bulk jobs that can wait 24hr, use the batch API.
When the user says "batch submit", follow this automated workflow:
batch.py submit
Cron job text template:
Check gemini-image batch [BATCH_ID]. If complete: download to [OUTPUT_DIR], send all images to [CHANNEL], then disable this cron job.
1. Create requests JSONL (one per line):
{"key": "sunset", "prompt": "a sunset over mountains", "aspect_ratio": "16:9", "resolution": "2K"}
{"key": "portrait", "prompt": "corporate headshot", "input_images": ["/path/to/ref.png"]}
2. Submit batch:
uv run {baseDir}/scripts/batch.py submit requests.jsonl
3. Check status (typically ready within 24hr):
uv run {baseDir}/scripts/batch.py status
4. Download when done:
uv run {baseDir}/scripts/batch.py download -o ./images/
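The requests file from step 1 can also be generated from a list of dictionaries instead of being written by hand. The sketch below assumes the field names shown in the examples above (`key`, `prompt`, and optional extras); `write_requests_jsonl` is a hypothetical helper, and the requirement that keys be unique is an assumption, not something this document states.

```python
import json


def write_requests_jsonl(path, requests):
    """Write one JSON object per line; return the number of requests written."""
    seen = set()
    # Validate everything first so a bad entry never leaves a partial file.
    for request in requests:
        if "key" not in request or "prompt" not in request:
            raise ValueError("each request needs 'key' and 'prompt'")
        if request["key"] in seen:
            raise ValueError(f"duplicate key: {request['key']}")  # assumed rule
        seen.add(request["key"])

    with open(path, "w", encoding="utf-8") as handle:
        for request in requests:
            handle.write(json.dumps(request, ensure_ascii=False) + "\n")
    return len(seen)
```

The resulting file can then be passed to `batch.py submit` as in step 2.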
When to use batch vs instant:
generate.py (instant, full price)
batch.py (24hr, 50% off)
See references/prompting.md for detailed strategies:
Generation:
Editing:
Core principle: Describe the scene narratively, don't just list keywords.
-i flags intentionally
2026-01-26-sunset.png
MEDIA: line for Clawdbot auto-attach
For complex generations, especially UI mockups and wireframes, use a two-step workflow:
When the user asks for a wireframe or complex image, show the structured prompt first instead of generating immediately:
Here's the prompt I'll use:
{
"image_type": "UI mockup",
"device": {"frame": "iPhone 16 Pro", "orientation": "portrait"},
"design_system": {
"style": "iOS 18 native",
"corners": "rounded, 16px radius",
"shadows": "soft drop shadows",
"spacing": "8pt grid",
"font": "SF Pro"
},
"layout": {
"header": "Navigation bar with back button and title 'Settings'",
"content": "List of settings items with icons and toggles",
"bottom": "Tab bar with 4 items"
}
}
Want me to generate this, or should I adjust anything first?
Convert JSON to narrative prompt and run:
uv run {baseDir}/scripts/generate.py \
-p "[narrative version of JSON]" \
-o mockup.png \
-a "9:16" \
-m pro
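The conversion from the structured JSON to a narrative prompt could be sketched roughly as follows. `spec_to_narrative` is a hypothetical helper that assumes the field names from the example above (`image_type`, `device`, `design_system`, `layout`); the sentence templates are illustrative, and a hand-written narrative may well read better.

```python
def spec_to_narrative(spec):
    """Turn a structured prompt dict into narrative sentences (hypothetical helper)."""
    sentences = [f"Create a {spec.get('image_type', 'image')}."]

    device = spec.get("device", {})
    if device:
        frame = device.get("frame", "device")
        orientation = device.get("orientation", "portrait")
        sentences.append(f"Show it inside a {frame} frame in {orientation} orientation.")

    design = spec.get("design_system", {})
    if design:
        # Semicolons separate entries because values may contain commas.
        details = "; ".join(f"{name}: {value}" for name, value in design.items())
        sentences.append(f"Follow this design system: {details}.")

    for region, description in spec.get("layout", {}).items():
        sentences.append(f"The {region} area contains: {description}.")

    return " ".join(sentences)
```

The returned string is what would be passed to `-p` in the command above, in place of the `[narrative version of JSON]` placeholder.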
| Use Draft Mode | Skip Draft (Generate Direct) |
|---|---|
| UI mockups / wireframes (new) | Simple images ("sunset over mountains") |
| Complex multi-element scenes | Style transfer with clear reference |
| User says "let me see the prompt first" | User says "just make it" |
| Previous generation failed/missed the mark | Simple edits (color swap, minor tweaks) |
| Structural changes to existing image | — |
Trigger phrases for draft mode:
Edit iterations:
Quick example:
uv run {baseDir}/scripts/generate.py \
-p "Convert this wireframe to high-fidelity iOS 18 UI. iPhone 16 Pro frame, rounded corners, soft shadows, SF Pro font. Interpret: scribbles→images, rectangles→buttons, lines→text" \
-i wireframe-sketch.png \
-o mockup.png \
-a "9:16"
See references/prompting.md for detailed templates including JSON-structured prompts and multi-screen flows.