GPT Image 2

v0.3.1

Generate high-quality images with GPT Image 2 (OpenAI gpt-image-2) via the ClawdChat tool gateway. Use when the user asks to create / generate / draw / paint...

MIT-0 · by Agentrix@lxyd-ai

Install

openclaw skills install gpt-image2

GPT Image 2 — High-quality AI image generation

Powered by ClawdChat — calls OpenAI gpt-image-2 through the Uno tool gateway.

What this skill does

Two thin command-line invocations against the public ClawdChat tool gateway:

| Tool slug | Purpose | Cost |
|---|---|---|
| gpt-image-2.gpt_image2_submit | Submit a generation job; returns job_id immediately (async) | 300 credits / call |
| gpt-image-2.gpt_image2_result | Poll job status / fetch image URL when ready | 0 credits |

This skill ships no local Python code. It defers all credential, transport and rate-limit handling to the uno-cli companion skill.

Credentials & permissions (read before first use)

  • Credential type: ClawdChat API key (Bearer token).
  • Where it lives: ~/.uno/credentials.json. The file is created and owned by the uno-cli skill; this skill never opens, prints or copies it.
  • How it was obtained: the user runs python ../uno-cli/bin/uno.py login, which issues a device code. The user must first log in at https://clawdtools.uno (top-right "Login"), then open the authorisation URL shown in the terminal. The resulting API key is stored by uno-cli.
  • What it authorises: calling the ClawdChat tool gateway as the logged-in user. Each gpt_image2_submit deducts 300 credits from that account.
  • Network egress: the user's prompt text and any reference_image_urls are sent to the ClawdChat gateway over HTTPS. Do not paste private, confidential, or personally-identifying content into the prompt unless you are comfortable with the gateway's data handling — see https://clawdchat.cn for the data policy.
  • Logging out / revoking: run python ../uno-cli/bin/uno.py logout (credential file managed entirely by uno-cli).

Cost transparency & confirmation rule

Every gpt_image2_submit call costs the logged-in account real credits. The agent must:

  1. Show the user the planned prompt, size, style, and number of images before the first call.
  2. Ask for explicit confirmation when the user has not already approved a generation in the current turn.
  3. For multi-image batches (n > 1) or retries, treat each submission as a separate spending event and confirm again unless the user has pre-authorised the batch.
  4. On error responses, surface the error to the user instead of silently retrying.

Polling via gpt_image2_result is free; only submit spends credits.
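As a sketch, the spending rules above reduce to a tiny pre-flight check. All names here are hypothetical helpers; the only documented fact used is the 300-credit cost of one submit call.

```python
# Hypothetical pre-flight gate for the confirmation rule above.
# The 300-credit price per submit call comes from the cost table.

CREDITS_PER_SUBMIT = 300

def planned_spend(submissions: int) -> int:
    """Each submission, including every retry, is a separate spending event."""
    return CREDITS_PER_SUBMIT * submissions

def needs_confirmation(approved_this_turn: bool,
                       batch_preauthorised: bool,
                       n_images: int) -> bool:
    """Confirm unless the user already approved a generation this turn;
    multi-image batches still need confirmation unless pre-authorised."""
    if not approved_this_turn:
        return True
    if n_images > 1 and not batch_preauthorised:
        return True
    return False
```

Show the user `planned_spend(...)` alongside the prompt, size, and style before the first call.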

Setup

This skill is a thin wrapper around the uno-cli companion skill, which ships the actual Python CLI (bin/uno.py).

1. Install uno-cli explicitly

The current clawhub install does not auto-cascade metadata.openclaw.skills dependencies, so you must install it yourself:

clawhub install uno-cli

After both skills are installed, the layout is:

<skills-root>/
├── gpt-image2/
│   └── SKILL.md
└── uno-cli/
    ├── SKILL.md
    └── bin/uno.py        # the actual CLI

From this skill's folder, the CLI is therefore reachable at the relative path ../uno-cli/bin/uno.py. All examples below use that path verbatim.

2. Log in

python ../uno-cli/bin/uno.py login --start

This prints a device code and a verification_uri_complete URL like https://clawdtools.uno/device?code=XXXX.

Open that URL in a browser — if you are not yet signed in to clawdtools.uno, the page will redirect you to the ClawdChat SSO login automatically and return to the authorisation screen afterwards. Click "Authorise" to complete the flow.

Then poll for completion:

python ../uno-cli/bin/uno.py login --poll <device_code>

Or run python ../uno-cli/bin/uno.py login (blocking, identical flow but polls automatically).

Credential storage (~/.uno/credentials.json) and refresh are handled entirely by uno-cli.

Why not call a global uno command?

Don't rely on a uno binary in PATH. On many systems (notably macOS with LibreOffice installed) /opt/homebrew/bin/uno is the LibreOffice UNO bridge, an unrelated Mach-O binary — invoking it will produce confusing C++ errors. Always invoke python ../uno-cli/bin/uno.py … (or an explicit absolute path), or set up your own alias / symlink that you control.

If python resolves to Python 2 on the host, use python3 instead.

Generating an image — full async flow

A single 1024×1024 image typically takes ~150 s, longer than the default MCP 60 s timeout. Always use the submit → poll-result pattern.

Step 1 — submit

python ../uno-cli/bin/uno.py call gpt-image-2.gpt_image2_submit --compact \
  --args '{"prompt":"A shiba inu under cherry blossoms, sunny afternoon","size":"1024x1024","style":"ghibli_anime"}'

Response (already flattened by uno-cli — no need to unwrap content[0].text):

{"success": true, "data": {"status": "pending", "job_id": "0b84b8f0f0c8", "estimated_seconds": 150}, "meta": {"latency_ms": 120, "credits_used": 300}}

Record data.job_id.
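Extracting job_id is plain JSON handling; a minimal sketch against the sample response above:

```python
import json

# Sample flattened response from gpt_image2_submit (copied from above).
resp = ('{"success": true, "data": {"status": "pending", "job_id": "0b84b8f0f0c8", '
        '"estimated_seconds": 150}, "meta": {"latency_ms": 120, "credits_used": 300}}')

payload = json.loads(resp)
if not payload["success"]:
    # Surface error/hint to the user instead of retrying silently.
    raise RuntimeError(payload.get("error", "unknown error"))

job_id = payload["data"]["job_id"]   # keep this for gpt_image2_result polling
print(job_id)                        # → 0b84b8f0f0c8
```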

Step 2 — poll for result

python ../uno-cli/bin/uno.py call gpt-image-2.gpt_image2_result --compact --timeout 70 \
  --args '{"job_id":"0b84b8f0f0c8","wait_seconds":50}'

wait_seconds=50 makes the server-side wait 50 s (within the 60 s MCP envelope); --timeout 70 adds a small client buffer.

Repeat the call until data.status is one of:

  • done — image ready, URLs in data.items[].url.
  • error — generation failed, message in data.error.
  • pending / running — call again immediately. Do not add a client-side sleep; the server already waited 50 s on your behalf.

Three to five iterations (~150–250 s total) are normal.

Reference shell loop

UNO=../uno-cli/bin/uno.py

RESP=$(python "$UNO" call gpt-image-2.gpt_image2_submit --compact \
  --args '{"prompt":"Van Gogh starry night","style":"oil_painting_vangogh"}')
JOB_ID=$(echo "$RESP" | python3 -c "import json,sys; print(json.load(sys.stdin)['data']['job_id'])")

for i in 1 2 3 4 5 6; do
  R=$(python "$UNO" call gpt-image-2.gpt_image2_result --compact --timeout 70 \
    --args "{\"job_id\":\"$JOB_ID\",\"wait_seconds\":50}")
  STATUS=$(echo "$R" | python3 -c "import json,sys; print(json.load(sys.stdin)['data']['status'])")
  [ "$STATUS" = "done" ]  && echo "$R" && break
  [ "$STATUS" = "error" ] && echo "$R" && exit 1
done

Parameters

| Field | Meaning | Values |
|---|---|---|
| prompt | Image description (required, any language) | free text |
| size | Image dimensions | 1024x1024 (default), 1024x1536 (portrait), 1536x1024 (landscape), auto |
| n | Number of images to generate | 1–4 (default 1) |
| style | Built-in style preset | one of the 20 keys below |
| reference_image_urls | Reference images (image-to-image) | URL string, comma-separated for multiple |
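A sketch of assembling a full --args payload from these fields; the values are illustrative, and only prompt is required:

```python
import json

# Illustrative payload covering every documented field.
args = {
    "prompt": "Product shot of a ceramic mug on a wooden table",
    "size": "1536x1024",             # landscape
    "n": 2,                          # 1-4, default 1
    "style": "cinematic_photo",      # one of the 20 preset keys below
    # comma-separated string for multiple reference images:
    "reference_image_urls": "https://example.com/a.jpg,https://example.com/b.jpg",
}

# Pass this string as the --args value of gpt_image2_submit.
print(json.dumps(args))
```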

20 built-in style presets

| key | description |
|---|---|
| ghibli_anime | Studio Ghibli / hand-drawn anime |
| pixar_3d | Pixar / Disney 3D animation |
| claymation | Stop-motion claymation (Laika / Aardman) |
| lego_brick | LEGO bricks |
| popmart_figurine | Blind-box / Pop Mart figurine |
| isometric_game | Isometric 2.5D game scene |
| cinematic_photo | Cinematic photorealism (35mm) |
| polaroid_film | Polaroid film snapshot |
| watercolor_ink | Watercolour / East-Asian ink wash |
| oil_painting_vangogh | Van Gogh impasto oil painting |
| cyberpunk_neon | Cyberpunk neon nightscape |
| vintage_infographic | Retro infographic / data poster |
| movie_poster | Movie poster (large title + still) |
| flat_vector | Flat-vector illustration / banner |
| pixel_8bit | Pixel art (8/16-bit) |
| papercraft_layered | Layered papercraft |
| exploded_diagram | Exploded technical diagram |
| dreamcore_liminal | Dreamcore / liminal space |
| knolling_flatlay | Top-down knolling / flat-lay |
| botanical_engraving | Botanical engraving / antique illustration |

Where this model shines (vs Midjourney / Flux / SD)

  • Accurate text rendering — poster headlines, infographics, menu typography, meme captions: written into the image as specified.
  • Strong prompt following — multi-element scenes, ordering and spatial relationships obeyed.
  • Subject preservation in image-to-image — faces, brands, and characters stay consistent across reference images.
  • Wide style coverage — Ghibli, Pixar, claymation, LEGO, Pop Mart, botanical engraving etc. all handled.

Agent guidance

  • Tell the user up-front that one image takes ~150 s.
  • The gpt_image2_result tool already sleeps 50 s server-side — never add an extra client-side sleep between polls.
  • Use --timeout 70 for result calls (50 s server wait + buffer).
  • Pass the user's prompt verbatim, including non-English text.
  • Reference images: combine reference_image_urls with a style preset for "restyle while keeping the subject".
  • Posters / infographics / menus: lean on the text-rendering strength.
  • If submit returns success=false, surface the error/hint fields to the user.
  • If the loop exhausts (~600 s) and status is still running, tell the user the job can be re-polled later with the same job_id.
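The polling guidance above can be sketched as a driver around an injected fetch function; fetch_status is a hypothetical stand-in for one gpt_image2_result call:

```python
import time

def poll_job(fetch_status, budget_seconds=600):
    """Call fetch_status() back-to-back (the server already waits ~50 s
    per call, so no client-side sleep) until done/error or the budget
    is exhausted. Returns the final data dict, or None on exhaustion,
    in which case the user can re-poll later with the same job_id."""
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        data = fetch_status()            # e.g. one gpt_image2_result call
        if data["status"] == "done":
            return data                  # URLs in data["items"][*]["url"]
        if data["status"] == "error":
            raise RuntimeError(data.get("error", "generation failed"))
        # pending / running: loop again immediately
    return None
```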

Response shape

Already flattened by uno-cli:

{
  "success": true,
  "data": {"status": "...", "job_id": "...", "items": [{"url": "..."}]},
  "meta": {"latency_ms": 120, "credits_used": 300}
}

Read data.status, data.job_id, data.items[].url directly.

Errors:

{"success": false, "error": "...", "hint": "..."}

Version tags

latest → vk973a09s8tdq6117r902gmat8n85v7aj