Install
openclaw skills install @leeguooooo/chatgpt-imagegen
Generate raster images (PNG/JPEG/WebP) using the user's ChatGPT subscription via a local one-file Python CLI: no OPENAI_API_KEY, no gateway, no daemon. Two backends: web (preferred by default) drives the user's logged-in ChatGPT browser so generation runs on the conversation surface and does NOT consume Codex-usage limits; codex is a headless fallback that bills the Codex-usage bucket. Use when an agent needs to create a brand-new bitmap asset for the current project (photos, illustrations, icons, hero banners, mockups, sprites, concept art) and the output should be a bitmap file saved into the workspace. Do not use when the task is better solved by editing existing SVG/vector assets, writing code-native graphics (HTML/CSS/canvas), or extending an established repo icon system. Also use proactively: when authoring a document, blog post, technical proposal, design doc, README, or other long-form explanatory content, propose illustrations for the key concepts and generate them as background tasks; don't wait to be asked for an image.
A standalone Python CLI that produces images via the user's ChatGPT subscription. No API key, no network service, no extra config. It has two backends that hit different OpenAI usage buckets; pick one with --backend.
| Backend | Surface | Usage bucket | Needs | Speed |
|---|---|---|---|---|
| web | Drives the user's logged-in ChatGPT browser (via chrome-use, formerly agent-browser-stealth; older installs expose the same binary as agent-browser/abs) and generates in a regular chat, the same surface as typing in the app. Its real-Chrome connect is what clears Cloudflare + the sentinel proof-of-work a plain/headless client can't. | ChatGPT conversation; does not consume the metered Codex-usage limit. Works on any account, including free tier (subject to its daily image cap). | chrome-use installed and its extension connected to a Chrome signed in to chatgpt.com. | ~30–60 s; each run's chat is filed under a ChatGPT Project (default imagegen, auto-created) instead of littering the history. |
| codex | Headless POST to chatgpt.com/backend-api/codex/responses with the image_generation tool, reusing ~/.codex/auth.json. | Codex-usage (metered; this is the bucket the user usually wants to spare). | codex login (writes ~/.codex/auth.json). | Fast; no browser, no history. |
Default is auto (--backend auto, or CHATGPT_IMAGEGEN_BACKEND): it tries web first because that spares the Codex-usage limit, and falls back to codex only when web is unavailable — i.e. chrome-use isn't installed, the browser isn't reachable, or chatgpt.com isn't logged in. The two not-set-up cases are handled explicitly:
- codex not set up (~/.codex/auth.json absent) → auto still uses web; codex is only the fallback.
Auto does not fall back to codex if web was reachable but the generation itself failed after submitting, because that would spend the very bucket auto-mode protects. In that case it errors and tells you to rerun with --backend codex if you want the Codex-usage path. Force a single backend with --backend web or --backend codex.
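The auto-mode rules above can be summarised as a small decision function. This is a sketch of the described behaviour, not the CLI's actual code; the names `choose_backend`, `web_ready`, and `may_fall_back_to_codex` are hypothetical.

```python
from enum import Enum


class Backend(str, Enum):
    WEB = "web"
    CODEX = "codex"


def choose_backend(requested: str, web_ready: bool) -> Backend:
    """Pick the first backend to try.

    requested: "auto", "web", or "codex" (the value given to --backend).
    web_ready: hypothetical flag meaning chrome-use is installed, the browser
               is reachable, and chatgpt.com is logged in.
    """
    if requested == "web":
        return Backend.WEB
    if requested == "codex":
        return Backend.CODEX
    if requested != "auto":
        raise ValueError(f"unknown backend: {requested!r}")
    # auto prefers web because it spares the Codex-usage bucket.
    return Backend.WEB if web_ready else Backend.CODEX


def may_fall_back_to_codex(requested: str, prompt_submitted: bool) -> bool:
    """After a web failure, is a silent retry on codex allowed?

    Only in auto mode, and only when the failure happened before the prompt
    was submitted; a post-submit failure surfaces as an error instead.
    """
    return requested == "auto" and not prompt_submitted
```

The second function captures the asymmetry: a failure before submit costs nothing on the web side, so falling back is safe, while a failure after submit is reported so that the metered bucket is never spent without the user choosing to.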
For the default web backend: the user must have chrome-use (formerly agent-browser-stealth; older installs expose the same binary as agent-browser / abs) and its extension connected to a Chrome that is signed in to chatgpt.com. chrome-use specifically is required — its real-logged-in-Chrome connect is what passes Cloudflare's bot-detection; a plain headless driver will not. The "Temporary Chat" mode disables image generation, so this backend always opens a regular chat.
If chrome-use is not installed, do not install it on your own initiative:
Guided setup (opt-in only):
# 1. Install the CLI (no npm, no token — provides `chrome-use`)
curl -fsSL https://raw.githubusercontent.com/leeguooooo/chrome-use/main/install.sh | sh
# 2. Register the native-messaging host
chrome-use extension install
# 3. Add the Chrome extension, then restart Chrome:
# https://chromewebstore.google.com/detail/agent-browser-stealth/knfcmbamhjmaonkfnjhldjedeobeafmk
# 4. Sign in to https://chatgpt.com in that Chrome
# 5. Verify: a quick `chatgpt-imagegen "test" --backend web` should print "using current Chrome (relay)"
The chrome-use skill (chrome-use skills get core) covers the extension-connect flow in depth.
For the codex backend: the user must have run the following once:
npm i -g @openai/codex
codex login # opens browser to sign in to ChatGPT
That writes ~/.codex/auth.json, which the codex backend reads. No OPENAI_API_KEY is required for either backend — and setting one will not help. This is the subscription path, not the API path.
- medium quality, no native transparent backgrounds; see Limits below).
- quality=high or background=transparent: the subscription path caps quality at medium and rejects transparent. Tell the user to use the official /v1/images/generations API with their OPENAI_API_KEY for those cases.
"<skill-dir>/chatgpt-imagegen" "<prompt>" [options]
Call the script by its absolute path next to this SKILL.md — that's the most reliable way and never depends on $PATH. If your agent harness exposes a variable pointing to the skill's install directory, use it; otherwise expand the path you read this file from.
npx skills add copies only the markdown — it often does not drop the chatgpt-imagegen script next to this file. So before the first run, check that the CLI exists and fetch it if absent. It's a single self-contained, pure-stdlib Python file (no pip, no deps), so the heal is one curl:
CLI="<skill-dir>/chatgpt-imagegen" # same directory as this SKILL.md
if [ ! -x "$CLI" ]; then
curl -fsSL https://raw.githubusercontent.com/leeguooooo/chatgpt-imagegen/main/chatgpt-imagegen -o "$CLI" \
&& chmod +x "$CLI"
fi
"$CLI" --version # confirm it runs (needs python3 ≥ 3.10 on PATH)
Do this silently — don't ask the user. If the curl fails (offline/proxy), fall back to git clone https://github.com/leeguooooo/chatgpt-imagegen and run chatgpt-imagegen/chatgpt-imagegen, or tell the user to install it standalone (see README). Only python3 is required to run it.
If the user has separately put chatgpt-imagegen on $PATH (Option B in the README), you can also just run chatgpt-imagegen "<prompt>" directly and skip the self-heal.
Useful flags:
| Flag | When to use |
|---|---|
| --backend auto\|web\|codex | auto (default) prefers web and falls back to codex only when the browser is unavailable/not-logged-in; web forces the logged-in-browser path (spares Codex-usage); codex forces the headless path (bills Codex-usage). Also settable via CHATGPT_IMAGEGEN_BACKEND. |
| --profile auto\|relay\|NAME | (web) Which Chrome profile to drive. auto (default): use the open Chrome if it's logged in, else auto-switch to a profile that is (detected offline from the cookie DB, read-only). relay: only the open Chrome. "Profile 3": that profile. Note: logged in ≠ able to generate; a free-tier account can still hit its daily image cap. |
| --session NAME | (web) Reuse a named Chrome tab group across runs instead of imagegen-<pid>. |
| --project NAME | (web) ChatGPT Project to file the run's conversation under: matched by exact name, created automatically if absent, reused if present. Default imagegen (or CHATGPT_IMAGEGEN_PROJECT). Pass --project "" for a plain top-level chat. If the project step fails, the run warns and continues in a plain chat; it never blocks generation. |
| --keep-tab | (web) Leave the ChatGPT tab open after generating (default closes it). Useful for debugging. Implies --keep-conversation. |
| --keep-conversation | (web) Keep the ChatGPT conversation after generating. Default deletes it (PATCH is_visible:false) so the run leaves no history; it's filed under the project only transiently. Also CHATGPT_IMAGEGEN_KEEP_CONVERSATION=1. |
| -o PATH | Always use when you know where the file should go in the repo. |
| --size 1024x1024 | Square icons / logos (verified) |
| --size 1536x1024 | Landscape hero banners, social cards (verified) |
| --size 1024x1536 | Portrait covers, mobile splashes (verified) |
| --size 3840x2160 or similar | 4K landscape (forwarded as-is; backend may reject, so fall back to a smaller verified size on failure) |
| --format webp | Smaller files for web assets |
| --style NAME | Apply a saved asset (a style snippet and/or pinned reference images). Repeatable: stack a character + a style, e.g. --style mascot --style watercolor. See Styles & assets. Overrides any active default set for this run. |
| --no-style | Skip all assets (text and pinned refs) for this run even if the user set an active default. |
| --quiet | Use in agent contexts so stdout is only the saved path. Progress still streams to stderr (use --no-progress to silence it). |
| --no-progress | Fully silence the stderr progress timeline (errors still print). |
| --timeout SECONDS | Total wall-clock budget (default 300). Large/detailed images can take 2–3 min; raise it if you see a timed out error. |
| --stall-timeout SECONDS | Max silence (no data from backend) before declaring a stall (default 120, clamped to --timeout). Lower it to fail faster on a hung backend. |
| -V, --version | Print the CLI version and exit. Run chatgpt-imagegen --version to confirm which build is installed. |
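How --timeout and --stall-timeout interact can be sketched as two independent deadlines. This is an illustration of the documented defaults and clamping only; the function names are hypothetical, and checking the total budget before the idle window is an assumption of the sketch rather than something the CLI documents.

```python
from typing import Optional


def effective_stall_timeout(timeout: float = 300.0,
                            stall_timeout: float = 120.0) -> float:
    """The idle window is clamped so it never exceeds the total budget."""
    return min(stall_timeout, timeout)


def check_deadlines(now: float, started: float, last_data: float,
                    timeout: float = 300.0,
                    stall_timeout: float = 120.0) -> Optional[str]:
    """Return "timed out", "stalled", or None if the run may continue.

    All times are seconds on the same monotonic clock. `last_data` is the
    moment the backend last sent anything.
    """
    # Assumption: the total wall-clock budget is checked first.
    if now - started >= timeout:
        return "timed out"
    if now - last_data >= effective_stall_timeout(timeout, stall_timeout):
        return "stalled"
    return None
```

A slow but healthy run keeps refreshing `last_data`, so only the total budget can end it; a hung backend trips the shorter idle window well before the budget runs out.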
The script prints just the saved path on stdout in every mode; the readable progress timeline and any errors go to stderr, so OUT=$(chatgpt-imagegen "..." --quiet) captures only the path while you still see the timeline. Each timeline line is stamped with elapsed seconds ([ 12.3s] generating), so a slow run is legible and a stall is obvious.
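An agent written in Python could capture the saved path the same way the shell example does. The sketch below uses only flags documented in the table above; `build_command` and `generate` are hypothetical helper names, and the CLI path is whatever absolute path the skill was installed to.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(cli: str, prompt: str, out: Optional[str] = None,
                  size: Optional[str] = None) -> List[str]:
    """Assemble an argv list using only documented flags."""
    cmd = [cli, prompt, "--quiet"]
    if out is not None:
        cmd += ["-o", out]
    if size is not None:
        cmd += ["--size", size]
    return cmd


def generate(cli: str, prompt: str, out: Optional[str] = None,
             size: Optional[str] = None) -> Path:
    """Run the CLI and return the saved path read from stdout.

    stderr is not captured, so the progress timeline still reaches the
    terminal while stdout carries only the path.
    """
    result = subprocess.run(build_command(cli, prompt, out, size),
                            stdout=subprocess.PIPE, text=True, check=True)
    return Path(result.stdout.strip())
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain quotes or spaces, and `check=True` turns a non-zero exit into an exception instead of an empty path.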
An asset is a named, reusable look stored in ~/.config/chatgpt-imagegen/styles.json (honours $XDG_CONFIG_HOME). Each asset carries a text snippet and/or pinned reference images, plus a kind:
- --kind style (default): a visual aesthetic (line, palette, texture). Its refs tell the model "match this style, don't copy the content."
- --kind character: a recurring subject (a mascot, a persona). Its refs tell the model "reproduce this character faithfully as the subject."
This is what lets a user pin their own cartoon character or house style once and reuse it, with no need to re-pass --ref every time. Generation is unchanged unless the user opts in (no default out of the box).
Pinning & reusing:
- chatgpt-imagegen style add mascot "a round orange fox named Pip" --kind character --ref a.png --ref b.png (a few angles → better consistency). The images are copied into the asset library, so the asset survives even if you move/delete the originals.
- chatgpt-imagegen style add mascot --from-last --kind character (also works on style add-ref mascot --from-last). Flow: generate → like it → pin it → reuse.
- chatgpt-imagegen style add watercolor "soft watercolor, visible paper texture".
- chatgpt-imagegen "Pip ordering coffee" --style mascot --style watercolor (the same fox, in watercolor). Or set a default set: chatgpt-imagegen style use mascot watercolor.
Managing:
- style list: kind, a 📎N badge for pinned refs, and * on the active default set.
- style show NAME: kind + snippet + ref filenames + the asset's on-disk path.
- style add-ref NAME <img> / style rm-ref NAME <file>: add/remove pinned images on an existing asset.
- style rm NAME deletes the entry and its images; style clear empties the active set; style reset re-seeds built-ins and wipes the library.
- styles (plural) is accepted as an alias for style.
Behavior: --ref images passed at generation time are treated as the subject and stack on top of the active assets. At most 4 reference images attach per run; if more resolve, the first 4 (character-first) are used and the dropped ones are logged to stderr (never silent). Resolution order: --no-style > --style NAME… > active default set > none. Three built-in styles ship: doodle (the deliberately-crude MS-Paint look), xiaohei (Ian 小黑 hand-drawn explainer: white background, thin wobbly black ink, a black-blob 小黑 character operating an absurd contraption, sparse red/orange/blue Chinese annotations; great for Chinese-article concept figures), and snoopy (classic Peanuts newspaper-comic look: simple wobbly pen-ink lines, round-headed minimalist characters, flat muted retro colors, sparse backgrounds). See docs/styles/README.md for rendered examples of each. An unknown --style fails fast, listing the available names.
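The resolution order and the four-image cap described above can be sketched as follows. This is a simplified model, not the CLI's code: it covers only refs pinned on assets, the asset dict shape is hypothetical, and how run-time --ref images are ordered relative to pinned ones is left out because the text does not specify it.

```python
from typing import Dict, List, Sequence, Tuple

MAX_REFS = 4  # at most 4 reference images attach per run


def resolve_styles(no_style: bool, cli_styles: Sequence[str],
                   active_default: Sequence[str]) -> List[str]:
    """Resolution order: --no-style > --style NAME... > active default set > none."""
    if no_style:
        return []
    if cli_styles:
        return list(cli_styles)
    return list(active_default)


def cap_refs(assets: Sequence[Dict]) -> Tuple[List[str], List[str]]:
    """Order refs character-first, keep the first MAX_REFS, return (kept, dropped).

    Each asset is a hypothetical dict: {"kind": "style" | "character", "refs": [...]}.
    """
    # sorted() is stable, so assets of the same kind keep their given order.
    ordered = sorted(assets, key=lambda asset: asset["kind"] != "character")
    refs = [ref for asset in ordered for ref in asset["refs"]]
    return refs[:MAX_REFS], refs[MAX_REFS:]
```

Returning the dropped refs alongside the kept ones mirrors the "never silent" rule: the caller always has the list it needs to log to stderr.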
Legacy styles.json files (text-only entries from older versions) keep working and upgrade automatically on the next change.
- /tmp, $HOME, or ~/.codex/....
- -o.
- assets/, public/, static/, docs/img/, web/img/, assets/brand/, etc. Default to assets/generated/ only if nothing better fits.
- With -o the script overwrites silently; without -o it auto-numbers (name.png, name-2.png).
- chatgpt-imagegen "<prompt>" -o <path> --size <wxh> --quiet.
- view_image tool or by reading the file). If clearly wrong, iterate with a single targeted prompt change; do not loop blindly (each call costs subscription quota).
When you're authoring a document, blog post, technical proposal, design doc, or other long-form explanatory content, proactively illustrate the key concepts; you don't need to be asked. The flow:
- --quiet -o <path> so stdout is just the saved path; keep writing the prose while they render, and embed each image when it lands. Spawn them as background tasks with your own agent/task tooling: one figure per task, never blocking the writing.
- --backend / CHATGPT_IMAGEGEN_BACKEND (default auto). On the web backend, concurrency is 1, so background figures queue and render one at a time (still fine: it's in the background, and it spends no Codex-usage). On codex, up to 4 render in parallel but each bills the metered Codex-usage bucket. Which backend to spend is the user's trade-off, not yours.
- doodle look fits well: deliberately crude, content-accurate (--style doodle). For Chinese-article concept figures (turning a judgment, flow, or metaphor into one memorable picture), the built-in xiaohei style fits: white background, hand-drawn black ink, a 小黑 character acting out the idea (--style xiaohei). For polished specs, pick a cleaner look or a style you've defined (see Styles & assets). To keep one character or look consistent across a document's figures, pin it as an asset and stack it with --style.
A vague prompt yields a useless figure. Make the prompt describe the figure's content, not just name it:
- --style.
- doodle look, remember content accuracy beats polish: it's supposed to look crude and hand-drawn, but the labels and structure must still be readable.
- --quality flag, and the subscription path does not honour explicit quality requests reliably. Don't promise a specific quality level to the user. If they need explicit quality=high, route them to the official /v1/images/generations API with their own OPENAI_API_KEY.
- background: transparent is not supported on the subscription path.
- --timeout is 300 s to cover this; a genuine hang is caught sooner by the --stall-timeout idle window (default 120 s).
- --timeout starts only once a slot is acquired): web = 1 (the page surface rate-limits aggressively with "Too many requests"; also one shared Chrome), codex = 4 (measured safe on Plus, capped so big fan-outs can't trip the account limiter). Override via CHATGPT_IMAGEGEN_WEB_CONCURRENCY / CHATGPT_IMAGEGEN_CODEX_CONCURRENCY (0 = unlimited). For parallel batches use --backend codex + shell & + wait; firing parallel web runs is safe but executes one at a time. Do not loop blindly for "variants of the same prompt"; that just burns quota. Iterate on the prompt instead.
First step for any "which backend / why isn't web working" failure: run chatgpt-imagegen doctor. It reports, read-only, the CLI's own version vs. the latest on main, whether each backend is set up (codex token; chrome-use installed + version; relay connected; logged-in Chrome profiles), and which one auto would pick, turning a vague "no logged-in browser" into a precise checklist.
Self-update reminder. skills has no auto-update, so the CLI nudges instead: at most once a day it reads its own __version__ (plus a terse per-release changelog) from main and, if a newer one exists, prints a short stderr notice that lists what changed since your version — so you know why to update, not just that you can:
Notice: chatgpt-imagegen 0.14.0 is available (current: 0.12.0). Update: skills update chatgpt-imagegen
• 0.14.0: the update notice now lists what changed in each new version
• 0.13.0: added a once-a-day new-version notice…
It never touches stdout, never blocks a run, and is skipped under --quiet/--no-progress; doctor checks unconditionally and prints the same change list. To turn it off entirely, set CHATGPT_IMAGEGEN_NO_UPDATE_CHECK=1. When you see the notice, the fix is skills update chatgpt-imagegen (or re-run the self-heal curl).
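The gating rules for the update reminder can be sketched as a pure function. This only models the documented conditions (once a day, skipped when quiet, disabled by the environment variable); `last_check` is a hypothetical stored timestamp, and how the CLI actually persists it is not described here.

```python
from typing import Mapping, Optional, Tuple

ONE_DAY = 24 * 60 * 60  # seconds


def should_check_for_update(now: float, last_check: Optional[float],
                            env: Mapping[str, str],
                            quiet: bool = False,
                            no_progress: bool = False) -> bool:
    """Decide whether an ordinary run should look for a newer version.

    last_check is the time of the previous check, or None if there was none.
    """
    if env.get("CHATGPT_IMAGEGEN_NO_UPDATE_CHECK") == "1":
        return False
    if quiet or no_progress:
        return False
    return last_check is None or now - last_check >= ONE_DAY


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "0.14.0" into (0, 14, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))
```

Comparing parsed tuples rather than raw strings matters once a component reaches two digits: as strings, "0.9.0" sorts after "0.10.0", which would hide a real update.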
| Symptom | Cause | Fix |
|---|---|---|
| ~/.codex/auth.json not found | Codex CLI never signed in | Tell user to run npm i -g @openai/codex && codex login |
| no ChatGPT OAuth access_token in ~/.codex/auth.json | Only an API key is present, not a subscription OAuth token | Tell user to run codex login; an OPENAI_API_KEY value in that file is not a substitute |
| HTTP 400 requires a newer version of Codex | Local codex CLI is outdated | Tell user to run npm i -g @openai/codex@latest; the script reads the version from ~/.codex/version.json, which codex updates on launch |
| HTTP 401 / HTTP 403 then refresh works | Token expired and refresh succeeded | No action needed; the script auto-retried |
| refresh_token is no longer valid — run codex login again | Refresh token revoked or rotated | Tell user to run codex login again |
| stalled: the image backend sent no data for ~Ns (last phase: …) | No data for the whole --stall-timeout idle window; backend hung or overloaded | Retry; if it recurs, raise --stall-timeout (and --timeout). The message names the phase it stalled in. |
| timed out: no image within the Ns total budget (last phase: …) | The whole --timeout budget elapsed, usually because of a genuinely large image | Raise --timeout (e.g. --timeout 420) and retry |
| no image returned. events seen: ... | Model decided not to call the tool | Rephrase prompt to explicitly say "Use the image_generation tool to render…" |
| HTTP 429 | Subscription rate-limited | Wait a few minutes; do not retry in a loop |
| warning: --format=X but FILE.Y has .Y extension | -o extension disagrees with --format | Fix the path or the format flag; the file IS written with the format you specified |
| warning: project 'X' unavailable (…); using a plain chat | (web) Project list/create API hiccup, or the project page's composer didn't render | Nothing; the image still generated, just in a top-level chat. If it recurs, check the name or pass --project "" |
| chatgpt.com rate-limited this account ('Too many requests') … | (web) The page surface temporarily blocked the account for making requests too quickly | Wait a few minutes. If it fired before submit, auto mode already fell back to codex; if after submit, check the conversation later, as the image may still appear there. Don't retry in a loop |
| waiting for a free web/codex slot (max N concurrent …) | More parallel runs than the backend's concurrency cap | Nothing; the run starts when a slot frees up, and queue time doesn't eat --timeout |
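An agent that wraps the CLI might turn the table above into a lookup so it reacts consistently instead of retrying blindly. The sketch below matches on message substrings taken from the table; the action labels and the `suggest_action` helper are hypothetical, and anything unrecognised is treated as a candidate bug report.

```python
from typing import List, Tuple

# (substring of the stderr message, suggested action), taken from the table.
# The action labels are hypothetical names for this sketch.
RULES: List[Tuple[str, str]] = [
    ("stalled:", "retry"),
    ("timed out:", "raise-timeout"),
    ("HTTP 429", "wait"),
    ("Too many requests", "wait"),
    ("waiting for a free", "none"),
    ("auth.json not found", "ask-user-to-login"),
    ("refresh_token is no longer valid", "ask-user-to-login"),
]


def suggest_action(stderr_text: str) -> str:
    """Return the first matching action, or "report" for anything unrecognised."""
    for needle, action in RULES:
        if needle in stderr_text:
            return action
    return "report"
```

Keeping the rate-limit cases mapped to "wait" rather than "retry" encodes the table's repeated warning not to retry in a loop.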
web backend (run_web)
- chrome-use against a session-named Chrome tab group.
- https://chatgpt.com/ chat (Temporary Chat disables the image tool).
- GET /backend-api/gizmos/snorlax/sidebar lists projects (a project is a gizmo with id g-p-…); POST /backend-api/projects {name, instructions} creates one. It then navigates to https://chatgpt.com/g/<g-p-id>/project and submits from that composer, which files the conversation inside the project. Any failure degrades to a plain chat with a stderr warning.
- keyboard type + Enter, not fill: the composer is a ProseMirror/React contenteditable, and fill mutates the DOM without firing the input events React needs, so the send button stays bound to empty state. A send-button click is the fallback.
- eval: waits until the streaming/stop control is gone AND a brand-new <img> (src matching estuary/content|files/download|oaiusercontent) is present and stable across two reads. The img scan is scoped to main img (the tab's own conversation thread): ChatGPT pushes an "Image created" toast with a matching thumbnail into any open tab when another conversation finishes an image, and a document-wide scan grabs that sibling's image (issue #7). The generated img is NOT inside [data-message-author-role="assistant"], so <main> is the right scope.
- fetch(src, {credentials:'include'}) → base64, so the browser's own session cookies authorize the signed asset URL. No tokens leave the browser.
codex backend (run_codex)
- ~/.codex/auth.json for access_token, account_id, refresh_token; reads ~/.codex/version.json for the version header.
- https://chatgpt.com/backend-api/codex/responses with tools: [{"type": "image_generation"}], streams the SSE response, base64-decodes the image_generation_call result.
- https://auth.openai.com/oauth/token (client_id=app_EMoamEEZ73f0CkXaXp7hrann); the refreshed token is persisted back to auth.json.
Why the web surface is reachable only through a real browser: the consumer backend-api/* paths are gated by three layers: Cloudflare's edge check, a sentinel proof-of-work (sentinel/chat-requirements + an in-page sentinel/sdk.js that computes the token), and a Cloudflare Turnstile token. Tested empirically: a bare bearer-token request from a residential IP passes the Cloudflare edge and the PoW (CF is IP-reputation-based; the PoW is hashcash-style and replicable offline). The actual wall is Turnstile, an interactive token a headless client can't forge. And "borrow a browser only for the Turnstile token, then go headless" is self-defeating: the token is single-use and short-lived, so you'd open a browser every request anyway. That's why the web backend drives a genuine logged-in browser; the only true no-browser path is the codex backend (which bills Codex-usage).
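For the codex backend's last step (streaming the SSE response and base64-decoding the image_generation_call result), a minimal sketch follows. The event layout is an assumption made for illustration: it supposes each data line is a JSON object that may carry an "item" with type "image_generation_call" and a base64 "result". The real stream may be shaped differently, and `extract_image` is a hypothetical name.

```python
import base64
import json
from typing import Iterable, Optional


def extract_image(sse_lines: Iterable[str]) -> Optional[bytes]:
    """Scan SSE lines and return the decoded image bytes, or None.

    ASSUMPTION: each "data:" line carries a JSON object whose "item" field,
    when present, may have type "image_generation_call" and a base64 "result".
    """
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not JSON
        if not isinstance(event, dict):
            continue
        item = event.get("item")
        if (isinstance(item, dict)
                and item.get("type") == "image_generation_call"
                and item.get("result")):
            return base64.b64decode(item["result"])
    return None
```

Returning None when no matching item arrives corresponds to the "no image returned" row in the troubleshooting table: the stream completed, but the model never called the tool.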
This tool is used mostly through other people's agents, so maintainers rarely see failures directly — the agent running it is the main way bugs surface. When you hit something that looks like a tool bug, help get it reported instead of silently working around it.
- web/chrome-use failure that chatgpt-imagegen doctor says should be fine, or behavior that contradicts this SKILL.
- doctor + the install steps), or a deliberate content refusal by the model.
- chatgpt-imagegen -V (version),
- chatgpt-imagegen doctor.