image generation gpt image

High-quality AI image generation via the WellAPI gpt-image-2 model. Supports text-to-image and image editing (image-to-image).

Audits

Overall: Pass

ClawScan: Pass. Agentic behavior and permission review.
Static analysis: Pass. Pattern checks against bundled files.
VirusTotal: Pass. Multi-engine malware detections and file reputation.

Install

openclaw skills install image-generation-gpt

WellAPI gpt-image-2

Generate and edit images via the WellAPI gpt-image-2 model (OpenAI-compatible). The API returns image bytes inline as base64 (data[i].b64_json) — no polling, no URL download.

API Endpoints

Base: https://wellapi.ai/v1
Text-to-image: POST /images/generations — application/json
Image edit / image-to-image: POST /images/edits — multipart/form-data

Authentication: Authorization: Bearer <WELLAPI_API_KEY> header.

Request — `/images/generations` (text-to-image)

Content-Type: application/json

Field	Type	Required	Notes
`model`	string	✅	e.g. `gpt-image-2`
`prompt`	string	✅	Image description, max 1000 chars
`n`	integer	✅	Number of images, 1–10
`size`	string	optional	See size table below; default `auto`
`quality`	string	optional	`low` / `medium` / `high` / `auto` (default `auto`)
`format`	string	optional	`png` / `jpeg` / `webp` (default `png`)

Example body:

{
  "model": "gpt-image-2",
  "prompt": "the sea",
  "n": 1,
  "size": "1024x1024",
  "quality": "low",
  "format": "jpeg"
}
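
As an illustration of the request described above, here is a minimal Python sketch that builds and sends it using only the standard library. The function names `build_generation_request` and `generate` and the 300-second timeout are assumptions made for this example; the URL, header, and field names come from the endpoint list and field table above. It is not the skill's bundled reference implementation.

```python
import json
import urllib.request

BASE_URL = "https://wellapi.ai/v1"  # base URL from the endpoint list above


def build_generation_request(api_key, prompt, n=1, size="auto",
                             quality="auto", fmt="png", model="gpt-image-2"):
    """Build (but do not send) a POST /images/generations request."""
    body = {
        "model": model,
        "prompt": prompt,
        "n": n,
        "size": size,
        "quality": quality,
        "format": fmt,
    }
    return urllib.request.Request(
        BASE_URL + "/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key, prompt, timeout=300, **options):
    """Send the request and return the parsed JSON response.

    The timeout is an assumed value; large or high-quality images may
    need longer.
    """
    request = build_generation_request(api_key, prompt, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without touching the network.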

Request — `/images/edits` (image-to-image / editing)

Content-Type: multipart/form-data

Field	Type	Required	Notes
`image`	file (repeatable)	✅	One or more input images. Up to 16 images, total ≤ 50MB.
`prompt`	string	✅	Edit description
`mask`	file	optional	A PNG with fully transparent regions marking the edit area. Applied to the first `image` if multiple are sent. Must be valid PNG, < 4MB, same dimensions as the image.
`model`	string	optional	`gpt-image-1`, `gpt-image-1-all`, `flux-kontext-pro`, `flux-kontext-max`, `gpt-image-2`, `gpt-image-2-all`. Default in this skill: `gpt-image-2`.
`n`	string	optional	`"1"` – `"10"`
`size`	string	optional	See size table
`quality`	string	optional	`low` / `medium` / `high` / `auto` (default `auto`)
`format`	string	optional	`png` / `jpeg` / `webp`
`background`	string	optional	`opaque` / `auto` / `transparent`. `auto` lets the model pick.
`moderation`	string	optional	`low` / `auto` (default). `low` = less restrictive filtering (gpt-image-1 family).
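
Python's standard library has no ready-made multipart encoder, so a zero-dependency client has to assemble the body by hand. The sketch below shows one way to do that; `encode_multipart` is a hypothetical helper name, and the quote-escaping of filenames is a simplification chosen for this example.

```python
import mimetypes
import os
import uuid


def encode_multipart(fields, files):
    """Encode text fields and files as multipart/form-data.

    fields: list of (name, value) pairs, e.g. [("prompt", "add a hat")].
    files:  list of (name, path) pairs; repeat the name "image" once per
            input image.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields:
        parts += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            str(value).encode("utf-8"),
        ]
    for name, path in files:
        # Simplified escaping: only double quotes are replaced.
        filename = os.path.basename(path).replace('"', "%22")
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        with open(path, "rb") as handle:
            content = handle.read()
        parts += [
            delimiter,
            (f'Content-Disposition: form-data; name="{name}"; '
             f'filename="{filename}"').encode("utf-8"),
            f"Content-Type: {mime}".encode("ascii"),
            b"",
            content,
        ]
    # Closing delimiter, followed by a final CRLF.
    parts += [delimiter + b"--", b""]
    return b"\r\n".join(parts), "multipart/form-data; boundary=" + boundary
```

The returned body and content type can then be passed as `data` and the `Content-Type` header of a `urllib.request.Request` for `POST /images/edits`, alongside the `Authorization` header.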

`size` values

Value	Description
`1024x1024`	Square
`1536x1024`	Landscape
`1024x1536`	Portrait
`2048x2048`	2K square
`2048x1152`	2K landscape
`3840x2160`	4K landscape
`2160x3840`	4K portrait
`auto`	Default — model chooses

Strict size rules (when picking a custom size):

Longest side ≤ 3840px
Both width and height must be multiples of 16
Aspect ratio: max(w, h) / min(w, h) ≤ 3 (no more extreme than 3:1)
Total pixels: 655,360 ≤ w*h ≤ 8,294,400
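
The four rules above can be checked before sending a request. A minimal sketch, where `is_valid_size` is a hypothetical helper name and the limits are taken directly from the list:

```python
def is_valid_size(width, height):
    """Return True if a custom width x height satisfies the four size rules."""
    if width <= 0 or height <= 0:
        return False
    longest, shortest = max(width, height), min(width, height)
    return (
        longest <= 3840                                   # longest side
        and width % 16 == 0 and height % 16 == 0          # multiples of 16
        and longest <= 3 * shortest                       # aspect ratio <= 3:1
        and 655_360 <= width * height <= 8_294_400        # total pixels
    )
```

For example, every explicit size in the table passes, while `1000x1000` fails the multiple-of-16 rule, `512x512` falls below the pixel minimum, and `3840x1024` exceeds the 3:1 ratio.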

Response (both endpoints)

Synchronous JSON — no polling:

{
  "created": 1778236581,
  "background": "opaque",
  "data": [
    { "b64_json": "iVBORw0KGgo..." }
  ],
  "output_format": "png",
  "quality": "low",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 8,
    "input_tokens_details": { "image_tokens": 0, "text_tokens": 8 },
    "output_tokens": 196,
    "total_tokens": 204
  }
}

Each data[i].b64_json is the full image as a base64 string. Decode and write to disk.

Output

Base64-decode each data[i].b64_json into bytes.
Save as wellapi-<TIMESTAMP>.<ext> where <ext> matches response.output_format (or the requested format, fallback png).
If multiple images are returned, append -1, -2, … to the filename.
Print MEDIA:<absolute_path> (one line per image) for OpenClaw auto-attach.
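
The output steps above could be sketched as follows. `save_images` is a hypothetical helper name, and the timestamp format `%Y%m%d-%H%M%S` is an assumption, since the document does not specify one.

```python
import base64
import os
import time


def save_images(response, out_dir=".", requested_format="png"):
    """Decode every data[i].b64_json, write it to disk, and print MEDIA lines.

    Returns the list of absolute paths that were written.
    """
    ext = response.get("output_format") or requested_format or "png"
    if ext not in ("png", "jpg", "jpeg", "webp"):
        ext = "png"  # fall back to png for anything unexpected
    stamp = time.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    items = response.get("data", [])
    paths = []
    for index, item in enumerate(items, start=1):
        # Only add -1, -2, ... when more than one image came back.
        suffix = f"-{index}" if len(items) > 1 else ""
        path = os.path.abspath(
            os.path.join(out_dir, f"wellapi-{stamp}{suffix}.{ext}")
        )
        with open(path, "wb") as handle:
            handle.write(base64.b64decode(item["b64_json"]))
        print(f"MEDIA:{path}")
        paths.append(path)
    return paths
```

Because the filename is built entirely from a timestamp and a checked extension, it contains no user-supplied text.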

CRITICAL SECURITY: Before passing an output filename to shell commands, sanitize:

Strip shell metacharacters: tr -cd 'A-Za-z0-9._-'
Enforce valid extension (.png, .jpg, .jpeg, .webp)
Fall back to wellapi-<TIMESTAMP>.png if the result is empty
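
A Python equivalent of these sanitization steps might look like the sketch below. `sanitize_filename` is a hypothetical helper name. Two choices here go beyond what the list states and are assumptions: an invalid extension is replaced with `.png`, and leading dots and hyphens are stripped so the result cannot be a hidden file or be mistaken for a command-line option.

```python
import os
import re
import time

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def sanitize_filename(name):
    """Return a filename that is safe to pass to shell commands."""
    # Keep only the characters allowed by: tr -cd 'A-Za-z0-9._-'
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "", name)
    # Assumption: also drop leading dots and hyphens.
    cleaned = cleaned.lstrip(".-")
    stem, ext = os.path.splitext(cleaned)
    if not stem:
        # Nothing usable is left: use the documented fallback name.
        return "wellapi-" + time.strftime("%Y%m%d-%H%M%S") + ".png"
    if ext.lower() not in ALLOWED_EXTENSIONS:
        ext = ".png"  # assumption: replace an invalid extension
    return stem + ext
```

Note that path separators are removed along with other metacharacters, so `../../x.webp` collapses to `x.webp` rather than escaping the output directory.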

Reference Implementations

Platform	File
Python (all platforms, zero deps)	`{baseDir}/references/python.md`
PowerShell 5.1+ (Windows)	`{baseDir}/references/powershell.md`
curl + bash (Unix/macOS)	`{baseDir}/references/curl_heredoc.md`

API Key

WELLAPI_API_KEY env var (required) — sent as Authorization: Bearer <key>
Get key: https://wellapi.ai
If the environment variable is unset, follow First-run onboarding below.

First-run onboarding (no API key found)

When no key can be located (env var unset, no cached key on disk), before making any API call show the user this short prompt verbatim. Translate to the user's language if appropriate. Do not expand it into multiple "options / methods", do not show shell commands, do not ask follow-up questions like "do you already have a key":

请粘贴你的 WellAPI API Key。如果还没有，请前往 https://wellapi.ai/register?channel=c_qqn3vdvc 注册后领取免费 API Key。

Please paste your WellAPI API Key. If you don't have one yet, register at https://wellapi.ai/register?channel=c_qqn3vdvc to get a free key.

After the user pastes a key, use it for the current request and proceed straight to image generation. Do not lecture the user about env vars, shells, or persistence unless they explicitly ask how to save it.

Triggers

Chinese: "高质量生图：xxx" / "编辑图片：xxx"
English: "best image: xxx" / "edit image: xxx"

Treat the text after the colon as the prompt, apply the defaults size=auto, quality=auto, format=png, n=1, and generate immediately.

For image editing, the user provides one or more local image file paths along with the prompt; submit them as repeated image form fields to /images/edits.
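
To make the trigger handling concrete, here is a minimal parsing sketch. `parse_trigger`, `TRIGGERS`, and `DEFAULTS` are hypothetical names; accepting both the ASCII colon and the full-width colon (：) and matching English phrases case-insensitively are assumptions. Extracting image file paths for edit requests is left out.

```python
TRIGGERS = {
    "高质量生图": "generate",
    "best image": "generate",
    "编辑图片": "edit",
    "edit image": "edit",
}
DEFAULTS = {"size": "auto", "quality": "auto", "format": "png", "n": 1}


def parse_trigger(message):
    """Return {"mode", "prompt", plus defaults} for a trigger message, else None."""
    text = message.strip()
    for phrase, mode in TRIGGERS.items():
        if not text.lower().startswith(phrase):
            continue
        rest = text[len(phrase):].lstrip()
        # Accept an ASCII or a full-width colon after the trigger phrase.
        if rest[:1] in (":", "："):
            prompt = rest[1:].strip()
            if prompt:
                return {"mode": mode, "prompt": prompt, **DEFAULTS}
    return None
```

A message such as `best image: a red fox` yields mode `generate` with the prompt `a red fox` and the four defaults; a message without a trigger phrase, or with an empty prompt, yields `None`.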

Notes

Response is synchronous — no task ID, no polling.
Print MEDIA:<absolute_path> for OpenClaw auto-attach, one line per generated image.
quality=high and larger size values may incur extra charges.
format controls the encoding of the returned base64 bytes; the file extension should match.
Up to 16 reference images per edit request, total ≤ 50MB.
mask must be a PNG under 4MB with the same width and height as the image it applies to.