GPT Image 2 API

Generate and edit images via OpenAI's gpt-image-2 model. An agent-agnostic CLI that works with any AI agent (Hermes, Claude Code, Codex, OpenClaw, etc.). Supports a configurable base_url and api_key, text-to-image generation, and image-to-image editing.

Install

openclaw skills install gpt-image-2-api

gpt-image-2

Generate and edit images via OpenAI's gpt-image-2 model. Agent-agnostic — designed to work with any AI agent or standalone from the command line.

Quick Start

# 1. Initialize config (one-time)
python3 gpt_image2.py config --init

# 2. Edit the config to set your API key
#    ~/.config/gpt-image-2/config.json

# 3. Generate
python3 gpt_image2.py generate "A cute cat on a windowsill" -o ~/cat.png --quality low

# 4. Edit
python3 gpt_image2.py edit input.png "Change the sofa to green" -o ~/output.png

Configuration

Config priority: --config flag > --base-url/--api-key flags > config file > environment variables > defaults.

Config File Locations (in priority order)

| Priority | Path | Notes |
| --- | --- | --- |
| 1 | $XDG_CONFIG_HOME/gpt-image-2/config.json | XDG standard (recommended) |
| 2 | ~/.config/gpt-image-2/config.json | Default XDG fallback |
| 3 | ~/.gpt-image-2-config.json | Single-file fallback |
| 4 | ~/.hermes/gpt-image-2-config.json | Legacy Hermes compat |

Use python3 gpt_image2.py config --show to see which config is active.
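
The lookup order in the table can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names `candidate_config_paths` and `find_active_config` are hypothetical, and the sketch assumes the first existing file wins.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(env: Mapping[str, str], home: Path) -> List[Path]:
    """Return config paths in the documented priority order (hypothetical helper)."""
    paths = []
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        # Priority 1 only applies when XDG_CONFIG_HOME is set.
        paths.append(Path(xdg) / "gpt-image-2" / "config.json")
    paths.append(home / ".config" / "gpt-image-2" / "config.json")
    paths.append(home / ".gpt-image-2-config.json")
    paths.append(home / ".hermes" / "gpt-image-2-config.json")
    return paths


def find_active_config(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the first candidate that exists, or None (assumed behavior)."""
    for path in candidate_config_paths(env, home):
        if path.is_file():
            return path
    return None
```

For example, `find_active_config(os.environ, Path.home())` would report which file such a lookup selects. The authoritative answer remains `config --show`.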

Config File Format

{
  "base_url": "https://api.openai.com/v1",
  "api_key_env": "OPENAI_API_KEY"
}

| Field | Type | Description |
| --- | --- | --- |
| base_url | string | API base URL. Default: https://api.openai.com/v1 |
| api_key | string | Plaintext API key (not recommended: visible in the file) |
| api_key_env | string | Name of the environment variable holding the key (recommended) |

Environment Variables (fallback when no config file)

| Variable | Purpose |
| --- | --- |
| GPT_IMAGE2_API_KEY | API key |
| GPT_IMAGE2_BASE_URL | API base URL |
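
How the config fields and the fallback variables combine can be sketched roughly as below. The function name `resolve_settings` is hypothetical, and preferring `api_key_env` over `api_key` when both are present is an assumption; the script's real resolution logic may differ.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_settings(
    config: Optional[Mapping[str, str]], env: Mapping[str, str]
) -> Tuple[str, Optional[str]]:
    """Return (base_url, api_key) from a parsed config file, else from the environment."""
    if config is not None:
        base_url = config.get("base_url", DEFAULT_BASE_URL)
        if "api_key_env" in config:
            # Assumption: the indirect form wins over a plaintext api_key.
            api_key = env.get(config["api_key_env"])
        else:
            api_key = config.get("api_key")
        return base_url, api_key
    # No config file: fall back to the documented environment variables.
    return (
        env.get("GPT_IMAGE2_BASE_URL", DEFAULT_BASE_URL),
        env.get("GPT_IMAGE2_API_KEY"),
    )
```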

Config Management Commands

# Create template config
python3 gpt_image2.py config --init

# Show active config (keys are masked)
python3 gpt_image2.py config --show

# Overwrite config
python3 gpt_image2.py config --init --force

CLI Reference

generate — Text-to-Image

python3 gpt_image2.py generate "prompt" [options]

| Option | Default | Description |
| --- | --- | --- |
| -o, --output | ~/gpt-image2-output.png | Output file path |
| --quality | auto | low (~70s), medium (~120s), high (~276s) |
| --size | auto | 1024x1024, 1536x1024, 1024x1536 |
| --format | png | png, jpeg, webp |
| --n | 1 | Number of images (1-10) |
| --timeout | 600 | curl timeout in seconds |
| --config | auto-detect | Explicit config file path |
| --base-url | from config | Override API base URL |
| --api-key | from config | Override API key (visible in ps!) |

edit — Image-to-Image

python3 gpt_image2.py edit <image_path> "edit prompt" [options]

| Option | Default | Description |
| --- | --- | --- |
| --mask | none | PNG mask (transparent areas mark the region to edit) |
| --moderation | auto | low or auto |

All generate options also apply.

config — Manage Configuration

python3 gpt_image2.py config [--init] [--show] [--force] [--config PATH]

Script Location

The script is at scripts/gpt_image2.py relative to this skill directory.

To find it programmatically from any agent:

# If installed as a Hermes skill:
SCRIPT="$(dirname "$(readlink -f "$0")")/../skills/creative/gpt-image-2/scripts/gpt_image2.py"

# Or copy/symlink it anywhere — it's self-contained with zero dependencies beyond stdlib + curl
cp scripts/gpt_image2.py /usr/local/bin/gpt-image2

The script has zero pip dependencies — only Python 3.8+ stdlib and curl.

API Reference

Generations (Text-to-Image)

| Item | Value |
| --- | --- |
| Endpoint | POST {base_url}/images/generations |
| Auth | Authorization: Bearer {api_key} |
| Content-Type | application/json |
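
As an illustration of how these three values fit together, here is a minimal sketch that builds (but does not send) the request with Python's standard `urllib`. The script itself calls curl, so this is not its implementation; the function name `build_generation_request` is hypothetical.

```python
import json
import urllib.request


def build_generation_request(
    base_url: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Build a POST request for {base_url}/images/generations without sending it."""
    # Strip a trailing slash so "https://host/v1/" and "https://host/v1" behave the same.
    url = base_url.rstrip("/") + "/images/generations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=600)` would perform the call. The long timeout mirrors the CLI default.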

Edits (Image-to-Image)

| Item | Value |
| --- | --- |
| Endpoint | POST {base_url}/images/edits |
| Auth | Authorization: Bearer {api_key} |
| Content-Type | multipart/form-data |
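
Because the edits endpoint returns images as b64_json (see Important Pitfalls), a client has to decode the payload before writing a file. The sketch below assumes the usual OpenAI Images response shape, `{"data": [{"b64_json": "..."}]}`, which a relay endpoint may not follow exactly. The helper name `save_b64_images` and the `-2`, `-3` file-naming scheme are hypothetical.

```python
import base64
import json
from pathlib import Path
from typing import List


def save_b64_images(response_text: str, output_path: str) -> List[Path]:
    """Decode every b64_json entry in the response and write it to disk."""
    body = json.loads(response_text)
    base = Path(output_path)
    saved = []
    for index, item in enumerate(body.get("data", [])):
        # First image keeps the requested name; later ones get a numeric suffix.
        if index == 0:
            target = base
        else:
            target = base.with_name(f"{base.stem}-{index + 1}{base.suffix}")
        target.write_bytes(base64.b64decode(item["b64_json"]))
        saved.append(target)
    return saved
```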

Parameters

Generations (JSON body):

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | gpt-image-2 |
| prompt | string | yes | Text description |
| n | int | no | Number of images (default 1) |
| size | string | no | 1024x1024, 1536x1024, 1024x1536 |
| quality | string | no | low, medium, high (default auto) |
| format | string | no | png, jpeg, webp (default png) |
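
A JSON body matching this table could be assembled as in the sketch below. The function name `build_generation_payload` is hypothetical, the 1-10 limit on n is taken from the CLI reference, and omitting unset optional fields so that server defaults apply is an assumption about how the script behaves.

```python
from typing import Optional

ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536"}
ALLOWED_QUALITIES = {"low", "medium", "high"}


def build_generation_payload(
    prompt: str,
    n: int = 1,
    size: Optional[str] = None,
    quality: Optional[str] = None,
) -> dict:
    """Validate the arguments and return the JSON body for /images/generations."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if size is not None and size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality is not None and quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")

    payload = {"model": "gpt-image-2", "prompt": prompt, "n": n}
    # Optional fields are left out entirely when unset.
    if size is not None:
        payload["size"] = size
    if quality is not None:
        payload["quality"] = quality
    return payload
```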

Edits (form-data):

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | gpt-image-2 |
| prompt | string | yes | Edit instruction |
| image | file | yes | Source image (PNG, max 4 images) |
| n | int | no | Number of outputs (default 1) |
| size | string | no | 1024x1024, 1536x1024, 1024x1536, or auto |
| quality | string | no | low, medium, high (default auto) |
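
One way to map these form fields onto curl arguments is sketched below. The script's real curl invocation is not shown in this document, so treat this as an assumption-laden illustration; the function name `build_edit_form_args` is hypothetical. It uses curl's `--form-string` for text fields, which sends the value literally, and `-F name=@path` for the file upload.

```python
from typing import List, Optional


def build_edit_form_args(
    image_path: str,
    prompt: str,
    n: int = 1,
    size: Optional[str] = None,
    quality: Optional[str] = None,
) -> List[str]:
    """Return curl arguments for a multipart/form-data edit request."""
    args = [
        "--form-string", "model=gpt-image-2",
        # --form-string keeps '@', '<' and ';' in the prompt from being interpreted by curl.
        "--form-string", f"prompt={prompt}",
        # -F with '@' uploads the file contents rather than the literal path.
        "-F", f"image=@{image_path}",
        "--form-string", f"n={n}",
    ]
    if size is not None:
        args += ["--form-string", f"size={size}"]
    if quality is not None:
        args += ["--form-string", f"quality={quality}"]
    return args
```

The list is meant to be passed to `subprocess.run(["curl", ...])` as separate arguments, so no shell quoting is involved. Image paths containing `;` or `,` would still need extra care with `-F`.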

Agent Integration Guide

This skill is designed to be agent-agnostic. Any AI agent can use it by:

  1. Locate the script: Find gpt_image2.py in the skill's scripts/ directory
  2. Call via shell: python3 <path>/gpt_image2.py generate "prompt" -o output.png
  3. Parse stdout: The script prints Saved: <path> (<size> KB) on success
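
Step 3 can be sketched with a small regular expression. The pattern follows the `Saved: <path> (<size> KB)` format quoted above; whether the size is printed as an integer or a decimal is not stated, so the sketch accepts both. The name `parse_saved` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as: Saved: /home/u/cat.png (1432 KB)
SAVED_LINE = re.compile(
    r"^Saved: (?P<path>.+) \((?P<size>\d+(?:\.\d+)?) KB\)[ \t]*$",
    re.MULTILINE,
)


def parse_saved(stdout: str) -> List[Tuple[str, float]]:
    """Return (path, size_in_kb) for every 'Saved:' line in the script output."""
    return [
        (match.group("path"), float(match.group("size")))
        for match in SAVED_LINE.finditer(stdout)
    ]
```

An empty result means no file was reported as saved, which an agent can treat as a failed run.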

Integration Examples

Hermes / Claude Code / Codex / OpenClaw:

python3 /path/to/gpt-image-2/scripts/gpt_image2.py generate "prompt" -o output.png --quality low

From Python (any agent):

import subprocess

# script_path, prompt, and output_path are supplied by the calling agent
result = subprocess.run(
    ["python3", script_path, "generate", prompt, "-o", output_path, "--quality", "low"],
    capture_output=True, text=True, timeout=600
)
# Parse result.stdout for "Saved: <path>"

From Node.js / TypeScript:

const { execFileSync } = require('child_process');
// Pass arguments as an array so the prompt is never interpreted by a shell
const output = execFileSync('python3', [scriptPath, 'generate', prompt, '-o', outputPath]);
// Parse output.toString() for "Saved: ..."

Workflow: Agent Generates Images

  1. Always use the CLI script — handles config resolution, auth security, and response parsing
  2. Use low quality for drafts, high quality for final output
  3. For edits: --size auto preserves original dimensions (recommended)
  4. The script outputs: HTTP status, time elapsed, output file path and size
  5. Parse the output: look for Saved: <path> lines to find generated files

Workflow: Agent Edits Existing Images

  1. Save or locate the source image path
  2. Call gpt_image2.py edit <image_path> "<edit_prompt>" --output <output_path>
  3. Edit endpoint can accept up to 4 images via repeated --image flags
  4. Use --size auto to preserve original dimensions

Important Pitfalls

  1. --api-key flag is visible in shell history and ps aux — prefer config file (api_key_env) or environment variables.
  2. The edits endpoint does NOT support response_format — always returns b64_json regardless.
  3. gpt-image-2 generations may time out on some relay endpoints — use --timeout flag (default 600s).
  4. Prompts with special characters: the script writes prompts to temp files internally, which avoids escaping issues inside the script. Callers that go through a shell must still pass the prompt as a single quoted argument.
  5. Authorization header is never passed via -H — the script uses curl -K temp config file, deleted immediately after use. Keys never appear in ps aux.
  6. Config file permissions — the script warns if config has group/other read permissions. Run chmod 600 <config> to fix.
  7. Zero pip dependencies — the script only requires Python 3.8+ stdlib and curl. No installation step needed.
  8. Chinese text in prompts may not render correctly — gpt-image-2's Chinese rendering is unstable; it often ignores Chinese constraints and outputs English text in images. Consider using Gemini for Chinese text rendering.
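
Pitfalls 5 and 6 can be illustrated with a short sketch. It is not the script's own code: the helper names `write_curl_auth_config` and `is_too_permissive` are hypothetical. The only curl feature relied on is that a file passed via `-K` may contain `header = "..."` lines.

```python
import os
import stat
import tempfile


def write_curl_auth_config(api_key: str) -> str:
    """Write a curl config file holding the Authorization header and return its path."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="curl-auth-", suffix=".conf")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f'header = "Authorization: Bearer {api_key}"\n')
    return path


def is_too_permissive(path: str) -> bool:
    """Return True if group or others can read the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

A caller would run `curl -K <path> ...` and remove the file in a `finally` block, so the key lives on disk only for the duration of the request and never appears in the process list. Keys containing quotes or backslashes would need escaping before being written.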