GPT Image 2 API
v2.0.0
gpt-image-2
Generate and edit images via OpenAI's gpt-image-2 model. Agent-agnostic — designed to work with any AI agent or standalone from the command line.
Quick Start
# 1. Initialize config (one-time)
python3 gpt_image2.py config --init
# 2. Edit the config to set your API key
# ~/.config/gpt-image-2/config.json
# 3. Generate
python3 gpt_image2.py generate "A cute cat on a windowsill" -o ~/cat.png --quality low
# 4. Edit
python3 gpt_image2.py edit input.png "Change the sofa to green" -o ~/output.png
Configuration
Config priority: --config flag > --base-url/--api-key flags > config file > environment variables > defaults.
Config File Locations (in priority order)
| Priority | Path | Notes |
|---|---|---|
| 1 | $XDG_CONFIG_HOME/gpt-image-2/config.json | XDG standard (recommended) |
| 2 | ~/.config/gpt-image-2/config.json | Default XDG fallback |
| 3 | ~/.gpt-image-2-config.json | Single-file fallback |
| 4 | ~/.hermes/gpt-image-2-config.json | Legacy Hermes compat |
Use python3 gpt_image2.py config --show to see which config is active.
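The lookup order in the table above can be sketched as a first-match search. This is a minimal illustration, not the script's actual code: the function names `candidate_config_paths` and `find_active_config` are hypothetical, and the sketch assumes a location counts as active as soon as a file exists there.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(env: Mapping[str, str], home: Path) -> List[Path]:
    """Return the documented config locations, highest priority first."""
    paths = []
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:  # priority 1 only applies when the variable is set
        paths.append(Path(xdg) / "gpt-image-2" / "config.json")
    paths.append(home / ".config" / "gpt-image-2" / "config.json")
    paths.append(home / ".gpt-image-2-config.json")
    paths.append(home / ".hermes" / "gpt-image-2-config.json")
    return paths


def find_active_config(env: Optional[Mapping[str, str]] = None,
                       home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file, or None if there is none."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    for path in candidate_config_paths(env, home):
        if path.is_file():
            return path
    return None
```

Calling `find_active_config()` with no arguments uses the real environment and home directory; passing `env` and `home` explicitly keeps the lookup easy to test.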
Config File Format
{
"base_url": "https://api.openai.com/v1",
"api_key_env": "OPENAI_API_KEY"
}
| Field | Type | Description |
|---|---|---|
| base_url | string | API base URL. Default: https://api.openai.com/v1 |
| api_key | string | Plaintext API key (not recommended: visible in file) |
| api_key_env | string | Environment variable name holding the key (recommended) |
Environment Variables (fallback when no config file)
| Variable | Purpose |
|---|---|
| GPT_IMAGE2_API_KEY | API key |
| GPT_IMAGE2_BASE_URL | API base URL |
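To illustrate how the config fields and the fallback variables fit together, here is a rough sketch of resolving a base URL and API key. It ignores the command-line overrides, and two details are assumptions rather than documented behavior: that `api_key_env` is tried before a plaintext `api_key`, and that the environment variables are read only when no config file was found. `resolve_settings` is a hypothetical name.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_settings(config: Optional[dict],
                     env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (base_url, api_key) from a parsed config dict, else from env."""
    if config is not None:
        base_url = config.get("base_url", DEFAULT_BASE_URL)
        # Assumption: the env-var indirection wins over a plaintext key.
        key_var = config.get("api_key_env")
        api_key = env.get(key_var) if key_var else None
        if api_key is None:
            api_key = config.get("api_key")
        return base_url, api_key
    # No config file: fall back to the documented environment variables.
    return (env.get("GPT_IMAGE2_BASE_URL", DEFAULT_BASE_URL),
            env.get("GPT_IMAGE2_API_KEY"))
```

In practice the first argument would be the result of `json.load` on the active config file (or `None`), and the second would be `os.environ`.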
Config Management Commands
# Create template config
python3 gpt_image2.py config --init
# Show active config (keys are masked)
python3 gpt_image2.py config --show
# Overwrite config
python3 gpt_image2.py config --init --force
CLI Reference
generate — Text-to-Image
python3 gpt_image2.py generate "prompt" [options]
| Option | Default | Description |
|---|---|---|
| -o, --output | ~/gpt-image2-output.png | Output file path |
| --quality | auto | low (~70s), medium (~120s), high (~276s) |
| --size | auto | 1024x1024, 1536x1024, 1024x1536 |
| --format | png | png, jpeg, webp |
| --n | 1 | Number of images (1-10) |
| --timeout | 600 | curl timeout in seconds |
| --config | auto-detect | Explicit config file path |
| --base-url | from config | Override API base URL |
| --api-key | from config | Override API key (visible in ps!) |
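An agent that calls `generate` repeatedly may find it convenient to build the argument list from the option table and reject out-of-range values before spawning a process. The helper below is a hypothetical sketch (`build_generate_argv` is not part of the script); the allowed values are copied from the table, and the script itself may accept more or validate differently.

```python
from typing import List, Optional

QUALITIES = {"auto", "low", "medium", "high"}
SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536"}
FORMATS = {"png", "jpeg", "webp"}


def build_generate_argv(script_path: str, prompt: str,
                        output: Optional[str] = None,
                        quality: Optional[str] = None,
                        size: Optional[str] = None,
                        fmt: Optional[str] = None,
                        n: Optional[int] = None,
                        timeout: Optional[int] = None) -> List[str]:
    """Build an argv list for `generate`, checking values against the table."""
    checks = ((quality, QUALITIES, "quality"),
              (size, SIZES, "size"),
              (fmt, FORMATS, "format"))
    for value, allowed, name in checks:
        if value is not None and value not in allowed:
            raise ValueError(f"unsupported {name}: {value!r}")
    if n is not None and not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    argv = ["python3", script_path, "generate", prompt]
    options = (("-o", output), ("--quality", quality), ("--size", size),
               ("--format", fmt), ("--n", n), ("--timeout", timeout))
    for flag, value in options:
        if value is not None:  # omitted options keep the script's defaults
            argv += [flag, str(value)]
    return argv
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting of the prompt entirely.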
edit — Image-to-Image
python3 gpt_image2.py edit <image_path> "edit prompt" [options]
| Option | Default | Description |
|---|---|---|
| --mask | none | PNG mask (transparent=edit area) |
| --moderation | auto | low or auto |
All generate options also apply.
config — Manage Configuration
python3 gpt_image2.py config [--init] [--show] [--force] [--config PATH]
Script Location
The script is at scripts/gpt_image2.py relative to this skill directory.
To find it programmatically from any agent:
# If installed as a Hermes skill:
SCRIPT="$(dirname "$(readlink -f "$0")")/../skills/creative/gpt-image-2/scripts/gpt_image2.py"
# Or copy/symlink it anywhere — it's self-contained with zero dependencies beyond stdlib + curl
cp scripts/gpt_image2.py /usr/local/bin/gpt-image2
The script has zero pip dependencies — only Python 3.8+ stdlib and curl.
API Reference
Generations (Text-to-Image)
| Item | Value |
|---|---|
| Endpoint | POST {base_url}/images/generations |
| Auth | Authorization: Bearer {api_key} |
| Content-Type | application/json |
Edits (Image-to-Image)
| Item | Value |
|---|---|
| Endpoint | POST {base_url}/images/edits |
| Auth | Authorization: Bearer {api_key} |
| Content-Type | multipart/form-data |
Parameters
Generations (JSON body):
| Param | Type | Required | Description |
|---|---|---|---|
| model | string | yes | gpt-image-2 |
| prompt | string | yes | Text description |
| n | int | no | Number of images (default 1) |
| size | string | no | 1024x1024, 1536x1024, 1024x1536 |
| quality | string | no | low, medium, high (default auto) |
| format | string | no | png, jpeg, webp (default png) |
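For readers who want to see the shape of a generations call, the sketch below assembles the request from the endpoint, headers, and JSON parameters listed above. It uses `urllib.request` purely as a stand-in to show the structure; the script itself shells out to curl. `build_generation_request` is a hypothetical helper, only a subset of the optional parameters is included, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional


def build_generation_request(base_url: str, api_key: str, prompt: str,
                             n: int = 1, size: Optional[str] = None,
                             quality: Optional[str] = None) -> urllib.request.Request:
    """Assemble (but do not send) a POST {base_url}/images/generations request."""
    body = {"model": "gpt-image-2", "prompt": prompt, "n": n}
    if size is not None:
        body["size"] = size
    if quality is not None:
        body["quality"] = quality
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Stripping a trailing slash from `base_url` keeps the endpoint well-formed whether or not the configured URL ends in `/`.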
Edits (form-data):
| Param | Type | Required | Description |
|---|---|---|---|
| model | string | yes | gpt-image-2 |
| prompt | string | yes | Edit instruction |
| image | file | yes | Source image (PNG, max 4 images) |
| n | int | no | Number of outputs (default 1) |
| size | string | no | 1024x1024, 1536x1024, 1024x1536, or auto |
| quality | string | no | low, medium, high (default auto) |
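The edits endpoint takes `multipart/form-data` rather than JSON. As a rough illustration of what that body looks like on the wire, the sketch below encodes the text fields and a single PNG by hand using only the standard library. `encode_multipart` is a hypothetical helper; the script delegates this encoding to curl, and field values containing quotes or line breaks are not handled here.

```python
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(fields: Dict[str, str], filename: str, image_bytes: bytes,
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for text fields plus one PNG `image` part."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The file part carries its own Content-Type and raw bytes.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
         "Content-Type: image/png\r\n\r\n").encode("utf-8")
        + image_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type must be sent as the `Content-Type` header so the server knows which boundary string separates the parts.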
Agent Integration Guide
This skill is designed to be agent-agnostic. Any AI agent can use it by:
- Locate the script: find `gpt_image2.py` in the skill's `scripts/` directory.
- Call via shell: `python3 <path>/gpt_image2.py generate "prompt" -o output.png`
- Parse stdout: the script prints `Saved: <path> (<size> KB)` on success.
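The parse step can be sketched with a small regular expression. This assumes each saved file is reported on its own line in exactly the `Saved: <path> (<size> KB)` form and that the size is a plain integer or decimal number; `parse_saved` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Greedy path match, so parentheses inside the path itself are tolerated.
SAVED_RE = re.compile(
    r"^Saved: (?P<path>.+) \((?P<size>\d+(?:\.\d+)?) KB\)[ \t]*$",
    re.MULTILINE,
)


def parse_saved(stdout: str) -> List[Tuple[str, float]]:
    """Return (path, size_in_kb) for every 'Saved:' line in the output."""
    return [(m.group("path"), float(m.group("size")))
            for m in SAVED_RE.finditer(stdout)]
```

An empty list means no file was reported, which an agent can treat as a failed run and fall back to inspecting the rest of the output.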
Integration Examples
Hermes / Claude Code / Codex / OpenClaw:
python3 /path/to/gpt-image-2/scripts/gpt_image2.py generate "prompt" -o output.png --quality low
From Python (any agent):
import subprocess
result = subprocess.run(
["python3", script_path, "generate", prompt, "-o", output_path, "--quality", "low"],
capture_output=True, text=True, timeout=600
)
# Parse result.stdout for "Saved: <path>"
From Node.js / TypeScript:
const { execFileSync } = require('child_process');
// Pass arguments as an array so the prompt is never interpreted by a shell
const output = execFileSync('python3', [scriptPath, 'generate', prompt, '-o', outputPath]);
// Parse output.toString() for "Saved: ..."
Workflow: Agent Generates Images
- Always use the CLI script: it handles config resolution, auth security, and response parsing.
- Use low quality for drafts, high quality for final output.
- For edits: `--size auto` preserves original dimensions (recommended).
- The script outputs: HTTP status, time elapsed, output file path and size.
- Parse the output: look for `Saved: <path>` lines to find generated files.
Workflow: Agent Edits Existing Images
- Save or locate the source image path.
- Call `gpt_image2.py edit <image_path> "<edit_prompt>" --output <output_path>`.
- The edit endpoint can accept up to 4 images via repeated `--image` flags.
- Use `--size auto` to preserve original dimensions.
Important Pitfalls
- The `--api-key` flag is visible in shell history and `ps aux`: prefer the config file (`api_key_env`) or environment variables.
- The edits endpoint does NOT support `response_format`: it always returns b64_json regardless.
- gpt-image-2 generations may time out on some relay endpoints: use the `--timeout` flag (default 600s).
- Prompts with special characters: the script writes prompts to temp files internally, avoiding shell escaping issues. No need to worry about quoting.
- The Authorization header is never passed via `-H`: the script uses a curl `-K` temp config file, deleted immediately after use, so the key never appears in curl's arguments in `ps aux`.
- Config file permissions: the script warns if the config has group/other read permissions. Run `chmod 600 <config>` to fix.
- Zero pip dependencies: the script only requires Python 3.8+ stdlib and `curl`. No installation step needed.
- Chinese text in prompts may not render correctly: gpt-image-2's Chinese rendering is unstable; it often ignores Chinese constraints and outputs English text in images. Consider using Gemini for Chinese text rendering.
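Two of the points above, the curl `-K` config file and the permission warning, can be illustrated with a short sketch. Both function names are hypothetical and this is not the script's own code; it assumes a POSIX system and relies on `tempfile.mkstemp` creating the file readable and writable only by the current user.

```python
import os
import stat
import tempfile


def readable_by_group_or_other(path: str) -> bool:
    """True if group or other users have read permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


def write_curl_auth_config(api_key: str) -> str:
    """Write a curl -K config file holding the auth header; return its path."""
    # curl config values in double quotes need backslash and quote escaped.
    escaped = api_key.replace("\\", "\\\\").replace('"', '\\"')
    fd, path = tempfile.mkstemp(prefix="curl-auth-", suffix=".conf")
    with os.fdopen(fd, "w") as handle:
        handle.write(f'header = "Authorization: Bearer {escaped}"\n')
    return path
```

A caller would pass the returned path to curl as `-K <path>` and remove the file in a `finally` block with `os.unlink(path)`, so the key lives on disk only for the duration of the request.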
