Install
openclaw skills install gpt-image-gen
Generate images using ChatGPT's GPT-Image-2 model via browser automation (CDP). Shares the user's daily Brave Browser (port 9222) via the brave-browser-agent skill. Navigates to chatgpt.com, inputs a prompt, triggers generation, and extracts the result.
Use when: (1) the user asks to generate/create/draw an image or picture using GPT/ChatGPT; (2) the user mentions "GPT生成图片" ("GPT generate image"), "ChatGPT画图" ("ChatGPT draw"), or "gpt-image"; (3) high-quality AI image generation via ChatGPT is needed; (4) the user wants photorealistic images, complex scenes, or text-in-image generation; (5) as a fallback or complement to Gemini image generation.
Requires: Brave Browser running with --remote-debugging-port=9222, ChatGPT logged in, Python 3, and the websockets pip package.
Generate images via ChatGPT's GPT-Image-2 model by automating the shared Brave Browser through CDP.
This skill shares the user's everyday Brave Browser (port 9222) and uses the same browser instance as brave-browser-agent. Never start, restart, or close the browser; only attach to it.
Reuse the CDP scripts from brave-browser-agent (port 9222):
brave-browser-agent/scripts/cdp_exec.py
brave-browser-agent/scripts/check_brave.py
This skill ships its own ChatGPT-specific script:
{{SKILL_DIR}}/scripts/extract_image.py
Check that the browser is reachable:
python3 brave-browser-agent/scripts/check_brave.py
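If port 9222 does not respond, tell the user:
"Brave Browser is not running with remote debugging enabled. Please close all Brave windows and reopen it with the following command:
/Applications/Brave\ Browser.app/Contents/MacOS/Brave\ Browser --remote-debugging-port=9222"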
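The reachability check can also be sketched directly in Python. This is a minimal illustration, not the contents of check_brave.py (which are not shown here); the function name `cdp_available` is hypothetical. It only probes the standard `/json/version` endpoint and never launches or closes the browser.

```python
import http.client
import json
import urllib.request


def cdp_available(port: int = 9222, timeout: float = 2.0) -> bool:
    """Return True if a CDP endpoint answers on localhost:<port>.

    Hypothetical helper: it only probes, it never starts a browser.
    """
    url = f"http://localhost:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.loads(resp.read())
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed HTTP, or non-JSON body.
        return False
    # A debuggable browser publishes its browser-level WebSocket URL here.
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info
```

If `cdp_available()` returns False, show the user the relaunch message above instead of trying to start Brave.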
Use the CDP WebSocket to open a new tab in the browser (/json/new may return 405, so use Target.createTarget instead):
# Create a new tab via CDP
import asyncio, json, urllib.request
import websockets

# The browser-level WebSocket URL is published at /json/version
info = json.loads(urllib.request.urlopen("http://localhost:9222/json/version").read())
ws_url = info["webSocketDebuggerUrl"]

async def create_tab():
    async with websockets.connect(ws_url, max_size=50*1024*1024) as ws:
        await ws.send(json.dumps({"id": 1, "method": "Target.createTarget", "params": {"url": "https://chatgpt.com/"}}))
        resp = json.loads(await asyncio.wait_for(ws.recv(), timeout=10))
        return resp["result"]["targetId"]

tab_id = asyncio.run(create_tab())
Alternatively, reuse an existing ChatGPT tab:
python3 brave-browser-agent/scripts/cdp_exec.py list
# Find the chatgpt.com tab and note its TAB_ID
Save it as CHATGPT_TAB for the following steps.
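Picking the tab out of the target list can be automated. The sketch below assumes the target list has the shape returned by the CDP HTTP endpoint `/json/list` (objects with `id`, `type`, and `url` fields); the helper name `find_chatgpt_tab` is hypothetical, and the output format of `cdp_exec.py list` may differ.

```python
from typing import Optional
from urllib.parse import urlparse


def find_chatgpt_tab(targets: list[dict]) -> Optional[str]:
    """Return the id of the first page target hosted on chatgpt.com, else None."""
    for target in targets:
        if target.get("type") != "page":
            continue  # skip service workers, iframes, extension pages
        host = urlparse(target.get("url", "")).hostname
        if host == "chatgpt.com":
            return target.get("id")
    return None


# Usage sketch (requires a browser listening on 9222):
#   import json, urllib.request
#   targets = json.loads(urllib.request.urlopen("http://localhost:9222/json/list").read())
#   tab_id = find_chatgpt_tab(targets)
```

When the function returns None, fall back to creating a new tab with Target.createTarget.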
sleep 3
python3 brave-browser-agent/scripts/cdp_exec.py screenshot $CHATGPT_TAB /tmp/chatgpt-state.png
Confirm the page has finished loading (the input box is visible).
ChatGPT uses <div id="prompt-textarea" contenteditable="true"> as its input box.
python3 brave-browser-agent/scripts/cdp_exec.py eval $CHATGPT_TAB '
(function() {
var el = document.querySelector("#prompt-textarea");
if (!el) return "NO_EDITOR";
el.focus();
el.textContent = "YOUR_PROMPT_HERE";
el.dispatchEvent(new InputEvent("input", {bubbles: true, inputType: "insertText", data: "YOUR_PROMPT_HERE"}));
return "TEXT_SET";
})()
'
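Pasting a real prompt in place of YOUR_PROMPT_HERE breaks as soon as the prompt contains a double quote or a newline. One way around this, sketched below under the assumption that the snippet is generated from Python, is to let `json.dumps` produce the JavaScript string literal. The helper name `build_set_prompt_js` is hypothetical; the JavaScript body mirrors the snippet above.

```python
import json


def build_set_prompt_js(prompt: str) -> str:
    """Build the prompt-setting snippet with the prompt safely escaped."""
    # json.dumps yields a double-quoted literal that is also valid JavaScript.
    literal = json.dumps(prompt)
    return (
        "(function() {\n"
        '  var el = document.querySelector("#prompt-textarea");\n'
        '  if (!el) return "NO_EDITOR";\n'
        "  el.focus();\n"
        f"  var text = {literal};\n"
        "  el.textContent = text;\n"
        '  el.dispatchEvent(new InputEvent("input", '
        '{bubbles: true, inputType: "insertText", data: text}));\n'
        '  return "TEXT_SET";\n'
        "})()"
    )


# Usage sketch: pass the snippet as a single argv entry so no shell quoting is involved.
#   import subprocess
#   subprocess.run(["python3", "brave-browser-agent/scripts/cdp_exec.py",
#                   "eval", tab_id, build_set_prompt_js(prompt)])
```

Passing the arguments as a list to `subprocess.run` avoids the single-quote wrapping used in the shell form, so apostrophes in the prompt are also safe.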
Prompt enhancement: add style, quality, and composition keywords based on the user's request. See Prompt Tips.
python3 brave-browser-agent/scripts/cdp_exec.py eval $CHATGPT_TAB '
(function() {
var btn = document.querySelector("button[data-testid=\"send-button\"]");
if (btn) { btn.click(); return "CLICKED_SEND"; }
return "NO_SEND_BTN";
})()
'
GPT-Image-2 generation usually takes 15-40 seconds. Poll with screenshots:
# Wait 20 seconds
sleep 20
python3 brave-browser-agent/scripts/cdp_exec.py screenshot $CHATGPT_TAB /tmp/chatgpt-result.png
Check whether generation is still in progress:
python3 brave-browser-agent/scripts/cdp_exec.py eval $CHATGPT_TAB '
(function() {
var imgs = document.querySelectorAll("img");
var count = 0;
for (var i = 0; i < imgs.length; i++) {
var w = imgs[i].naturalWidth || imgs[i].width || 0;
if (w >= 200) count++;
}
return count;
})()
'
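A single fixed sleep can be replaced with a polling loop. The sketch below assumes a baseline count of large images was recorded before sending the prompt, and that `count_images` is a caller-supplied function wrapping the eval command above; both names, and the default timeout and interval, are illustrative rather than part of the skill.

```python
import time
from typing import Callable


def wait_for_new_image(
    count_images: Callable[[], int],
    baseline: int,
    timeout: float = 90.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the large-image count exceeds baseline or the timeout elapses."""
    waited = 0.0
    while waited <= timeout:
        if count_images() > baseline:
            return True  # a new image (width >= 200) has appeared
        sleep(interval)
        waited += interval
    return False  # still generating, or generation failed
```

A False result is a cue to take another screenshot and inspect the page rather than to retry blindly.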
Use the dedicated extraction script (fetch + blob approach, no CORS issues):
mkdir -p /tmp/openclaw
python3 {{SKILL_DIR}}/scripts/extract_image.py $CHATGPT_TAB /tmp/openclaw/gpt-output.png
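The internals of extract_image.py are not shown here. As a rough illustration of the Python half of a fetch + blob approach, the sketch below assumes the page-side code has already fetched the image, converted the blob to a base64 `data:` URL, and returned that string over CDP. The helper name `save_data_url` is hypothetical.

```python
import base64
from pathlib import Path


def save_data_url(data_url: str, out_path: str) -> int:
    """Decode a base64 data: URL and write the bytes to out_path.

    Returns the number of bytes written.
    """
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data: URL")
    raw = base64.b64decode(payload)
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    path.write_bytes(raw)
    return len(raw)
```

Because the fetch runs inside the logged-in page, the image bytes reach Python as plain text over the CDP channel rather than through a second cross-origin request.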
openclaw message send \
--channel feishu \
--target <chat_id> \
--media /tmp/openclaw/gpt-output.png \
--message "🎨 Generated by ChatGPT GPT-Image-2"
GPT-Image-2 understands natural-language descriptions very well. A prompt can be structured as:
[Subject] + [Scene/Environment] + [Style] + [Lighting] + [Quality]
a golden retriever puppy sitting in a field of sunflowers,
photorealistic, warm golden hour lighting, shallow depth of field, 8k, masterpiece
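The template above can be assembled programmatically. This is a small sketch with a hypothetical helper, `build_prompt`; it joins subject and scene into one phrase and appends the remaining parts as comma-separated keywords, which reproduces the example prompt.

```python
def build_prompt(subject: str, scene: str = "", style: str = "",
                 lighting: str = "", quality: str = "") -> str:
    """Compose [Subject] + [Scene] + [Style] + [Lighting] + [Quality]."""
    # Subject and scene read as one phrase; the rest are keyword groups.
    head = " ".join(p.strip() for p in (subject, scene) if p.strip())
    parts = [head, style, lighting, quality]
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    subject="a golden retriever puppy",
    scene="sitting in a field of sunflowers",
    style="photorealistic",
    lighting="warm golden hour lighting, shallow depth of field",
    quality="8k, masterpiece",
)
```

Empty parts are skipped, so a prompt with only a subject and a style still comes out well-formed.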
| Feature | GPT-Image-2 | Gemini Imagen |
|---|---|---|
| Text rendering | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐ Average |
| Photorealism | ⭐⭐⭐⭐⭐ Photo-grade | ⭐⭐⭐⭐ Very good |
| Artistic styles | ⭐⭐⭐⭐ Broad | ⭐⭐⭐⭐ Broad |
| Speed | 15-40s | 10-30s |
| Free | ✅ Free/Plus | ✅ Free |
| Prompt understanding | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐⭐ Good |
Enter the prompt with the textContent + InputEvent approach. Brave must be running with --remote-debugging-port=9222. Never start or close the browser automatically.