Gemini Image Gen + Watermark Removal

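Generate images in the Google Gemini web app and remove their watermarks. The skill drives the browser through the OpenClaw Browser Tool to generate and download images, then removes the watermark with GeminiWatermarkTool. Use cases: Google image generation, Gemini image generation, Google Gemini images, watermark removal, Gemini watermark removal.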

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Pending
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (Gemini image generation + watermark removal) matches the SKILL.md: it uses the OpenClaw Browser Tool to drive a logged-in Google browser and then calls an external GeminiWatermarkTool binary. The skill does not request unrelated environment variables or binaries.
Instruction Scope
Instructions are explicit about browser actions, shell commands (sleep, ls), and calling a local binary. They require operating on an already logged-in browser profile (profile="user"), and optionally reference a local send-feishu-image skill to transmit images. These are within the declared purpose but mean the agent will act with the privileges of the logged-in browser and can send generated images externally if the optional step is used.
Install Mechanism
This is an instruction-only skill (no install spec). The SKILL.md recommends installing GeminiWatermarkTool via brew or GitHub Releases. Both are reasonable but downloading binaries from releases has inherent supply-chain risk — the skill itself does not perform the download.
Credentials
The skill declares no env vars or credentials, which is proportional. However it expects access to a logged-in Google browser profile and local filesystem paths (~/Downloads, ~/.claude/skills/gwt, ~/.openclaw/workspace/skills/send-feishu-image). Those accesses are consistent with the stated tasks but are sensitive (can access user account state and local files).
Persistence & Privilege
always is false, no install step writes to disk as part of the skill bundle, and the skill does not request elevated or persistent platform privileges. Autonomous invocation is allowed but is the platform default and not itself suspicious here.
Scan Findings in Context
None. No regex-based findings were detected, which is expected: the package is instruction-only, so there were no code files for static analysis.
Assessment
This skill appears to do what it says, but consider these points before installing or using it:
1) The skill drives a logged-in browser profile, so it acts with your Google session's privileges. Only use it with accounts you trust for automation.
2) Removing watermarks may violate service terms or copyright; ensure you have the right to alter the images.
3) GeminiWatermarkTool is recommended to be installed via brew or a direct GitHub release; verify the source and signatures before running downloaded binaries.
4) The optional send-feishu-image step transmits images to an external service (Feishu); double-check the recipients and confirm that you trust the target workspace.
5) As a precaution, test in a throwaway or isolated account or environment first, and inspect any binaries you install.
If you need a higher-assurance review, ask for the exact GeminiWatermarkTool release links and checksums, and verify the reputations of the brew tap and the GitHub repo.
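Point 3 above recommends verifying downloaded binaries. As a minimal sketch of what such a check could look like, the snippet below compares a file's SHA-256 digest with an expected value. It assumes a trusted checksum is available from somewhere; this page does not say whether the GeminiWatermarkTool releases publish one, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Return True when the file's digest equals the expected hex string.

    The expected value must come from a source you trust (assumption:
    such a value exists for the binary you downloaded).
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

A matching checksum only shows that the download is the file the publisher listed; it does not by itself establish that the publisher is trustworthy.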


Current version: v1.1.0
latest vk97c7dgqccca38ck6gsrrdr3p183rcfx

SKILL.md

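Google Gemini Image Generation

Use the OpenClaw Browser Tool to drive a browser that is already signed in to a Google account, then generate and download images from the Gemini web app.

Prerequisites

  • The browser is signed in to a Google account
  • The OpenClaw Browser Tool is available (make sure openclaw browser status reports a healthy state)
  • Use profile user to attach to the Chrome instance that is already open

Workflow

1. Open the Gemini page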

browser(action="open", profile="user", url="https://gemini.google.com")

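You can also open an existing conversation link directly to reuse its images:

browser(action="open", profile="user", url="https://gemini.google.com/app/<conversation-id>")

2. Click 「制作图片」 (Create image)

Take a snapshot to find the button's ref, then click it:

browser(action="snapshot", profile="user", compact=true)
// Find the ref of the 「制作图片」 button, then click it
browser(action="act", profile="user", request={"kind": "click", "ref": "<ref>"})

A new conversation first shows a style picker (单色, 色块, 跑跑, and so on). You can ignore it and type the prompt straight into the input box.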

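3. Enter the prompt and send it

browser(action="act", profile="user", request={"kind": "type", "ref": "<textarea ref>", "text": "your prompt"})
browser(action="act", profile="user", request={"kind": "press", "key": "Enter"})

⚠️ Prompt rules

  • Avoid verb keywords such as "唱" (sing) or "弹奏" (play an instrument); otherwise Gemini may trigger music generation instead of image generation
  • Use purely visual descriptions instead, for example "wearing a microphone headset" rather than "singing with a microphone"
  • When the image needs text, state it explicitly in the prompt, for example: The text "畢士傾訴" appears on a banner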
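4. Wait for the image to be generated

⚠️ Important: do not use act(kind="wait")

At the CDP level, act(kind="wait") has no real mechanism for waiting on page changes. It only waits for a WebSocket response, times out after 8 seconds without one, and leaves the entire browser tool session unusable.

The correct approach: wait with exec sleep, then take a snapshot

exec: sleep 20 && echo "done"
// After exec finishes
browser(action="snapshot", profile="user", compact=true)

Generation is complete when buttons such as 「下载完整尺寸的图片」 (Download full-size image), 「复制图片」 (Copy image), and 「分享图片」 (Share image) appear on the page.

If the snapshot shows that generation is still in progress (a "Creating your image..." button is present), sleep for another round.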
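The sleep-then-snapshot loop described in this step can be sketched as follows. This is an illustration under assumptions: the snapshot is treated as plain text containing the button labels, take_snapshot stands in for the real browser(action="snapshot", ...) call, and the 6-round cap is an arbitrary choice rather than a value from the skill.

```python
import time
from typing import Callable

# Button labels are taken from this document; the plain-text snapshot
# format is an assumption made for the sketch.
DONE_MARKERS = ("下载完整尺寸的图片", "复制图片", "分享图片")
BUSY_MARKER = "Creating your image"


def generation_state(snapshot_text: str) -> str:
    """Classify a snapshot as 'done', 'generating', or 'unknown'."""
    if any(marker in snapshot_text for marker in DONE_MARKERS):
        return "done"
    if BUSY_MARKER in snapshot_text:
        return "generating"
    return "unknown"


def poll_until_done(
    take_snapshot: Callable[[], str],
    interval: float = 20.0,
    max_rounds: int = 6,  # hypothetical cap, not specified by the skill
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sleep, snapshot, and repeat until the image is ready or rounds run out."""
    for _ in range(max_rounds):
        sleep(interval)
        if generation_state(take_snapshot()) == "done":
            return True
    return False
```

Passing the sleep function as a parameter keeps the loop testable without real delays, and the round cap avoids polling forever if the page never reaches a finished state.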

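5. Download the image

Click the 「下载完整尺寸的图片」 (Download full-size image) button:

browser(action="act", profile="user", request={"kind": "click", "ref": "<download button ref>"})

After the download finishes, check the downloads directory: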

sleep 5 && ls -lt ~/Downloads/Gemini_Generated_Image* | head -3
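For scripted use, the same "newest matching file" lookup can be done without parsing ls output. The sketch below assumes the Gemini_Generated_Image prefix and the ~/Downloads location shown in the command above; the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def newest_gemini_image(downloads: Optional[Path] = None) -> Optional[Path]:
    """Return the most recently modified Gemini_Generated_Image* file, or None."""
    folder = downloads if downloads is not None else Path.home() / "Downloads"
    candidates = [p for p in folder.glob("Gemini_Generated_Image*") if p.is_file()]
    # max() with default=None covers the case where nothing has been downloaded yet.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Selecting by modification time rather than by exact name matches the advice to sort by time, since the downloaded file name can vary between runs.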

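6. Remove the watermark

Images generated by Gemini carry a watermark. Remove it with GeminiWatermarkTool.

Installation (macOS / Linux):

brew install allenk/tap/gwt

Alternatively, download the binary from GitHub Releases.

Known working path (if brew is unavailable):

~/.claude/skills/gwt/bin/GeminiWatermarkTool

Usage:

gwt --force -i <input image> -o <output image>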
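The binary lookup and invocation can be wrapped in a small script. This sketch uses only the flags shown in the usage line above and assumes the fallback binary accepts the same flags as gwt; the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Fallback location mentioned in this document for when brew is unavailable.
FALLBACK_BINARY = Path.home() / ".claude/skills/gwt/bin/GeminiWatermarkTool"


def resolve_gwt() -> Optional[str]:
    """Prefer gwt on PATH, then the fallback path; return None if neither exists."""
    found = shutil.which("gwt")
    if found:
        return found
    return str(FALLBACK_BINARY) if FALLBACK_BINARY.is_file() else None


def build_gwt_command(binary: str, src: str, dst: str) -> list[str]:
    """Build the argument list with the flags from the usage line."""
    return [binary, "--force", "-i", src, "-o", dst]


def remove_watermark(src: str, dst: str) -> None:
    """Run the tool and raise if it is missing or exits with an error."""
    binary = resolve_gwt()
    if binary is None:
        raise FileNotFoundError("GeminiWatermarkTool (gwt) was not found")
    subprocess.run(build_gwt_command(binary, src, dst), check=True)
```

Passing the arguments as a list instead of a shell string avoids quoting problems when file names contain spaces.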

7. Send to Feishu (optional)

Use the send-feishu-image skill:

import os
import sys
# sys.path does not expand "~", so resolve the home directory explicitly
sys.path.insert(0, os.path.expanduser("~/.openclaw/workspace/skills/send-feishu-image"))
from send_feishu_image import send_image
result = send_image(
    image_path="/path/to/output.png",
    user_id="ou_7abe0c2af8a0f7b5b1c1171bcd8707d8",
    caption="Image caption"
)

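Known issues

| Issue | Solution |
| --- | --- |
| act(kind="wait") times out and leaves the browser tool unusable | Never use act(kind="wait"); poll with exec sleep + snapshot instead |
| snapshot times out | Restart the Gateway (menu bar: OpenClaw → Restart) |
| Tab not found | Run browser(action="snapshot") to inspect the current page state |
| Music generation was triggered | Remove words such as "唱" (sing) and "弹" (play) from the prompt and use purely visual descriptions |
| The image takes a long time to generate | The Gemini model is slow; sleep 20-25 seconds, then take another snapshot |
| gwt installation fails (GitHub unreachable) | Check whether ~/.claude/skills/gwt/bin/GeminiWatermarkTool already exists |
| The new file cannot be found after download | The file name may vary; use ls -lt to sort by time and find the latest file |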

After completion

  • Close tabs you no longer need: browser(action="close", targetId="<ID>")
