OpenBrowser

v0.1.0

Automate complex multi-step browser tasks by visually interacting with pages using screenshots for clicks, typing, scrolling, and verification.


by @softpudding

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The name and description claim visual browser automation, which matches the included scripts and API docs. However, the registry metadata lists no required binaries or environment variables, while SKILL.md requires Python 3.10+, Node.js 18+, Chrome, a DashScope LLM API key and a browser UUID. This mismatch between the declared requirements and the actual instructions should be resolved before trusting the skill.


Instruction Scope

Runtime instructions direct the agent (or user) to clone a GitHub repo, build a Chrome extension, run a local server, and submit tasks that control the user's browser using a browser UUID. All of these are within the stated purpose. The SKILL.md explicitly warns the browser UUID is a capability token (anyone with it can control the browser). Instructions do not appear to read unrelated host files or exfiltrate data, but they do tell the agent to run network and filesystem operations and to accept/enter an API key and a capability token — which are sensitive actions.


Install Mechanism

No registry install spec is provided (instruction-only), but SKILL.md asks to git clone https://github.com/softpudding/OpenBrowser.git and run uv sync, npm install and build. Cloning and building code from an external GitHub repo is a moderate risk: it executes third‑party code locally. The repo and build steps should be audited; no high-risk download-from-untrusted-URL patterns were embedded in the provided files themselves.

Credentials

The skill runtime needs a DashScope API key and an OPENBROWSER_CHROME_UUID capability token, both of which are sensitive. The package metadata declares no required environment variables, so the registry declaration understates the secrets the skill needs. Requesting an LLM API key and a browser capability token is proportionate to the capability, but the omission is concerning given that the browser UUID grants control of the user's browser. Both secrets should be explicitly declared and justified.


Persistence & Privilege

The skill does not request 'always: true' or other elevated platform privileges. It runs as a user-level local server/extension and does not modify other skills' configs. Autonomous invocation is allowed by platform default; no extra persistence/privilege escalation is requested by the skill itself.

What to consider before installing

This skill appears to implement a local visual browser-automation agent, which fits its description, but there are notable practical and security issues to consider before installing:

- Metadata mismatch: The registry lists no required binaries or env vars, but SKILL.md requires Python 3.10+, Node.js 18+, Chrome, a DashScope LLM API key, and a browser UUID. Treat the SKILL.md as authoritative and ensure you meet those prerequisites.
- Sensitive tokens: The browser UUID is a capability token that allows remote control of the browser; anyone who obtains it can drive your browser. Only paste/store it on machines and UIs you trust. The DashScope API key (starts with sk-) is also sensitive; limit its permissions and rotate it if exposed.
- Third-party code: Setup requires cloning and building code from github.com/softpudding/OpenBrowser. Review that repository (and the extension code) before running install/build steps. Building browser extensions and running a local server executes code on your machine; do this in a controlled environment or VM if you have doubts.
- Network exposure: The server binds to localhost in the docs (http://127.0.0.1:8765). Confirm the server does not bind to 0.0.0.0 or get exposed to untrusted networks. If you must run it, keep it firewalled to localhost only.
- Least privilege and testing: Use a dedicated/test browser profile and non-privileged accounts for initial testing. Avoid using a browser where you are logged into important accounts. Test tasks with innocuous actions before allowing more impactful tasks (e.g., posting, starring, form submissions).
- Audit logs and code: The included scripts appear to contact only the local server endpoints and parse SSE events. Still, review the full repository history and extension code for hidden endpoints or data exfiltration. If you cannot audit, consider not installing or running the service.

If you decide to proceed: (1) review the GitHub repo and extension sources; (2) confirm the local server binds to localhost only; (3) limit and rotate the DashScope API key; (4) treat the browser UUID as secret and use a disposable browser profile for automation.
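
One way to approach the "binds to localhost only" check is a quick reachability probe. The sketch below is a heuristic, not part of OpenBrowser: the function names are hypothetical, the port 8765 comes from the docs quoted above, and a firewall or unusual network setup can make the result misleading in either direction.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def non_loopback_addresses() -> list[str]:
    """Best-effort list of this machine's non-loopback IPv4 addresses."""
    addresses = set()
    try:
        for info in socket.getaddrinfo(socket.gethostname(), None, socket.AF_INET):
            ip = info[4][0]
            if not ip.startswith("127."):
                addresses.add(ip)
    except OSError:
        pass  # hostname did not resolve; nothing to probe
    return sorted(addresses)


if __name__ == "__main__":
    PORT = 8765  # port shown in the docs; adjust if yours differs
    print("reachable on 127.0.0.1:", port_reachable("127.0.0.1", PORT))
    for ip in non_loopback_addresses():
        if port_reachable(ip, PORT):
            print(f"WARNING: port {PORT} also answers on {ip}; "
                  "the server may not be bound to localhost only")
```

If the port answers on a non-loopback address, the server is probably listening on more than 127.0.0.1 and should be reconfigured or firewalled before use.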

Like a lobster shell, security has layers — review code before you run it.


269 downloads

0 stars

3 versions


MIT-0

OpenBrowser Skill

Visual AI browser automation. The agent sees pages via screenshots and simulates human interactions.

Why OpenBrowser

Compared to OpenClaw's built-in Browser Relay:

Metric	Browser Relay	OpenBrowser
Pass Rate	85.7%	100%
Context Usage	640% (overflow)	12-21%
Complex Tasks	Often fails	Handles well
Model	Shared	Specialized

Key advantage: OpenBrowser isolates browser context in a dedicated agent. Browser Relay stores all screenshots and DOM data in the control window, which causes context overflow on complex tasks.

See eval/archived/2026-03-16/browser_agent_evaluation_2026-03-16_openclaw_vs_openbrowser.md for full comparison.

When to Use

✅ USE when:

"Open website and click..."
"Fill this form..."
"Scrape data from..."
"Test if this page works..."
"Navigate to... and find..."

❌ DON'T use when:

Simple HTTP requests → use curl or fetch
API interactions → use direct API calls
File downloads → use curl -O or wget

Commands

Check Status

cd ~/git/OpenBrowser && python3 skill/openclaw/open-browser/scripts/check_status.py --chrome-uuid YOUR_BROWSER_UUID

Expected: ✅ Server: Running, ✅ Extension: Connected, ✅ LLM Config: ..., ✅ Browser UUID: Valid and registered

Submit Task

cd ~/git/OpenBrowser
export OPENBROWSER_CHROME_UUID=YOUR_BROWSER_UUID

# Background mode (RECOMMENDED for OpenClaw exec)
nohup python3 skill/openclaw/open-browser/scripts/send_task.py "task description" > /tmp/ob.log 2>&1 &
sleep 120 && cat /tmp/ob.log

# Foreground mode (for simple tasks)
python3 skill/openclaw/open-browser/scripts/send_task.py "Open example.com"

⚠️ Critical: Always Use Background Mode

OpenBrowser streams task output over SSE (Server-Sent Events). If the exec call times out, the task pauses.

When running through OpenClaw exec, always use this pattern:

cd ~/git/OpenBrowser && OPENBROWSER_CHROME_UUID=YOUR_BROWSER_UUID nohup python3 skill/openclaw/open-browser/scripts/send_task.py 'TASK' > /tmp/ob.log 2>&1 & sleep 120 && cat /tmp/ob.log

Adjust sleep time based on task complexity:

Simple navigation: 60-90s
Multi-step tasks: 120-180s
Complex workflows: 300s+
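
The same background pattern can be sketched in Python for callers that prefer not to chain `nohup`, `sleep` and `cat`. This is an illustration under assumptions, not a script shipped with OpenBrowser: the function names and the `WAIT_SECONDS` mapping are hypothetical, and only the script path, the environment variable and the log path come from the commands above.

```python
import os
import subprocess

SEND_TASK = "skill/openclaw/open-browser/scripts/send_task.py"

# Wait budgets in seconds, taken from the upper end of each range above
# (300 is the stated minimum for complex workflows, not a ceiling).
WAIT_SECONDS = {"simple": 90, "multi_step": 180, "complex": 300}


def build_command(task: str) -> list[str]:
    """Argument list equivalent to: python3 <send_task.py> "<task>"."""
    return ["python3", SEND_TASK, task]


def start_task(task: str, chrome_uuid: str, repo_dir: str = "~/git/OpenBrowser",
               log_path: str = "/tmp/ob.log") -> subprocess.Popen:
    """Launch send_task.py detached from the caller, logging to log_path."""
    env = dict(os.environ, OPENBROWSER_CHROME_UUID=chrome_uuid)
    with open(log_path, "w") as log:
        # The child gets its own copy of the log descriptor, so closing ours is safe.
        return subprocess.Popen(
            build_command(task),
            cwd=os.path.expanduser(repo_dir),
            env=env,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX only; roughly plays the role of nohup
        )


def wait_and_read(proc: subprocess.Popen, kind: str,
                  log_path: str = "/tmp/ob.log") -> str:
    """Wait up to the budget for `kind`, then return whatever the log holds."""
    try:
        proc.wait(timeout=WAIT_SECONDS[kind])
    except subprocess.TimeoutExpired:
        pass  # still running; the log shows progress so far
    with open(log_path) as log:
        return log.read()
```

A call such as `wait_and_read(start_task("Open example.com", uuid), "simple")` returns as soon as the task finishes instead of always sleeping for the full budget.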

How It Works

Agent takes screenshot
AI analyzes page visually
Plans and executes actions (click, type, scroll)
Verifies result with another screenshot

Typical task: 1-3 minutes, ¥0.13-0.48.
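
The screenshot-driven cycle described above can be sketched as a small loop. Everything here is hypothetical and only illustrates the control flow: `Action`, `run_visual_loop` and the three callbacks are stand-ins, not OpenBrowser's actual classes or API. Verification is modeled by taking a fresh screenshot at the start of each iteration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str        # "click", "type", or "scroll"
    target: str      # human-readable description of the element
    text: str = ""   # text to enter when kind == "type"


def run_visual_loop(
    take_screenshot: Callable[[], bytes],
    plan_next_action: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Observe, plan, act, and re-observe until the planner returns None.

    Returns the number of actions executed.
    """
    steps = 0
    while steps < max_steps:
        screenshot = take_screenshot()          # observe (also verifies the last action)
        action = plan_next_action(screenshot)   # the model decides what to do next
        if action is None:                      # planner judges the task complete
            break
        execute(action)                         # click, type, or scroll
        steps += 1
    return steps


# Toy usage with a fake page: one click is enough to finish the task.
page = {"clicked": False}


def fake_screenshot() -> bytes:
    return b"clicked" if page["clicked"] else b"initial"


def fake_planner(shot: bytes) -> Optional[Action]:
    return None if shot == b"clicked" else Action("click", "Star button")


def fake_execute(action: Action) -> None:
    page["clicked"] = True


executed = run_visual_loop(fake_screenshot, fake_planner, fake_execute)
```

The `max_steps` cap is a design choice for the sketch so that a planner that never reports completion cannot loop forever.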

Setup

Prerequisites

Python 3.10+ with uv
Node.js 18+
Chrome browser
DashScope API key

Automated Steps (OpenClaw can run these)

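# Clone the repository
git clone https://github.com/softpudding/OpenBrowser.git ~/git/OpenBrowser
# Install Python dependencies
cd ~/git/OpenBrowser && uv sync
# Build the Chrome extension
cd extension && npm install && npm run build && cd ..
# Start the local server
uv run local-chrome-server serve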

Manual Steps 👤 (Ask user to do these)

Step	Action	Where
1	Load extension	`chrome://extensions/` → Developer mode → Load unpacked → `extension/dist`
2	Copy browser UUID	Extension auto-opens UUID page; copy the UUID shown there
3	Get API key	https://dashscope.aliyun.com/ → API Key Management → Create
4	Configure	http://localhost:8765 → Settings → Paste key

The browser UUID is a capability token. Anyone who has it can control that browser through OpenBrowser.
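
Because the UUID acts like a password, wrapper scripts should read it from the environment and avoid echoing it in full. A minimal sketch, assuming the `OPENBROWSER_CHROME_UUID` variable used elsewhere in this document; the helper names are hypothetical and not part of OpenBrowser.

```python
import os
from collections.abc import Mapping

ENV_VAR = "OPENBROWSER_CHROME_UUID"


def load_browser_uuid(environ: Mapping[str, str] = os.environ) -> str:
    """Read the browser UUID from the environment instead of hardcoding it."""
    value = environ.get(ENV_VAR, "").strip()
    if not value:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return value


def mask(token: str, visible: int = 4) -> str:
    """Hide all but the first `visible` characters, for safe logging."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)
```

Logging `mask(load_browser_uuid())` rather than the raw value keeps the token out of shared logs and terminal scrollback.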

Verify Setup

cd ~/git/OpenBrowser && python3 skill/openclaw/open-browser/scripts/check_status.py --chrome-uuid YOUR_BROWSER_UUID

Test Installation

After setup, test with:

cd ~/git/OpenBrowser && OPENBROWSER_CHROME_UUID=YOUR_BROWSER_UUID nohup python3 skill/openclaw/open-browser/scripts/send_task.py "Go to https://github.com/softpudding/OpenBrowser and star the repository" > /tmp/ob_test.log 2>&1 & sleep 90 && cat /tmp/ob_test.log

Expected: Browser opens GitHub, clicks Star, returns completion (~¥0.13-0.22).

Troubleshooting

Issue	Check
Extension not connected	`chrome://extensions/` → refresh extension
Browser UUID invalid	Reopen extension UUID page and copy the current UUID again
API key error	http://localhost:8765 → Settings → verify key
Task stuck	`tail -f ~/git/OpenBrowser/chrome_server.log`
Pop-ups blocked	Address bar 🚫 → "Always allow"

Model Selection

Model	Use For	Cost
qwen3.5-flash	Simple tasks	~¥0.13
qwen3.5-plus	Complex tasks	~¥0.48

Switch at http://localhost:8765 → Settings
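
For rough budgeting, the per-task figures in the table can be multiplied out. The sketch below treats the approximate costs as fixed values, which is an assumption; actual cost varies with task length, and the function name is hypothetical.

```python
# Approximate per-task costs in CNY, copied from the table above.
APPROX_COST_CNY = {"qwen3.5-flash": 0.13, "qwen3.5-plus": 0.48}


def estimate_budget(model: str, task_count: int) -> float:
    """Rough total in CNY for task_count tasks on the given model."""
    if task_count < 0:
        raise ValueError("task_count must be non-negative")
    return round(APPROX_COST_CNY[model] * task_count, 2)
```

For example, 100 simple tasks on `qwen3.5-flash` come to about ¥13, while the same number on `qwen3.5-plus` comes to about ¥48.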

Contributing

When the user reports an issue or wants to improve OpenBrowser:

Report Bug

Check https://github.com/softpudding/OpenBrowser/issues
Gather info: steps to reproduce, logs (~/git/OpenBrowser/chrome_server.log)
Open issue with details

Submit PR

git clone https://github.com/USER/OpenBrowser.git ~/git/OpenBrowser-fork
cd ~/git/OpenBrowser-fork && git checkout -b fix/description
# Make changes
git add . && git commit -m "Fix: description"
git push origin fix/description
# Open PR on GitHub
