Agent Browser
Headless browser automation CLI for AI agents. Use when interacting with websites — navigating pages, filling forms, clicking buttons, taking screenshots, ex...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The declared purpose (headless browser automation) matches the commands and features described (navigation, click/fill, screenshots, downloads, state persistence). However, SKILL.md and setup.sh state that Node.js/npm are required, while the registry metadata lists no required binaries or env vars. Users should be aware of this inconsistency.
Instruction Scope
Runtime instructions include powerful operations: evaluating arbitrary JavaScript in the page, connecting to an existing Chrome instance over CDP, opening file:// URIs, uploading and downloading files, saving/loading session state and auth profiles, and redirecting page text to files. These are expected for a browser-automation tool but also enable access to local files and any data visible in pages the agent visits. The SKILL.md examples reference $USERNAME/$PASSWORD but the skill declares no required env vars.
Install Mechanism
There is no registry install spec; instead scripts/setup.sh performs npm install -g agent-browser and then runs agent-browser install to fetch Chromium. Installing from npm at runtime is normal for Node tools but it means arbitrary package code will be downloaded and executed on the host. The package's source is 'unknown' (no homepage) so the provenance of the npm package is not verified in the metadata.
Credentials
The skill declares no required credentials or env vars, yet SKILL.md references optional envs (AGENT_BROWSER_*), and examples use $USERNAME/$PASSWORD and saving auth profiles and state files. The tool can persist credentials and cookies to disk and may read local files (file://, downloads). Requesting no secrets in metadata but instructing workflows that use and store secrets is a mismatch and elevates risk.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges, but it provides commands to save session state, store auth profiles, and write files to disk. Those behaviors are expected for a browser automation CLI but mean the skill can create persistent artifacts containing sensitive data on the host.
What to consider before installing
This skill appears to implement a real browser-automation CLI, but there are several things to check before installing or running it:
- Provenance: There is no homepage and the registry metadata gives an unknown source. Inspect the npm package 'agent-browser' (version, author, README, and published files) before running npm install -g. Prefer a package with a verifiable repository and maintainer.
- Runtime requirements mismatch: SKILL.md requires Node.js/npm but the registry metadata lists no required binaries—make sure your environment meets the tool's needs and be cautious when running setup scripts.
- Data exposure: The tool can access file:// URLs, download/upload files, eval JS in page contexts, and save auth/state to disk. Do not provide sensitive credentials (passwords, API keys, private tokens) to the agent unless you trust the package and have verified where/how those secrets are stored. Consider using dedicated test accounts.
- Install safely: Run the setup in an isolated environment (container, VM) so the npm package and its install scripts cannot access your primary host. After installation, inspect installed files under the global npm directory.
- Hardening: If you proceed, set AGENT_BROWSER_ALLOWED_DOMAINS to a strict allowlist, enable AGENT_BROWSER_CONTENT_BOUNDARIES, and limit AGENT_BROWSER_MAX_OUTPUT. Review and control any saved state/auth files (their formats and locations).
- If you are unsure: Ask the publisher for a repository link and package provenance, or prefer a well-known, audited browser-automation tool (e.g., Playwright/Puppeteer) with clear origin and reproducible install steps.
Like a lobster shell, security has layers: review code before you run it.
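The provenance check recommended above can be sketched with standard npm commands. This is a minimal sketch, not a vetted audit procedure: `inspect_npm_package` is a hypothetical helper name, and the fields queried are just one reasonable choice.

```shell
# Hypothetical helper: show registry metadata for a package, then download its
# tarball into a scratch directory WITHOUT installing it or running its scripts.
#   $1 = package name, $2 = existing directory to place the tarball in
inspect_npm_package() {
  # Print the published version, repository URL, and maintainers (if any).
  npm view "$1" version repository.url maintainers || return 1
  # Fetch the tarball only; npm pack does not execute install scripts.
  ( cd "$2" && npm pack "$1" )
}
```

For example, `inspect_npm_package agent-browser /tmp/review` leaves a .tgz in /tmp/review whose contents can be listed with `tar -tzf` before deciding whether to run scripts/setup.sh.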
Current version: v1.0.0
SKILL.md
Browser Automation with agent-browser
Setup
Run scripts/setup.sh to install agent-browser and Chromium. Requires Node.js.
Core Workflow
Every browser automation follows this pattern:
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
- Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
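Since every interaction depends on refs found in the snapshot, it can help to pull them out of the snapshot text first. The sketch below assumes refs always look like @e followed by digits, as in the sample output above; `extract_refs` is a hypothetical helper, not an agent-browser command.

```shell
# Hypothetical helper: read snapshot text on stdin and print each distinct
# @eN ref once, one per line, in the order it first appears.
extract_refs() {
  grep -oE '@e[0-9]+' | awk '!seen[$0]++'
}
```

Typical use would be `agent-browser snapshot -i | extract_refs`, which lists the refs available for the next click or fill.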
Command Chaining
Chain with && when you don't need intermediate output:
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
Run separately when you need to parse output first (e.g., snapshot to discover refs).
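One way to combine both styles is to chain the steps whose output is not needed and capture only the snapshot. This is a rough sketch; `snapshot_page` is a hypothetical wrapper, and sending the output of open and wait to stderr is a design choice made here so that only the snapshot reaches stdout.

```shell
# Hypothetical wrapper: open a URL, wait for the network to go idle, then print
# the interactive snapshot. Output of open/wait goes to stderr so callers can
# capture just the snapshot text.
snapshot_page() {
  agent-browser open "$1" >&2 && agent-browser wait --load networkidle >&2 || return 1
  agent-browser snapshot -i
}
```

A caller can then run `snap=$(snapshot_page https://example.com)` and inspect `$snap` before choosing which ref to act on.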
Essential Commands
# Navigate
agent-browser open <url>
agent-browser close
# See the page (always do this first)
agent-browser snapshot -i # Interactive elements with refs
agent-browser snapshot -i -C # Include onclick divs
# Interact using @refs
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser select @e1 "option"
agent-browser press Enter
agent-browser scroll down 500
# Get info
agent-browser get text @e1
agent-browser get url
agent-browser get title
# Wait
agent-browser wait @e1 # For element
agent-browser wait --load networkidle # For network idle
# Capture
agent-browser screenshot page.png
agent-browser screenshot --full # Full page
agent-browser pdf output.pdf
For the full command reference, see references/commands.md.
Ref Lifecycle (Important)
Refs (@e1, @e2) are invalidated when the page changes. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
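To make the re-snapshot rule hard to forget, the click and the fresh snapshot can be bundled together. This is a minimal sketch under the assumption that waiting for network idle is an adequate signal that the page has settled; `click_and_resnapshot` is a hypothetical name.

```shell
# Hypothetical wrapper: click a ref, wait for the page to settle, then print a
# fresh snapshot so the caller picks new refs instead of reusing stale ones.
click_and_resnapshot() {
  agent-browser click "$1" >&2 || return 1
  agent-browser wait --load networkidle >&2 || return 1
  agent-browser snapshot -i
}
```

After `click_and_resnapshot @e3`, any ref used next should come from the newly printed snapshot, not from the one taken before the click.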
Common Patterns
Form Submission
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser click @e5
agent-browser wait --load networkidle
Login with State Persistence
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME" && agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Reuse later
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
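The login example relies on $USERNAME and $PASSWORD being set, and an empty value would silently fill a blank field. A small guard like the one below could run first; `require_env` is a hypothetical helper written for plain POSIX sh, not something the skill provides.

```shell
# Hypothetical guard: fail with a message if any named variable is unset or
# empty. Pass variable NAMES, not values, e.g. require_env USERNAME PASSWORD.
require_env() {
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}
```

Running `require_env USERNAME PASSWORD || exit 1` before the fill commands stops the script early. Because auth.json may hold cookies and session tokens, restricting it with `chmod 600 auth.json` after saving is also worth considering.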
Data Extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text body > page.txt
Screenshot & Diff
agent-browser screenshot baseline.png
# ... changes happen ...
agent-browser diff screenshot --baseline baseline.png
Parallel Sessions
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser session list
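When several sites need their own session, the per-site commands above can be driven from a list. The sketch below assumes a `name=url` argument convention invented for this example; `open_sessions` is a hypothetical helper, and it opens the sessions one after another.

```shell
# Hypothetical helper: for each "name=url" argument, open the URL in a session
# with that name. Stops at the first failure.
open_sessions() {
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    url=${pair#*=}     # text after the first "=" (may itself contain "=")
    agent-browser --session "$name" open "$url" || return 1
  done
}
```

For instance, `open_sessions site1=https://site-a.com site2=https://site-b.com` reproduces the two open commands shown above.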
Security (Optional)
export AGENT_BROWSER_CONTENT_BOUNDARIES=1 # Wrap output for AI safety
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com" # Domain allowlist
export AGENT_BROWSER_MAX_OUTPUT=50000 # Prevent context flooding
Cleanup
Always close sessions when done: agent-browser close
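A shell trap is one way to make sure the close command runs even when an earlier step fails. This is a sketch only; `with_browser_cleanup` is a hypothetical wrapper, and it runs the given command in a subshell so the trap does not leak into the calling script.

```shell
# Hypothetical wrapper: run the given command in a subshell and always call
# "agent-browser close" when that subshell exits, whether or not it succeeded.
with_browser_cleanup() {
  (
    trap 'agent-browser close' EXIT
    "$@"
  )
}
```

For example, `with_browser_cleanup sh ./my-flow.sh` (where my-flow.sh is a placeholder for a script of agent-browser commands) closes the session after the script finishes or aborts.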
