{"skill":{"slug":"agent-browser-0-2-0","displayName":"Agent Browser 0.2.0","summary":"A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.","description":"---\nname: Agent Browser\ndescription: A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.\nread_when:\n  - Automating web interactions\n  - Extracting structured data from pages\n  - Filling forms programmatically\n  - Testing web UIs\nmetadata: {\"clawdbot\":{\"emoji\":\"🌐\",\"requires\":{\"bins\":[\"node\",\"npm\"]}}}\nallowed-tools: Bash(agent-browser:*)\n---\n\n# Browser Automation with agent-browser\n\n## Installation\n\n### npm recommended\n\n```bash\nnpm install -g agent-browser\nagent-browser install\nagent-browser install --with-deps\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/vercel-labs/agent-browser\ncd agent-browser\npnpm install\npnpm build\nagent-browser install\n```\n\n## Quick start\n\n```bash\nagent-browser open <url>        # Navigate to page\nagent-browser snapshot -i       # Get interactive elements with refs\nagent-browser click @e1         # Click element by ref\nagent-browser fill @e2 \"text\"   # Fill input by ref\nagent-browser close             # Close browser\n```\n\n## Core workflow\n\n1. Navigate: `agent-browser open <url>`\n2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)\n3. Interact using refs from the snapshot\n4. Re-snapshot after navigation or significant DOM changes\n\n## Commands\n\n### Navigation\n\n```bash\nagent-browser open <url>      # Navigate to URL\nagent-browser back            # Go back\nagent-browser forward         # Go forward\nagent-browser reload          # Reload page\nagent-browser close           # Close browser\n```\n\n### Snapshot (page analysis)\n\n```bash\nagent-browser snapshot            # Full accessibility tree\nagent-browser snapshot -i         # Interactive elements only (recommended)\nagent-browser snapshot -c         # Compact output\nagent-browser snapshot -d 3       # Limit depth to 3\nagent-browser snapshot -s \"#main\" # Scope to CSS selector\n```\n\n### Interactions (use @refs from snapshot)\n\n```bash\nagent-browser click @e1           # Click\nagent-browser dblclick @e1        # Double-click\nagent-browser focus @e1           # Focus element\nagent-browser fill @e2 \"text\"     # Clear and type\nagent-browser type @e2 \"text\"     # Type without clearing\nagent-browser press Enter         # Press key\nagent-browser press Control+a     # Key combination\nagent-browser keydown Shift       # Hold key down\nagent-browser keyup Shift         # Release key\nagent-browser hover @e1           # Hover\nagent-browser check @e1           # Check checkbox\nagent-browser uncheck @e1         # Uncheck checkbox\nagent-browser select @e1 \"value\"  # Select dropdown\nagent-browser scroll down 500     # Scroll page\nagent-browser scrollintoview @e1  # Scroll element into view\nagent-browser drag @e1 @e2        # Drag and drop\nagent-browser upload @e1 file.pdf # Upload files\n```\n\n### Get information\n\n```bash\nagent-browser get text @e1        # Get element text\nagent-browser get html @e1        # Get innerHTML\nagent-browser get value @e1       # Get input value\nagent-browser get attr @e1 href   # Get attribute\nagent-browser get title           # Get page title\nagent-browser get url             # Get current URL\nagent-browser get count \".item\"   # Count matching elements\nagent-browser get box @e1         # Get bounding box\n```\n\n### Check state\n\n```bash\nagent-browser is visible @e1      # Check if visible\nagent-browser is enabled @e1      # Check if enabled\nagent-browser is checked @e1      # Check if checked\n```\n\n### Screenshots & PDF\n\n```bash\nagent-browser screenshot          # Screenshot to stdout\nagent-browser screenshot path.png # Save to file\nagent-browser screenshot --full   # Full page\nagent-browser pdf output.pdf      # Save as PDF\n```\n\n### Video recording\n\n```bash\nagent-browser record start ./demo.webm    # Start recording (uses current URL + state)\nagent-browser click @e1                   # Perform actions\nagent-browser record stop                 # Stop and save video\nagent-browser record restart ./take2.webm # Stop current + start new recording\n```\n\nRecording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.\n\n### Wait\n\n```bash\nagent-browser wait @e1                     # Wait for element\nagent-browser wait 2000                    # Wait milliseconds\nagent-browser wait --text \"Success\"        # Wait for text\nagent-browser wait --url \"/dashboard\"    # Wait for URL pattern\nagent-browser wait --load networkidle      # Wait for network idle\nagent-browser wait --fn \"window.ready\"     # Wait for JS condition\n```\n\n### Mouse control\n\n```bash\nagent-browser mouse move 100 200      # Move mouse\nagent-browser mouse down left         # Press button\nagent-browser mouse up left           # Release button\nagent-browser mouse wheel 100         # Scroll wheel\n```\n\n### Semantic locators (alternative to refs)\n\n```bash\nagent-browser find role button click --name \"Submit\"\nagent-browser find text \"Sign In\" click\nagent-browser find label \"Email\" fill \"user@test.com\"\nagent-browser find first \".item\" click\nagent-browser find nth 2 \"a\" text\n```\n\n### Browser settings\n\n```bash\nagent-browser set viewport 1920 1080      # Set viewport size\nagent-browser set device \"iPhone 14\"      # Emulate device\nagent-browser set geo 37.7749 -122.4194   # Set geolocation\nagent-browser set offline on              # Toggle offline mode\nagent-browser set headers '{\"X-Key\":\"v\"}' # Extra HTTP headers\nagent-browser set credentials user pass   # HTTP basic auth\nagent-browser set media dark              # Emulate color scheme\n```\n\n### Cookies & Storage\n\n```bash\nagent-browser cookies                     # Get all cookies\nagent-browser cookies set name value      # Set cookie\nagent-browser cookies clear               # Clear cookies\nagent-browser storage local               # Get all localStorage\nagent-browser storage local key           # Get specific key\nagent-browser storage local set k v       # Set value\nagent-browser storage local clear         # Clear all\n```\n\n### Network\n\n```bash\nagent-browser network route <url>              # Intercept requests\nagent-browser network route <url> --abort      # Block requests\nagent-browser network route <url> --body '{}'  # Mock response\nagent-browser network unroute [url]            # Remove routes\nagent-browser network requests                 # View tracked requests\nagent-browser network requests --filter api    # Filter requests\n```\n\n### Tabs & Windows\n\n```bash\nagent-browser tab                 # List tabs\nagent-browser tab new [url]       # New tab\nagent-browser tab 2               # Switch to tab\nagent-browser tab close           # Close tab\nagent-browser window new          # New window\n```\n\n### Frames\n\n```bash\nagent-browser frame \"#iframe\"     # Switch to iframe\nagent-browser frame main          # Back to main frame\n```\n\n### Dialogs\n\n```bash\nagent-browser dialog accept [text]  # Accept dialog\nagent-browser dialog dismiss        # Dismiss dialog\n```\n\n### JavaScript\n\n```bash\nagent-browser eval \"document.title\"   # Run JavaScript\n```\n\n### State management\n\n```bash\nagent-browser state save auth.json    # Save session state\nagent-browser state load auth.json    # Load saved state\n```\n\n## Example: Form submission\n\n```bash\nagent-browser open https://example.com/form\nagent-browser snapshot -i\n# Output shows: textbox \"Email\" [ref=e1], textbox \"Password\" [ref=e2], button \"Submit\" [ref=e3]\n\nagent-browser fill @e1 \"user@example.com\"\nagent-browser fill @e2 \"password123\"\nagent-browser click @e3\nagent-browser wait --load networkidle\nagent-browser snapshot -i  # Check result\n```\n\n## Example: Authentication with saved state\n\n```bash\n# Login once\nagent-browser open https://app.example.com/login\nagent-browser snapshot -i\nagent-browser fill @e1 \"username\"\nagent-browser fill @e2 \"password\"\nagent-browser click @e3\nagent-browser wait --url \"/dashboard\"\nagent-browser state save auth.json\n\n# Later sessions: load saved state\nagent-browser state load auth.json\nagent-browser open https://app.example.com/dashboard\n```\n\n## Sessions (parallel browsers)\n\n```bash\nagent-browser --session test1 open site-a.com\nagent-browser --session test2 open site-b.com\nagent-browser session list\n```\n\n## JSON output (for parsing)\n\nAdd `--json` for machine-readable output:\n\n```bash\nagent-browser snapshot -i --json\nagent-browser get text @e1 --json\n```\n\n## Debugging\n\n```bash\nagent-browser open example.com --headed              # Show browser window\nagent-browser console                                # View console messages\nagent-browser console --clear                        # Clear console\nagent-browser errors                                 # View page errors\nagent-browser errors --clear                         # Clear errors\nagent-browser highlight @e1                          # Highlight element\nagent-browser trace start                            # Start recording trace\nagent-browser trace stop trace.zip                   # Stop and save trace\nagent-browser record start ./debug.webm              # Record from current page\nagent-browser record stop                            # Save recording\nagent-browser --cdp 9222 snapshot                    # Connect via CDP\n```\n\n## Troubleshooting\n\n- If the command is not found on Linux ARM64, use the full path in the bin folder.\n- If an element is not found, use snapshot to find the correct ref.\n- If the page is not loaded, add a wait command after navigation.\n- Use --headed to see the browser window for debugging.\n\n## Options\n\n- --session <name> uses an isolated session.\n- --json provides JSON output.\n- --full takes a full page screenshot.\n- --headed shows the browser window.\n- --timeout sets the command timeout in milliseconds.\n- --cdp <port> connects via Chrome DevTools Protocol.\n\n## Notes\n\n- Refs are stable per page load but change on navigation.\n- Always snapshot after navigation to get new refs.\n- Use fill instead of type for input fields to ensure existing text is cleared.\n\n## Reporting Issues\n\n- Skill issues: Open an issue at https://github.com/TheSethRose/Agent-Browser-CLI\n- agent-browser CLI issues: Open an issue at https://github.com/vercel-labs/agent-browser\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":2068,"installsAllTime":140,"installsCurrent":140,"stars":1,"versions":1},"createdAt":1771003808269,"updatedAt":1778990350206},"latestVersion":{"version":"1.0.0","createdAt":1771003808269,"changelog":"- Initial release of Agent Browser skill.\n- Provides Rust-based, fast, headless browser automation CLI with Node.js fallback.\n- Supports navigation, structured page snapshots, element interaction (click, type, fill, etc.), and UI testing via concise commands.\n- Enables extraction of structured data, programmatic form filling, screenshots, video recording, cookies/storage management, and advanced browser control.\n- Includes semantic locators, network request interception, multi-tab/window/frame handling, and JavaScript evaluation.","license":null},"metadata":{"setup":[],"os":null,"systems":null},"owner":{"handle":"knightluozichu","userId":"s179b4kybqdtxcqnfw3c0322ah88408c","displayName":"Knightluozichu","image":"https://avatars.githubusercontent.com/u/38971873?v=4"},"moderation":null}