Openclaw Skill Browser Use

v1.0.0

Autonomous browser automation for AI agents. Two tools: agent-browser (CLI Playwright for step-by-step control) and browser-use (Python autonomous agent that...

by @yinj0012

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for yinj0012/openclaw-skill-browser-use.

Install the skill "Openclaw Skill Browser Use" (yinj0012/openclaw-skill-browser-use) from ClawHub.
Skill page: https://clawhub.ai/yinj0012/openclaw-skill-browser-use
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: node, npm, python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install openclaw-skill-browser-use

ClawHub CLI

npx clawhub@latest install openclaw-skill-browser-use

Security Scan

VirusTotal: Pending

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The stated purpose (CLI Playwright control + autonomous Python agent) matches the provided files and commands. Requiring node/npm/python3/chromium/xvfb is coherent for browser automation. However, the registry metadata declares no required environment variables, even though the scripts expect OPENAI_API_KEY or ANTHROPIC_API_KEY (and will auto-detect them from /root/.openclaw/openclaw.json). That gap between undeclared env vars and runtime use of credentials is a notable mismatch.

Instruction Scope

SKILL.md and the included scripts direct the agent to run arbitrary browser actions (open, click, eval JS) which is expected. The installer and wrappers (scripts/install.sh and scripts/browser-use-agent.sh) read /root/.openclaw/openclaw.json to extract API keys if env vars are not set — this is reading a central agent config file outside the skill's declared scope. The wrapper also prints/executes a temporary Python script using those keys. Accessing a root OpenClaw config for secrets is scope creep and potentially surprising.

Install Mechanism

There is no registry install spec, but an included scripts/install.sh performs apt-get installs, npm -g install agent-browser, pip installs of browser-use and langchain-* packages, and playwright browser installs. These are standard package sources (apt, npm, PyPI, Playwright) rather than arbitrary downloads, but the installer creates system-wide artifacts (/opt/browser-use, /usr/local/bin/browser-use-agent) and may require sudo. Global npm/pip installs and Playwright downloads should be reviewed before running.

Credentials

The skill will use LLM provider keys (OPENAI_API_KEY or ANTHROPIC_API_KEY) which are reasonable for an autonomous browsing agent, but the registry lists no required env vars. More importantly, both the wrapper and browser-use-agent.sh will attempt to pull keys from /root/.openclaw/openclaw.json if env vars are unset, which accesses central credentials. Reading that file can expose API keys stored by the platform — this is a high-sensitivity access that the skill did not declare.

Persistence & Privilege

Although the skill sets always:false and allows model invocation (both normal), the installer writes persistent, system-wide files: a venv in /opt/browser-use and a wrapper in /usr/local/bin/browser-use-agent. Combined with the auto-detection of keys from /root/.openclaw/openclaw.json, this persistent presence increases the blast radius if keys are accessible. The installer also runs privileged package installs (apt-get) and global npm installs, so run it as root or with sudo only with caution.

What to consider before installing

What to consider before installing or running this skill:
- The skill's functionality (headless browser automation + autonomous agent) is consistent with the files provided, but the scripts will try to read OpenClaw's central config (/root/.openclaw/openclaw.json) to auto-fill OPENAI_API_KEY or ANTHROPIC_API_KEY if you haven't exported them. That file can contain sensitive API keys for other services, so decide whether you want this skill to access them.
- The included installer performs system-wide changes: apt-get package installs, npm -g agent-browser, pip installs, Playwright browser downloads, and creation of /opt/browser-use and /usr/local/bin/browser-use-agent. Only run the installer on systems you control and understand; prefer running in a disposable VM/container or sandbox.
- If you want to avoid the skill auto-reading platform credentials, set OPENAI_API_KEY or ANTHROPIC_API_KEY in the environment explicitly before running, or inspect and modify the wrapper/browser-use-agent.sh to remove the auto-detection lines that read /root/.openclaw/openclaw.json.
- Review the npm package agent-browser and the PyPI package browser-use (and their dependencies) before global installation: npm packages can run postinstall scripts. If possible, install locally or pin versions.
- Do not run the installer as root on a production host unless you accept the system-wide changes. Consider running the CLI-only agent-browser interactions without running the installer (if agent-browser is already present) or using a container.
- If you want to proceed, audit scripts/install.sh and scripts/browser-use-agent.sh line-by-line (they are included) and confirm you accept creation of /opt and /usr/local artifacts and the reading of /root/.openclaw/openclaw.json.
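
The advice about exporting a key explicitly could be enforced with a small pre-flight check before calling the wrapper. This is a minimal sketch, assuming only the two variable names quoted in the scan; the function name `explicit_llm_key` and the order in which the variables are checked are hypothetical and not taken from the skill's scripts.

```python
import os
from typing import Mapping, Optional

# Variable names quoted in the scan text; the check order is an assumption.
KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")

def explicit_llm_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first non-empty key variable, or None if neither is set."""
    for name in KEY_VARS:
        if env.get(name, "").strip():
            return name
    return None

# Example use before launching the wrapper:
#   if explicit_llm_key() is None:
#       raise SystemExit("Export a key first to avoid the config-file fallback")
```

Refusing to start when the function returns None keeps the wrapper from reaching its fallback to /root/.openclaw/openclaw.json.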

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🌐 Clawdis

Bins: node, npm, python3

latest: vk97c8hqbjnq6gcwc9dpnx93jw183s7tg

92 downloads

0 stars

1 version

Updated 1mo ago

v1.0.0

MIT-0

Browser Use — Autonomous Browser Automation

Two complementary tools for browser automation:

Tool	Best for	How it works
agent-browser	Step-by-step control, scraping, form filling	CLI commands, you drive each action
browser-use	Complex autonomous tasks	Python agent that decides actions itself

Quick Start

agent-browser (recommended for most tasks)

# Navigate and inspect
agent-browser open "https://example.com"
agent-browser snapshot -i          # Get interactive elements with @refs

# Interact using refs
agent-browser click @e3            # Click element
agent-browser fill @e2 "text"      # Fill input (clears first)
agent-browser press Enter          # Press key

# Extract data
agent-browser get text @e1         # Get element text
agent-browser get attr @e1 href    # Get attribute
agent-browser screenshot /tmp/p.png # Screenshot

# Done
agent-browser close

browser-use (autonomous agent)

# Run a full autonomous browsing task
browser-use-agent "Find the pricing for Notion and compare plans"

The agent will navigate, click, read pages, and return a structured result.

agent-browser — Full Reference

Navigation

agent-browser open <url>           # Navigate to URL
agent-browser back                 # Go back
agent-browser forward              # Go forward
agent-browser reload               # Reload page
agent-browser close                # Close browser

Snapshot (page analysis)

agent-browser snapshot             # Full accessibility tree
agent-browser snapshot -i          # Interactive elements only (recommended)
agent-browser snapshot -c          # Compact output
agent-browser snapshot -d 3        # Limit depth to 3
agent-browser snapshot -s "#main"  # Scope to CSS selector
agent-browser snapshot -i --json   # JSON output for parsing

Interactions (use @refs from snapshot)

agent-browser click @e1            # Click
agent-browser dblclick @e1         # Double-click
agent-browser fill @e2 "text"      # Clear and type (use this for inputs)
agent-browser type @e2 "text"      # Type without clearing
agent-browser press Enter          # Press key
agent-browser press Control+a      # Key combination
agent-browser hover @e1            # Hover
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "value"   # Select dropdown option
agent-browser scroll down 500      # Scroll page
agent-browser scrollintoview @e1   # Scroll element into view
agent-browser drag @e1 @e2         # Drag and drop
agent-browser upload @e1 file.pdf  # Upload files

Extract Data

agent-browser get text @e1         # Get element text
agent-browser get html @e1         # Get innerHTML
agent-browser get value @e1        # Get input value
agent-browser get attr @e1 href    # Get attribute
agent-browser get title            # Page title
agent-browser get url              # Current URL
agent-browser get count ".item"    # Count matching elements

Wait

agent-browser wait @e1             # Wait for element
agent-browser wait 2000            # Wait milliseconds
agent-browser wait --text "Done"   # Wait for text to appear
agent-browser wait --url "/dash"   # Wait for URL pattern
agent-browser wait --load networkidle  # Wait for network idle

Screenshots, PDF & Recording

agent-browser screenshot path.png      # Save screenshot
agent-browser screenshot --full        # Full page screenshot
agent-browser pdf output.pdf           # Save as PDF
agent-browser record start ./demo.webm # Start recording
agent-browser record stop              # Stop and save

Sessions (parallel browsers)

agent-browser --session s1 open "https://site1.com"
agent-browser --session s2 open "https://site2.com"
agent-browser session list
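
Named sessions could also be driven concurrently from a script. The sketch below assumes that separate `--session` names can safely be used at the same time (the heading suggests so, but this is not confirmed here); `session_argv` and `open_all` are hypothetical helper names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

def session_argv(session: str, *args: str) -> List[str]:
    """Build the argv for one agent-browser call scoped to a named session."""
    return ["agent-browser", "--session", session, *args]

def open_all(targets: Dict[str, str]) -> Dict[str, int]:
    """Open each URL in its own session in parallel; return exit codes keyed by session name."""
    def _open(item: Tuple[str, str]) -> Tuple[str, int]:
        name, url = item
        return name, subprocess.run(session_argv(name, "open", url)).returncode

    with ThreadPoolExecutor(max_workers=max(len(targets), 1)) as pool:
        return dict(pool.map(_open, targets.items()))

# Example: open_all({"s1": "https://site1.com", "s2": "https://site2.com"})
```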

State (persist auth/cookies)

agent-browser state save auth.json     # Save session (cookies, storage)
agent-browser state load auth.json     # Restore session

Cookies & Storage

agent-browser cookies                  # Get all cookies
agent-browser cookies set name value   # Set cookie
agent-browser cookies clear            # Clear cookies
agent-browser storage local            # Get all localStorage
agent-browser storage local set k v    # Set value

Tabs & Frames

agent-browser tab                      # List tabs
agent-browser tab new [url]            # New tab
agent-browser tab 2                    # Switch to tab
agent-browser frame "#iframe"          # Switch to iframe
agent-browser frame main               # Back to main frame

Browser Settings

agent-browser set viewport 1920 1080
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set media dark

JavaScript

agent-browser eval "document.title"    # Run JS in page context

browser-use — Autonomous Agent

For complex tasks where you want the agent to figure out the browsing steps:

browser-use-agent "Your task description here"

Custom Script (advanced)

# Run via: /opt/browser-use/bin/python3 script.py
import asyncio
import os

from browser_use import Agent, Browser
from langchain_anthropic import ChatAnthropic

async def run():
    browser = Browser()
    llm = ChatAnthropic(
        model='claude-sonnet-4-20250514',
        api_key=os.environ['ANTHROPIC_API_KEY'],  # raises KeyError if unset
    )
    agent = Agent(
        task="Compare pricing on 3 competitor sites",
        llm=llm,
        browser=browser,
    )
    try:
        # Cap the run so a stuck agent cannot loop indefinitely
        return await agent.run(max_steps=15)
    finally:
        # Close the browser even if the agent raises
        await browser.close()

result = asyncio.run(run())
print(result)

You can swap the LLM for any LangChain-compatible chat model (OpenAI, Anthropic, etc.).
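
One way to make the swap is to choose the model class from whichever key is exported. This is a sketch under assumptions: that `langchain-openai` is among the installed langchain-* packages, and that `gpt-4o` is an acceptable OpenAI model name. The helper `make_llm` is hypothetical and not part of the skill.

```python
import os
from typing import Mapping

def make_llm(env: Mapping[str, str] = os.environ):
    """Return a LangChain chat model for whichever provider key is set."""
    if env.get("ANTHROPIC_API_KEY"):
        # Imported lazily so only the chosen provider package is required
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(
            model="claude-sonnet-4-20250514",
            api_key=env["ANTHROPIC_API_KEY"],
        )
    if env.get("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        # Model name is an assumption; substitute the one you use
        return ChatOpenAI(model="gpt-4o", api_key=env["OPENAI_API_KEY"])
    raise RuntimeError("Set ANTHROPIC_API_KEY or OPENAI_API_KEY")
```

The returned object would then be passed as `llm=make_llm()` to `Agent` in place of the hard-coded `ChatAnthropic` instance.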

Standard Workflow

# 1. Open page
agent-browser open "https://example.com"

# 2. Snapshot to see what's on the page
agent-browser snapshot -i

# 3. Interact with elements using @refs from snapshot
agent-browser fill @e1 "search query"
agent-browser click @e2

# 4. Wait for new page to load
agent-browser wait --load networkidle

# 5. Re-snapshot (refs change after navigation!)
agent-browser snapshot -i

# 6. Extract what you need
agent-browser get text @e5

# 7. Close when done
agent-browser close
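
The first steps of this workflow could be wrapped in Python with `subprocess`. This is a minimal sketch, assuming agent-browser exits non-zero on failure (not verified here); `ab_argv`, `ab`, and `open_and_list` are hypothetical names. Later steps are left to the caller because the @refs have to be read from the snapshot output first.

```python
import subprocess
from typing import List

def ab_argv(*args: str) -> List[str]:
    """Build the argv for one agent-browser call."""
    return ["agent-browser", *args]

def ab(*args: str) -> str:
    """Run one command and return its stdout.

    Raises CalledProcessError on a non-zero exit, which is assumed to signal failure.
    """
    done = subprocess.run(ab_argv(*args), capture_output=True, text=True, check=True)
    return done.stdout.strip()

def open_and_list(url: str) -> str:
    """Open the page, wait for the network to go idle, and return the interactive snapshot."""
    ab("open", url)
    ab("wait", "--load", "networkidle")
    return ab("snapshot", "-i")

# Example:
#   print(open_and_list("https://example.com"))
#   ab("close")
```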

Important Rules

Always snapshot -i after navigation — refs change on every page load
Use fill not type for inputs — fill clears existing text first
Wait after clicks that trigger navigation — wait --load networkidle
Close the browser when done — agent-browser close
Google/Bing block headless browsers (CAPTCHA) — use DuckDuckGo or web_search instead
Save auth state for sites requiring login — state save/load
Use --json when you need machine-parseable output
Use sessions for parallel browsing — --session <name>

Troubleshooting

Element not found: Re-run snapshot -i to get current refs
Page not loaded: Add wait --load networkidle after navigation
CAPTCHA on search engines: Use DuckDuckGo or the web_search tool instead
Auth expired: Re-login and state save again
Display errors: The install script sets up Xvfb for headless rendering
