Surfagent Browser

v1.0.0

Control a real Chrome browser from your AI agent — navigate, click, type, fill forms, extract content, manage tabs, and automate workflows via SurfAgent's RE...

by @agentossoftware

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for agentossoftware/surfagent-browser.


Install the skill "Surfagent Browser" (agentossoftware/surfagent-browser) from ClawHub.
Skill page: https://clawhub.ai/agentossoftware/surfagent-browser
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: node
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install surfagent-browser

ClawHub CLI

npx clawhub@latest install surfagent-browser

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (medium confidence)

ℹ

Purpose & Capability

The skill's stated purpose (control a real Chrome via SurfAgent REST API on localhost:7201) matches the API calls documented in SKILL.md. However the registry metadata lists node as a required binary even though the runtime examples use curl/HTTP; it's unclear why node is required. This mismatch is unexplained and unnecessary for an instruction-only HTTP client.

Instruction Scope

The instructions direct the agent to call a localhost REST API that controls a real Chrome with persistent cookies and sessions. The API includes endpoints for evaluating arbitrary JavaScript (/browser/evaluate), extracting page state and content, filling forms (including credentials), and solving CAPTCHAs. Those actions legitimately belong to a browser-control skill, but they also provide broad access to sensitive user data and the ability to act as the user. SKILL.md does not specify where or how the Bearer token is managed, nor does it constrain use of evaluate() which can be used to read any page data and exfiltrate it to the agent.

✓

Install Mechanism

Instruction-only skill with no install spec and no code files — the lowest-risk install mechanism. Nothing is written to disk by the skill itself.

Credentials

The skill declares no required environment variables, yet it relies on a Bearer auth token for localhost:7201 calls — SKILL.md does not declare how the agent obtains that token or whether it should be provided via env. The declared requirement of the node binary is disproportionate given the HTTP-based instructions. The combination of missing token guidance and an unexplained binary requirement is inconsistent.

ℹ

Persistence & Privilege

The skill is not set to always:true and uses normal autonomous invocation settings. That is appropriate. However the capability it triggers—control of the user's real Chrome with persistent cookies/sessions—gives the skill a high effective privilege (ability to access logged-in accounts and perform actions). This is a design-level risk (powerful capability) rather than a metadata misconfiguration.

What to consider before installing

This skill lets an agent control a real Chrome browser on your machine, access logged-in sessions, run arbitrary page JS, fill forms (including credentials), and attempt CAPTCHA solves. Before installing:

1. Verify the SurfAgent daemon's provenance (download source, checksums, signing) and run it only if you trust it.
2. Find out how the Bearer token is issued and stored. Do not provide agent or global secrets unless you intend to grant full browser control.
3. Prefer running SurfAgent with a dedicated, isolated Chrome profile (no saved passwords or personal sessions) or in a VM/container to limit exposure.
4. Ask the author why 'node' is required in metadata; it's not justified by the HTTP-only instructions.
5. Treat /browser/evaluate and content-extraction endpoints as high-risk: they can read and exfiltrate sensitive data, so only enable this skill for agents you trust and consider network/firewall rules that prevent unexpected outbound exfiltration.

If you cannot confirm the daemon's trustworthiness or the token-management details, do not enable this skill.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🌐 Clawdis

Bins: node

latestvk973c88y6ym3say2njgfjgm779845ft0

80 downloads

0 stars

1 version

Updated 3w ago

v1.0.0

MIT-0

SurfAgent Browser Control — Agent Skill

Give your AI agent a real Chrome browser. Navigate, click, type, extract, and automate — all through a local REST API.

What This Is

SurfAgent runs a real Chrome browser on your desktop that your AI agent controls via a REST API (port 7201). Not headless. Not spoofed. A genuine Chrome with persistent cookies, real sessions, and a real fingerprint that passes bot detection.

Key difference from headless browsers: SurfAgent's Chrome passes hCaptcha, Cloudflare, Discord registration, and other bot detection that headless browsers fail. Your agent browses like a human.

Architecture

SurfAgent Daemon (port 7201)
  └── REST API → Chrome DevTools Protocol → Real Chrome (port 9222)

All requests go to http://localhost:7201 and carry a Bearer auth token, except /health, /status, and /readiness (see Auth).

Quick Start

Check if SurfAgent is running

curl -s http://localhost:7201/health | jq .

Open a page

curl -s -X POST http://localhost:7201/browser/open \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKEN' \
  -d '{"url": "https://github.com"}'

Get page state

curl -s -X POST http://localhost:7201/browser/state \
  -H 'Authorization: Bearer YOUR_TOKEN' | jq .
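The same calls can be made from a script. The following is a minimal Python sketch using only the standard library; the helper name `build_request` is hypothetical, and `YOUR_TOKEN` is a placeholder as in the curl examples. It only builds the request, so it runs without a daemon; the commented lines show how it could be sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7201"  # daemon address from the Architecture section


def build_request(path: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request to the daemon."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_request("/browser/open", "YOUR_TOKEN", {"url": "https://github.com"})
print(req.get_method(), req.full_url)

# To send it (requires a running SurfAgent daemon and a real token):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```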

Core API Reference

Navigation

`POST /browser/open` — Open URL in new tab

{ "url": "https://example.com" }

Returns: { ok, tabId, title, url }

`POST /browser/navigate` — Navigate current tab

{ "url": "https://example.com", "tabId": "optional" }

Returns: { ok, tabId, title, url }

`POST /browser/back` / `POST /browser/forward` — History navigation

{ "tabId": "optional" }

`POST /browser/reload` — Reload page

{ "tabId": "optional", "ignoreCache": false }

Interaction

`POST /browser/click` — Click an element

{
  "selector": "#submit-btn",
  "tabId": "optional",
  "button": "left",
  "clickCount": 1
}

Also supports clicking by coordinates:

{ "x": 500, "y": 300 }
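Since a click can target either a selector or a coordinate pair, a client may want to validate the choice before sending. The sketch below is one way to do that in Python; `click_payload` is a hypothetical helper, and rejecting a request that carries both forms is a client-side assumption, not documented daemon behavior.

```python
from typing import Optional


def click_payload(selector: Optional[str] = None,
                  x: Optional[int] = None,
                  y: Optional[int] = None,
                  tab_id: Optional[str] = None) -> dict:
    """Return a JSON-ready body for POST /browser/click."""
    has_selector = selector is not None
    has_coords = x is not None and y is not None
    # Exactly one targeting mode must be chosen (assumed client-side rule).
    if has_selector == has_coords:
        raise ValueError("pass either a selector or both x and y")
    body = {"selector": selector} if has_selector else {"x": x, "y": y}
    if tab_id is not None:
        body["tabId"] = tab_id
    return body


print(click_payload(selector="#submit-btn"))
print(click_payload(x=500, y=300))
```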

`POST /browser/type` — Type text (key by key)

{
  "selector": "#search-input",
  "text": "hello world",
  "tabId": "optional",
  "delay": 50
}

`POST /browser/fill` — Set input value directly

{
  "selector": "#email",
  "value": "user@example.com",
  "tabId": "optional"
}

React/Vue form fields: Use /browser/fill with dispatchEvents: true. Direct .value = assignment won't trigger React's state updates.

`POST /browser/select` — Select dropdown option

{
  "selector": "#country",
  "value": "US",
  "tabId": "optional"
}

`POST /browser/hover` — Hover over element

{ "selector": ".menu-trigger", "tabId": "optional" }

`POST /browser/scroll` — Scroll the page

{
  "tabId": "optional",
  "direction": "down",
  "amount": 500,
  "selector": "optional-scroll-container"
}

Direction: up, down, left, right

`POST /browser/press` — Press a keyboard key

{
  "key": "Enter",
  "tabId": "optional",
  "modifiers": ["Shift"]
}

Common keys: Enter, Tab, Escape, ArrowDown, Backspace, Delete

Page State & Content

`POST /browser/state` — Full structured page state

{
  "tabId": "optional",
  "includeElements": true,
  "maxElements": 100
}

Returns:

Page type (login, dashboard, feed, checkout, etc.)
Auth state (logged_in, logged_out, session_expired)
Interactive elements with selectors, roles, visibility, text
Form state (fields, filled count, submit button)
Blockers (cookie banners, captchas, auth walls)
Modals with actions
Content regions (nav, main, sidebar, footer)

`POST /browser/extract` — Extract text content

{
  "tabId": "optional",
  "selector": "optional — defaults to body",
  "format": "text"
}

Format: text, html, markdown

`POST /browser/screenshot` — Capture screenshot

{
  "tabId": "optional",
  "format": "png",
  "fullPage": false,
  "quality": 80,
  "clip": { "x": 0, "y": 0, "width": 800, "height": 600 }
}

Returns: { ok, data } (base64-encoded image)
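Because the image arrives base64-encoded, the caller has to decode it before saving or inspecting it. A minimal Python sketch, assuming `data` is a plain base64 string (not a `data:` URL) and that `ok` is a boolean; `decode_screenshot` is a hypothetical helper name.

```python
import base64


def decode_screenshot(response: dict) -> bytes:
    """Return the raw image bytes from a /browser/screenshot response."""
    if not response.get("ok"):
        raise RuntimeError("screenshot request did not succeed")
    return base64.b64decode(response["data"])


# Example with a stand-in response; a real one would carry PNG or JPEG bytes.
fake_response = {"ok": True, "data": base64.b64encode(b"not-a-real-image").decode("ascii")}
image_bytes = decode_screenshot(fake_response)
print(len(image_bytes))

# To save a real screenshot:
# with open("page.png", "wb") as f:
#     f.write(image_bytes)
```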

`POST /browser/evaluate` — Run JavaScript

{
  "tabId": "optional",
  "expression": "document.title"
}

Returns: { ok, result }

⚠️ Use evaluate as a last resort. Prefer structured API calls over raw JS.

Tab Management

`GET /browser/tabs` — List all open tabs

Returns: [{ id, url, title, active }]

`POST /browser/tab/activate` — Switch to a tab

{ "tabId": "TARGET_TAB_ID" }

`POST /browser/tab/close` — Close a tab

{ "tabId": "TARGET_TAB_ID" }

`POST /browser/tab/name` — Bookmark a tab by name

{ "tabId": "TARGET_TAB_ID", "name": "twitter" }

Then use "tabId": "name:twitter" in any request to target it.
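As an illustration of the two-step naming flow, the Python sketch below builds the body that names a tab and then a later request body that targets it by name. Both helper names are hypothetical; the `name:` prefix and field names come from the text above.

```python
def name_tab_body(tab_id: str, name: str) -> dict:
    """Body for POST /browser/tab/name."""
    return {"tabId": tab_id, "name": name}


def named_tab_ref(name: str) -> str:
    """Value to pass as "tabId" in later requests to target the named tab."""
    return f"name:{name}"


bookmark = name_tab_body("TARGET_TAB_ID", "twitter")
# Body for a later POST /browser/extract against the named tab.
extract = {"tabId": named_tab_ref("twitter"), "format": "text"}
print(bookmark)
print(extract)
```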

Important: Close tabs when done, especially social media tabs (X/Twitter). Open tabs flood CDP with events and degrade performance.

Blocker Resolution

`POST /browser/resolve-blocker` — Auto-dismiss blockers

{ "tabId": "optional" }

Handles: cookie consent banners, newsletter popups, notification permission dialogs.

Returns: { ok, resolved, blockerType }

`POST /browser/captcha/detect` — Check for CAPTCHAs

{ "tabId": "optional" }

`POST /browser/captcha/solve` — Attempt CAPTCHA solve

{ "tabId": "optional" }

Forms

`POST /browser/fill-form` — Fill multiple form fields at once

{
  "tabId": "optional",
  "fields": [
    { "selector": "#email", "value": "user@example.com" },
    { "selector": "#password", "value": "secret123" },
    { "selector": "#remember", "checked": true }
  ],
  "submit": false
}
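When field values already live in a mapping, the `fields` array can be generated rather than written by hand. The Python sketch below is one possible approach; `fill_form_payload` is a hypothetical helper, and treating boolean values as `checked` entries is an assumption based on the example body above.

```python
from typing import Optional


def fill_form_payload(values: dict, submit: bool = False,
                      tab_id: Optional[str] = None) -> dict:
    """Build a body for POST /browser/fill-form from {selector: value}."""
    fields = []
    for selector, value in values.items():
        if isinstance(value, bool):
            # Booleans map to checkbox-style entries.
            fields.append({"selector": selector, "checked": value})
        else:
            fields.append({"selector": selector, "value": value})
    body = {"fields": fields, "submit": submit}
    if tab_id is not None:
        body["tabId"] = tab_id
    return body


payload = fill_form_payload({
    "#email": "user@example.com",
    "#password": "secret123",
    "#remember": True,
})
print(payload)
```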

File Operations

`POST /browser/upload` — Upload a file to a file input

{
  "selector": "input[type=file]",
  "filePath": "/path/to/document.pdf",
  "tabId": "optional"
}

`POST /browser/download` — Download current page or resource

{
  "url": "https://example.com/report.pdf",
  "savePath": "/path/to/save/"
}

Browser Lifecycle

`POST /browser/launch` — Start the managed browser

{ "headless": false }

`POST /browser/close` — Close the managed browser

Closes all tabs and shuts down Chrome.

`GET /health` — Daemon health check

Returns: { status, version, browser, uptime }

`GET /status` — Detailed status

Returns: { daemon, browser, gateway, system }

Auth

All requests (except /health, /status, /readiness) require:

Authorization: Bearer <token>

The daemon generates a token on first start and saves it to ~/.surfagent/daemon-token.txt.
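A client can read the token from that file instead of hard-coding it. The Python sketch below assumes the file holds only the token, possibly followed by a newline; the file format is not specified here, and `load_token` and `auth_header` are hypothetical helper names.

```python
from pathlib import Path

# Token location as stated above.
DEFAULT_TOKEN_PATH = Path.home() / ".surfagent" / "daemon-token.txt"


def load_token(path: Path = DEFAULT_TOKEN_PATH) -> str:
    """Read the daemon token, stripping surrounding whitespace (assumed format)."""
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"token file {path} is empty")
    return token


def auth_header(token: str) -> dict:
    """Header required by every endpoint that is not exempt from auth."""
    return {"Authorization": f"Bearer {token}"}


# Usage (requires the daemon to have been started at least once):
# headers = auth_header(load_token())
```

Keeping the token out of source files and shell history limits accidental exposure, which matters given how much access the token grants.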

Common Patterns

Log in to a Site

1. navigate to login page
2. state → confirm it's a login page
3. fill email + password fields
4. click submit
5. state → confirm auth_state changed to logged_in
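The five steps above can be sketched as a single Python function. This is an illustration under several assumptions: `call` is a caller-supplied function that POSTs a JSON body to a path and returns the parsed response, `page_type` is a guessed name for the page-type field in the state response, and the default selectors are placeholders that will differ per site.

```python
from typing import Callable

# (path, JSON body) -> parsed JSON response; supplied by the caller.
Call = Callable[[str, dict], dict]


def log_in(call: Call, login_url: str, email: str, password: str,
           email_sel: str = "#email", password_sel: str = "#password",
           submit_sel: str = "button[type=submit]") -> bool:
    """Run the five login steps; return True if the final state reports logged_in."""
    call("/browser/navigate", {"url": login_url})        # 1. navigate
    state = call("/browser/state", {})                   # 2. confirm login page
    if state.get("page_type") != "login":  # "page_type" is an assumed field name
        return False
    call("/browser/fill-form", {"fields": [              # 3. fill credentials
        {"selector": email_sel, "value": email},
        {"selector": password_sel, "value": password},
    ], "submit": False})
    call("/browser/click", {"selector": submit_sel})     # 4. submit
    state = call("/browser/state", {})                   # 5. confirm auth state
    return state.get("auth_state") == "logged_in"
```

Passing `call` in as a parameter keeps the flow testable without a running daemon and leaves token handling to the caller.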

Scrape Data from a Table

1. navigate to page
2. state → find table elements
3. evaluate → extract table rows as JSON
4. scroll down if needed
5. repeat extraction
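One way to implement steps 3 to 5 is to have the page serialize its table rows to a JSON string and merge each batch on the client. The Python sketch below assumes `/browser/evaluate` returns the expression's value in `result` as a string; `TABLE_ROWS_JS` and `merge_rows` are hypothetical names, and the `table tr` selector is a placeholder.

```python
import json

# JavaScript sent as the "expression" field of POST /browser/evaluate.
TABLE_ROWS_JS = (
    "JSON.stringify(Array.from(document.querySelectorAll('table tr'))"
    ".map(row => Array.from(row.cells).map(cell => cell.innerText.trim())))"
)

evaluate_body = {"expression": TABLE_ROWS_JS}


def merge_rows(seen: list, response: dict) -> list:
    """Append rows from an evaluate response that have not been seen yet."""
    for row in json.loads(response["result"]):
        if row not in seen:
            seen.append(row)
    return seen


# Two overlapping batches, as might be returned before and after scrolling.
batch1 = {"ok": True, "result": json.dumps([["a", "1"], ["b", "2"]])}
batch2 = {"ok": True, "result": json.dumps([["b", "2"], ["c", "3"]])}
rows = merge_rows(merge_rows([], batch1), batch2)
print(rows)
```

Deduplicating by row content also drops rows that are legitimately identical; if the table can contain such rows, a row index or key column would be a safer basis.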

Fill a Multi-Step Form

1. state → identify form fields + which step
2. fill-form with current step's fields
3. click "Next" / submit
4. state → confirm we're on next step
5. repeat until done

Monitor a Value

1. navigate to page
2. evaluate → extract the value
3. wait N seconds
4. evaluate again → compare
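The monitoring steps above amount to a polling loop. A minimal Python sketch, assuming a caller-supplied `call(path, body)` function that returns the parsed response and that the value appears in `result`; `wait_for_change` is a hypothetical name, and the `sleep` parameter exists only so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Optional


def wait_for_change(call: Callable[[str, dict], dict], expression: str,
                    interval_s: float = 5.0, max_checks: int = 10,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[object]:
    """Poll an expression; return the new value once it differs, else None."""
    body = {"expression": expression}
    baseline = call("/browser/evaluate", body)["result"]
    for _ in range(max_checks):
        sleep(interval_s)  # wait N seconds between checks
        current = call("/browser/evaluate", body)["result"]
        if current != baseline:
            return current
    return None
```

Bounding the loop with `max_checks` avoids polling forever when the value never changes.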

Tips & Gotchas

Close tabs when done. Open tabs (especially X/Twitter) generate constant CDP events that slow everything down. Close them after you're finished.

Use fill not type for form fields. type simulates keystrokes (slow, can trigger autocomplete). fill sets the value directly. Use type only when you need to trigger keystroke-based UI (search suggestions, autocomplete dropdowns).

React/Vue forms need event dispatch. If fill doesn't trigger the framework's state, add dispatchEvents: true. This dispatches input, change, and blur events that React/Vue listen for.
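The two tips above can be folded into one request builder. In this Python sketch, `input_request` is a hypothetical helper, and placing `dispatchEvents` as a top-level boolean in the `/browser/fill` body is an assumption, since no example body for it is shown.

```python
def input_request(selector: str, text: str, keystrokes: bool = False,
                  dispatch_events: bool = False) -> tuple:
    """Return (path, body) for entering text into a field."""
    if keystrokes:
        # Keystroke simulation: slower, but triggers suggestion/autocomplete UI.
        return "/browser/type", {"selector": selector, "text": text}
    body = {"selector": selector, "value": text}
    if dispatch_events:
        body["dispatchEvents"] = True  # assumed placement of the flag
    return "/browser/fill", body


print(input_request("#email", "user@example.com", dispatch_events=True))
print(input_request("#search-input", "hello world", keystrokes=True))
```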

Check for blockers first. Many sites show cookie banners or modals on first visit. Call state → check for blockers → resolve-blocker before trying to interact.
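The state, check, resolve sequence can be sketched as a small loop in Python. Assumptions: `call(path, body)` is supplied by the caller and returns the parsed response, and the state response lists blockers under a `blockers` key (a guessed field name); `resolved` is taken from the documented `/browser/resolve-blocker` return shape.

```python
from typing import Callable


def clear_blockers(call: Callable[[str, dict], dict], max_rounds: int = 3) -> bool:
    """Try to dismiss blockers; return True once the page reports none."""
    for _ in range(max_rounds):
        state = call("/browser/state", {})
        if not state.get("blockers"):  # "blockers" is an assumed field name
            return True
        result = call("/browser/resolve-blocker", {})
        if not result.get("resolved"):
            return False  # the daemon could not dismiss this blocker
    # Final check after the last resolve attempt.
    return not call("/browser/state", {}).get("blockers")
```

A round limit is used because some pages show several blockers in sequence, for example a cookie banner followed by a newsletter popup.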

State tokens from /browser/state expire. The daemon keeps a ring buffer of 5 state snapshots per tab. If you wait too long between state calls, old tokens may be evicted.
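To illustrate how a five-slot ring buffer evicts entries, the Python sketch below uses `collections.deque` with `maxlen=5`. It models the behavior described above and is not the daemon's actual implementation; the token names are made up.

```python
from collections import deque

# A ring buffer with room for 5 snapshots: appending a 6th evicts the oldest.
snapshots = deque(maxlen=5)

for token in ["s1", "s2", "s3", "s4", "s5", "s6"]:
    snapshots.append(token)

print(list(snapshots))    # ['s2', 's3', 's4', 's5', 's6']
print("s1" in snapshots)  # False: the oldest snapshot was evicted
```

In practice this means a state token should be used soon after it is obtained, before later state calls on the same tab push it out.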

Evaluate is powerful but fragile. Raw JS can break on page updates. Prefer state + click/fill for interactions, and reserve evaluate for data extraction.

MCP Tools (via surfagent-mcp)

If using through the MCP server rather than direct HTTP:

MCP Tool	HTTP Equivalent
`surf_navigate`	POST /browser/navigate
`surf_click`	POST /browser/click
`surf_type`	POST /browser/type
`surf_fill`	POST /browser/fill
`surf_page_state`	POST /browser/state
`surf_extract`	POST /browser/extract
`surf_screenshot`	POST /browser/screenshot
`surf_evaluate`	POST /browser/evaluate
`surf_tabs`	GET /browser/tabs
`surf_tab_open`	POST /browser/open
`surf_tab_close`	POST /browser/tab/close
`surf_scroll`	POST /browser/scroll
`surf_press`	POST /browser/press
`surf_resolve_blocker`	POST /browser/resolve-blocker
`surf_fill_form`	POST /browser/fill-form
`surf_perceive`	POST /browser/perceive
`surf_annotate`	POST /browser/annotate
`surf_scene_diff`	POST /browser/perceive (with since)
`surf_health`	GET /health

For perception tools (perceive, annotate, scene_diff), see the surfagent-perception skill.
