Install
openclaw skills install owl-browserDrive Owl Browser as an agent. Read pages as compact, handle-addressable OwlMark and click/type by handle, not by screenshot or pixel coordinates.
openclaw skills install owl-browserOwl Browser is an AI-native browser. Instead of screenshots or raw HTML, it renders each page as OwlMark: a compact, handle-addressable text view of what is actually on screen. You observe the page, then act on handles. This is far cheaper than a screenshot and removes pixel-coordinate guessing.
Every call is POST $OWL_API_ENDPOINT/execute/<tool> with a JSON body and
Authorization: Bearer $OWL_API_TOKEN. A reusable helper:
owl() { curl -s -X POST "$OWL_API_ENDPOINT/execute/$1" \
-H "Authorization: Bearer $OWL_API_TOKEN" \
-H "Content-Type: application/json" -d "$2"; }
create_context(render_mode=agent) -> navigate(url) -> observe
-> click/type(handle) -> observe -> ... -> close_context
observe after navigating and after every action. It is the only way you see the page.observe prints (e.g. l5, b12, x27). No CSS selectors, no pixel coordinates.observe blocks until the page is ready. Do not add a separate wait step.Creates a session. Returns result.context_id (use it in every later call). Do NOT pass context_id in.
owl browser_create_context '{"render_mode":"agent"}'
# use "both" if you will also screenshot; "pixel" is the legacy human render
owl browser_navigate '{"context_id":"ctx_...","url":"https://example.com"}'
Does NOT return the page. Call observe next.
Returns render (OwlMark text), handles (actionable elements), metadata, token_estimate.
owl browser_observe '{"context_id":"ctx_..."}'
owl browser_observe '{"context_id":"ctx_...","detail":"outline"}' # headings-only map of a long page
Params: detail = min | normal (default) | full | outline; region = main/nav/header/footer or a handle; max_tokens = soft budget.
A handle in the render looks like: - link "Pricing" [#l5] or textbox "Email" [#x27 val=""]. Pass the token (l5, x27).
Pass the handle token as selector (or handle). The response includes an effect:
navigated, dom-changed, or no-effect. Trust it, then re-observe.
owl browser_click '{"context_id":"ctx_...","selector":"l5"}'
owl browser_type '{"context_id":"ctx_...","selector":"x27","text":"you@example.com"}'
Also: browser_clear_input '{"context_id":"...","selector":"x27"}' before re-typing,
and browser_press_key '{"context_id":"...","key":"Enter"}' to submit.
owl browser_screenshot '{"context_id":"ctx_..."}'
owl browser_close_context '{"context_id":"ctx_..."}'
browser_expand '{"context_id":"...","handle":"R1"}' re-serializes one collapsed region/template at higher detail.browser_read_node '{"context_id":"...","handle":"M1"}' returns the full text of a single node (e.g. an article body).Check metadata.status on every observe:
ready — act on it.pending — the page has not rendered its content yet (a lazy client-rendered shell). The envelope has reason and retry_after_ms; re-observe after that delay. Do NOT treat a pending render as an empty page.incomplete — chrome rendered but main content did not; re-observe once, then use vision.metadata.dropped_surfaces tells you what text could not capture:
canvas / webgl / image:N — a visual surface. Use render_mode:"both" + browser_screenshot + your own vision.sparse_main / shell_unhydrated / main_content_unrendered — content is late or withheld; re-observe, then vision if still empty.first_tree_timeout — slow or bot-blocked; read it with a screenshot.Other cases:
effect is navigated, or a browser_navigate), re-observe to get fresh handles. Acting on a stale handle returns STALE_HANDLE; when you see that, re-observe.href="#section" link returns effect: "scrolled" and moves the viewport. Expected, not a failure.browser_navigate directly to the destination URL instead of clicking.render_mode:"both"); in-page plugin controls may not be actionable handles.observe printed.effect of a click/type before assuming it worked.detail:"outline" on long reference pages, then expand/read_node the part you need.context_id to create_context; it is returned to you.CTX=$(owl browser_create_context '{"render_mode":"agent"}' | jq -r .result.context_id)
owl browser_navigate "$(printf '{"context_id":"%s","url":"https://duckduckgo.com"}' "$CTX")"
owl browser_observe "{\"context_id\":\"$CTX\"}" # find the search box, e.g. x4
owl browser_type "{\"context_id\":\"$CTX\",\"selector\":\"x4\",\"text\":\"owl browser olib ai\"}"
owl browser_press_key "{\"context_id\":\"$CTX\",\"key\":\"Enter\"}"
owl browser_observe "{\"context_id\":\"$CTX\"}" # results appear, pick a link, e.g. l31
owl browser_click "{\"context_id\":\"$CTX\",\"selector\":\"l31\"}"
owl browser_observe "{\"context_id\":\"$CTX\"}" # read the opened page
owl browser_close_context "{\"context_id\":\"$CTX\"}"
render_mode=agent; the toolset is profile-scoped via OWL_MCP_PROFILE (agent, automation, webdev, full).GET $OWL_API_ENDPOINT/agent-skills.md.