MagicBrowse

Browser automation fallback through the magicbrowse CLI with goal-driven act as the default primitive and observe/primitives only for recovery on real web pages.

Install

openclaw skills install magicbrowse

Use magicbrowse to reach a target page (search, navigation, traversal through non-sensitive screens) when your own browser tooling cannot do it reliably. The planner runs two LLM loops per task and is slower than direct browser control, so prefer your own tools when they suffice. At any login, identity, checkout, donation, subscription, payment, or human-verification page, stop and surface to the user. Do not invent or type credentials, identity data, payment data, or any value you do not legitimately have.

Fallback Ladder

Try in order. Do not start at layer 4 just because primitives exist.

  1. Your own browser tooling (Computer Use, native browser tools).
  2. magicbrowse act "<goal>" — DOM-only navigator.
  3. magicbrowse act "<goal>" --use-vision — same goal, navigator with screenshots. Use only when the user is comfortable sending screenshots/page context for this workflow. Vision is a retry mode for the same task; keep the granule.
  4. magicbrowse observe + primitives: click <target-id>, type <target-id> <text>, fill <target-id> <value>, select <target-id> <option-text>, press <keys>. Use only when vision-mode act cannot make progress, or when single-element precision is required. press is global, so click first if focus matters.
  5. Surface failure to the user.
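
The ladder above can be read as a small state machine. The following is a minimal Python sketch of that reading, not part of the magicbrowse CLI. The layer labels and the next_layer helper are hypothetical names. Only the ordering and the consent condition on vision mode come from the text. Skipping straight to primitives when vision is not approved is an assumption.

```python
from typing import Optional

# Hypothetical labels mirroring the five rungs of the ladder.
LADDER = [
    "own_tooling",         # 1. Computer Use, native browser tools
    "act_dom",             # 2. magicbrowse act "<goal>"
    "act_vision",          # 3. magicbrowse act "<goal>" --use-vision
    "observe_primitives",  # 4. magicbrowse observe + primitives
    "surface_failure",     # 5. tell the user
]


def next_layer(current: Optional[str], vision_approved: bool) -> str:
    """Return the rung to try after `current` failed (None means start)."""
    if current is None:
        return LADDER[0]
    index = LADDER.index(current)
    if index + 1 >= len(LADDER):
        return LADDER[-1]  # already at the bottom: keep surfacing failure
    candidate = LADDER[index + 1]
    # Vision mode can send screenshots, so it needs the user's consent.
    # Assumption: without consent we skip it rather than stop.
    if candidate == "act_vision" and not vision_approved:
        candidate = LADDER[index + 2]
    return candidate
```

The helper never jumps ahead on its own: each call moves down by one rung, which keeps an orchestrator from starting at layer 4.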

Preferred Pattern

For public navigation tasks, give act the semantic goal and a checkable terminal condition:

magicbrowse act "navigate to the public page that lists supported regions and stop when the region list is visible"

Avoid manually replaying snapshot ids before act has failed:

magicbrowse observe
magicbrowse click 13
magicbrowse observe
magicbrowse click 23

Setup Check

  1. Run magicbrowse doctor first on a fresh install. It verifies the gateway config and reachability.
  2. If it fails because the API key is missing, install magicpay if needed, then run magicpay init <apiKey> (sign up at https://agents.mercuryo.io/signup), or set MAGICPAY_API_KEY in the environment. Persisted config lives at ~/.magicpay/config.json; MagicBrowse reads that shared config. Do not run magicbrowse init as part of the agent setup path.
  3. Only proceed to launch and act once doctor passes.
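
Before running doctor, an orchestrator may want to know whether an API key appears to be configured at all. The sketch below is a hypothetical pre-check in Python, not a replacement for magicbrowse doctor. It only looks at the two places named above (the MAGICPAY_API_KEY variable and ~/.magicpay/config.json). The api_key_source name is illustrative, and the source does not say which location wins when both are set, so the order checked here is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional


def api_key_source(env: Mapping[str, str], home: Path) -> Optional[str]:
    """Return where a MagicPay API key appears to be configured, if anywhere.

    This does not validate the key or the gateway; `magicbrowse doctor`
    remains the real check.
    """
    if env.get("MAGICPAY_API_KEY"):
        return "environment"
    # Persisted config shared by MagicPay and MagicBrowse.
    if (home / ".magicpay" / "config.json").is_file():
        return "config"
    return None
```

A typical call would be api_key_source(os.environ, Path.home()) after importing os. A None result suggests running magicpay init or setting the variable before doctor.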

Hard Rules

Consequential actions require approval. magicbrowse may navigate, inspect, draft, and prepare. It must stop and ask before submitting a form, posting or sending content, accepting terms, changing account data or settings, booking, buying, ordering, deleting or modifying remote data, or otherwise committing an irreversible or account-affecting action. After approval, re-run observe and execute only the approved final action.

Protected data — never invent. Do not use act, type, fill, or select for any of the following on any page:

  • login or signup credentials (email, username, password, OTP),
  • identity-document fields (passport, ID, KYC address, DOB tied to identity),
  • payment-card or banking fields (PAN, CVV, expiry, IBAN, account),
  • any value sourced from a vault or secret store, or any value you do not legitimately have.

Reach the page, stop before entering protected values, and switch to MagicPay for the sensitive step. Do not guess, placeholder, or fabricate protected values. Be honest about what you cannot do.

Use act before snapshot primitives. Do not start MagicBrowse work with observe plus click/type/select/press/fill before attempting act on the same goal. Why: the navigator keeps the goal, current page context, and completion check in one planner loop instead of spreading them across fragile snapshot ids. Use primitives only after DOM-only and vision-mode act cannot make progress, or when the recovery step is deliberately single-element.

Target-ids are snapshot-scoped. Valid only for the observe snapshot that produced them. Re-run observe after any click, type, navigation, popup, or lazy-load before the next primitive — reusing an old id silently addresses a different element.

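Right (re-observe before each primitive):

observe
click 12
observe
type 7 "hello"

Wrong (the second primitive reuses ids from a stale snapshot):

observe
click 12
type 7 "hello"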
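
One way to enforce the snapshot-scoping rule in an orchestrator is to treat every observe as issuing a single-use permit. The following Python sketch assumes a hypothetical SnapshotGuard wrapper. It is not a magicbrowse feature, and it deliberately takes the strictest reading: any primitive invalidates the snapshot.

```python
class SnapshotGuard:
    """Tracks whether target-ids from the last observe are still usable."""

    def __init__(self) -> None:
        self._fresh = False

    def observed(self) -> None:
        """Call right after `magicbrowse observe` returns."""
        self._fresh = True

    def use_target(self, target_id: int) -> int:
        """Return target_id if the snapshot is fresh, else refuse."""
        if not self._fresh:
            raise RuntimeError(
                "stale snapshot: run observe before using a target-id"
            )
        # Any primitive may change the page, so the snapshot is spent.
        self._fresh = False
        return target_id
```

With this guard, the wrong sequence (observe, click 12, type 7) fails loudly at the second primitive instead of silently addressing a different element.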

One workflow per default home. The current-session pointer at $MAGICBROWSE_HOME/current-session.json (default ~/.magicbrowse/) and MagicPay workflow state under ~/.magicpay/ are singletons. Concurrent workflows on the same homes overwrite each other. For parallel use, set a distinct MAGICBROWSE_HOME and run MagicPay under a separate HOME or isolated runtime environment, or do not run the tasks in parallel.
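
For parallel use, each workflow needs its own pair of homes. The sketch below builds such an environment in Python. The isolated_env helper and the directory layout are hypothetical. Only the MAGICBROWSE_HOME variable and the idea of a separate HOME for MagicPay come from the text.

```python
import os
import tempfile
from typing import Dict, Mapping


def isolated_env(base: Mapping[str, str], label: str) -> Dict[str, str]:
    """Return a copy of `base` with per-workflow homes under a temp dir."""
    root = tempfile.mkdtemp(prefix=f"magic-{label}-")
    env = dict(base)
    # Separate current-session pointer for this workflow.
    env["MAGICBROWSE_HOME"] = os.path.join(root, "magicbrowse")
    # Separate HOME so MagicPay workflow state under ~/.magicpay/ is not shared.
    # Assumption: with a fresh HOME the persisted config is not visible, so
    # the API key likely has to come from MAGICPAY_API_KEY in `base`.
    env["HOME"] = os.path.join(root, "home")
    os.makedirs(env["MAGICBROWSE_HOME"])
    os.makedirs(env["HOME"])
    return env
```

Pass the returned mapping as the env argument of whatever launches the CLI for that workflow, for example subprocess.run(..., env=env).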

Fresh browser by default. Prefer an owned, fresh browser session. Use attach, --profile, or --user-data-dir only when the user explicitly approves that browser/session for the current task. Keep CDP endpoints private. Close the session before unrelated work.

Page context can leave the browser. LLM-backed act sends page state to the gateway; --use-vision can include screenshots. Avoid private pages unless the user approves that workflow, and stop at login, identity, checkout, donation, subscription, or payment pages.

Primary Workflow

Contract: launch [url] → act … act → close. Sequential act calls in one session preserve page state and planner memory.

  1. magicbrowse launch <url> — start a headless owned Chrome session pre-placed at the entry URL. Keep browser launches headless unless the user explicitly asks for a visible browser or you are doing live debugging. To attach to an existing CDP browser instead, first get explicit user approval for that endpoint/session: magicbrowse attach <cdp-url-or-ws-endpoint> (positional, not a --cdp-url flag).
  2. magicbrowse act "<goal>" — natural-language browser step. Prompt is positional. act does not take --url; you cannot reset the page from inside act. To re-anchor, close and launch again.
  3. Repeat act for the next strategic granule.
  4. magicbrowse close — release the session when the overall MagicBrowse-owned browser task is done. If the workflow hands off to MagicPay on a sensitive page, keep the browser open until MagicPay finishes its workflow. After MagicPay completes, close only a MagicBrowse-owned disposable browser that the user is not taking over; do not close an external/user-owned attach without explicit approval.

magicbrowse run exists in the CLI for one-shot developer use. It is not part of this skill contract — its bundled close destroys continuity. Do not use it in an orchestrated workflow.
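
The launch, act, close contract above can be sketched as a short orchestration loop. This Python example is illustrative only: launch, act, and close are hypothetical callables that an orchestrator would implement around the corresponding magicbrowse commands, and act is assumed to return the status string. How that status is read from the CLI output is not specified here, so it is left to the callable.

```python
from typing import Callable, List


def run_workflow(
    url: str,
    goals: List[str],
    launch: Callable[[str], None],  # wraps: magicbrowse launch <url>
    act: Callable[[str], str],      # wraps: magicbrowse act "<goal>"
    close: Callable[[], None],      # wraps: magicbrowse close
) -> str:
    """Run goals in one session and return the last act status."""
    launch(url)
    status = "completed"
    for goal in goals:
        status = act(goal)
        if status != "completed":
            # Stop at the first non-completed granule. The session stays
            # open so a handoff, approval, or retry can use the same page.
            return status
    # Every granule completed, so the owned session can be released.
    close()
    return status
```

The session is closed only when every granule completes. On any other status the caller decides whether to hand off, ask for approval, retry with vision, or close.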

Goal Granularity

  1. Granule = atomic strategic segment. End each act where the orchestrator needs the next strategic decision. Tactics (which form field first) live inside act; strategy (this partner is wrong, try another) lives between act calls.
  2. Target horizon: 15-30 navigator steps per act; smaller is safer. maxSteps: 100 is a safety ceiling. The planner self-validates terminal status, so longer tasks have more room for false-positive completion. Prefer smaller granules when the success criterion cannot be checked externally.
  3. Auth walls and CAPTCHA are hard boundaries, not obstacles. A task that reaches auth, CAPTCHA, or human verification ends with status: needs_handoff, not failed. Plan tasks to end at such a wall, not through it. magicbrowse does not solve CAPTCHA and does not enter credentials. For a confirmed real CAPTCHA on the current approved browser session, use magicpay solve-captcha [--timeout <s>]; after a successful solve, run magicbrowse mark-captcha-resolved before the next act. Never retry the same act against the same wall. If the page asks for something you cannot legitimately provide, be honest about it.
  4. Rely on session memory; do not re-narrate. Sequential act calls in one session preserve page state and planner memory. Do not write "as we already found, continue with…" into goals — if you feel the need to, the granularity is wrong.
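
The CAPTCHA sequence in item 3 (solve, then mark resolved, then the next act) can be sketched as follows. This is a hedged Python illustration: recover_from_captcha and the run callable are hypothetical, and treating a non-zero exit code as a failed solve is an assumption that this document does not confirm. The two commands and the optional --timeout flag are taken from the text.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical runner: executes a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def recover_from_captcha(run: Runner, timeout_s: Optional[int] = None) -> bool:
    """Solve a confirmed CAPTCHA, then tell MagicBrowse it is resolved."""
    solve: List[str] = ["magicpay", "solve-captcha"]
    if timeout_s is not None:
        solve += ["--timeout", str(timeout_s)]
    # Assumption: a non-zero exit code means the solve did not succeed.
    if run(solve) != 0:
        return False
    # Only after a successful solve, before the next act.
    return run(["magicbrowse", "mark-captcha-resolved"]) == 0
```

A False result means the wall is still in place, so the next step is to surface it to the user rather than retry the same act.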

Goal Formulation

  1. No element indexes or selectors in goal text. Indexes renumber on every DOM scan. Describe elements semantically.
    • Avoid: act "click target 14"
    • Prefer: act "click the 'Continue' button under the price summary"
  2. Describe the expected terminal state where it adds a checkable criterion.
    • Avoid: act "get to checkout"
    • Prefer: act "navigate to a checkout page that shows passenger fields and total fare"
  3. Pass the starting URL to launch, not as a separate step. To switch sites mid-workflow, either close and re-launch, or describe the navigation inside the goal text.
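
Some of these formulation rules can be checked mechanically before a goal is sent to act. The following Python sketch is a hypothetical lint helper, not part of magicbrowse. Its patterns cover only the index styles mentioned in this document ("[14]", "target 7") and one re-narration phrase, so a goal that passes may still be poorly formed.

```python
import re
from typing import List

# Index styles named in this document: "[14]" and "target 7".
_INDEX_PATTERNS = [
    re.compile(r"\[\d+\]"),
    re.compile(r"\btarget\s+\d+\b", re.IGNORECASE),
]
# One re-narration phrase from the granularity rules; illustrative only.
_NARRATION = re.compile(r"\bas we already\b", re.IGNORECASE)


def lint_goal(goal: str) -> List[str]:
    """Return a list of formulation problems found in the goal text."""
    problems: List[str] = []
    if any(pattern.search(goal) for pattern in _INDEX_PATTERNS):
        problems.append("element index in goal text")
    if _NARRATION.search(goal):
        problems.append("re-narrates earlier results")
    return problems
```

An empty list means no known problem was detected. A non-empty list is a prompt to rewrite the goal semantically before calling act.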

Common Mistakes

  • Element indexes ([14], target 7) in goal text.
  • magicbrowse run for orchestrated multi-step workflows.
  • type / fill / select / act on protected fields. Stop at the form boundary and surface to the user instead of inventing or placeholdering protected data.
  • Letting act submit, post, book, buy, save, delete, or otherwise commit an account-affecting action without explicit approval.
  • Trying to solve CAPTCHA through magicbrowse. On a confirmed real CAPTCHA, use magicpay solve-captcha [--timeout <s>], then magicbrowse mark-captcha-resolved before the next MagicBrowse step.
  • Attaching to a logged-in browser or named profile without explicit approval for the current task.
  • Closing a browser that was handed to MagicPay or the user before the overall task is actually done.
  • Re-narrating prior act results into the next goal — sequential act calls keep state.
  • Skipping the act-first path and starting at layer 4 (observe + primitives).
  • Reusing a target-id from before a click, navigation, or popup.

Status and Errors

act returns status: completed | blocked | needs_handoff | needs_approval | failed | max_steps | cancelled. Branch on status; do not parse finalMessage to detect missing input, protected-data handoff, or approval stops. finalMessage is the explanation to show the user or pass upstream. Exit code 0 includes blocked, needs_handoff, and needs_approval; it does not mean success. See references/statuses.md.
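
Branching on status can be as simple as a lookup table. In the Python sketch below, the seven status values are the ones listed above. The reaction names on the right are hypothetical orchestrator actions chosen for illustration, and references/statuses.md remains the authority on what each status means.

```python
from typing import Dict

# Status values come from this document; reactions are assumptions.
NEXT_STEP: Dict[str, str] = {
    "completed": "continue",          # move on to the next granule
    "blocked": "surface_to_user",     # something is missing; ask
    "needs_handoff": "handoff",       # auth, CAPTCHA, protected data
    "needs_approval": "ask_user",     # consequential action pending
    "failed": "escalate_ladder",      # try the next fallback layer
    "max_steps": "split_goal",        # granule was probably too large
    "cancelled": "stop",
}


def decide(status: str) -> str:
    """Map an act status to an orchestrator reaction."""
    try:
        return NEXT_STEP[status]
    except KeyError:
        raise ValueError(f"unknown act status: {status!r}") from None


def succeeded(status: str) -> bool:
    """Only `completed` counts as success; exit code 0 alone does not."""
    return status == "completed"
```

Unknown statuses raise instead of falling through, so a new status value is noticed rather than silently treated as success.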

References