Clawd Cursor 0.6.0

Data & APIs

AI desktop agent — control any app on Windows/macOS from your OpenClaw agent. Send natural language tasks to the Clawd Cursor API and it handles everything: opening apps, clicking buttons, typing text, navigating browsers, filling forms. If you can click it, your agent can too.

Install

openclaw skills install clawd-cursor-0-6-0

Clawd Cursor

If a human can do it on a screen, you can too. No API? No integration? No problem. Every app already has a UI — Clawd Cursor gives you eyes and hands to use all of them.

No task is impossible. If it's on the screen, it's within reach.

IMPORTANT — READ THIS BEFORE ANYTHING ELSE

You have a desktop AI agent over GUI running on the user's machine. This means:

You CAN open apps, click buttons, type text, fill forms

You CAN see what's on screen (via Clawd Cursor's screenshot pipeline)

You CAN interact with applications — browser, desktop apps, settings

You MUST respect safety tiers — 🟢 Auto actions run freely, 🔴 Confirm actions require user approval

You MUST ask before accessing sensitive apps (email, banking, private messages, password managers)

You SHOULD prefer direct tools (file read, web_fetch, CLI) over desktop automation when available

Think of Clawd Cursor as your hands and eyes on the user's computer — used with their permission.

When to Use This (AGENTS: READ THIS)

You have a desktop UI agent available. Use it when the user asks you to interact with their desktop, or when no other tool can accomplish the task.

Use Clawd Cursor for:

User-requested desktop tasks — "open Gmail and send an email", "check my calendar"
Read a webpage — when web_fetch or browser tools aren't available
Interact with desktop apps — click buttons, fill forms, read results
Browser tasks — search, navigate, fill forms (when browser tool unavailable)
Visual verification — did the page load? what does the UI show?
Cross-app workflows — copy from one app, paste in another
Settings changes — when the user explicitly asks

⚠️ Sensitive App Policy

Always ask the user before accessing:

Email clients (Gmail, Outlook)
Banking or financial apps
Private messaging (WhatsApp, Signal, Telegram)
Password managers
Admin panels or cloud consoles

Don't use Clawd Cursor when:

You can do it with a direct API call or CLI command (faster)
The task is purely computational (math, text generation, code writing)
You can already read/write the file directly
The browser tool or web_fetch can handle it

Direct Browser Access (Fast Path)

For quick page reads without a full task, connect to Chrome via Playwright CDP:

const pw = require('playwright');
const browser = await pw.chromium.connectOverCDP('http://127.0.0.1:9222');
const pages = browser.contexts()[0].pages();
const text = await pages[0].innerText('body');

Use this when you just need page content — faster than sending a task.

Scenario	Use	Why
Read page content/text	CDP Direct	Instant, free
Fill a web form	REST API	Clawd handles multi-step planning
Check if a page loaded	CDP Direct	Just read the title/URL
Click through a complex UI flow	REST API	Clawd handles planning
Get a list of elements on page	CDP Direct	Fast DOM query
Interact with a desktop app	REST API	CDP is browser-only

REST API Reference

Base URL: http://127.0.0.1:3847

Note: On Windows PowerShell, use curl.exe (with .exe) or Invoke-RestMethod. Bare curl is aliased to Invoke-WebRequest which behaves differently.

Pre-flight Check

Before your first task, verify Clawd Cursor is running:

curl.exe -s http://127.0.0.1:3847/health

Expected: {"status":"ok","version":"0.6.0"}

If connection refused — start it yourself (don't ask the user):

# Find the skill directory and start the server
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<clawd-cursor-directory>" -WindowStyle Hidden
Start-Sleep 3
# Verify it's running
curl.exe -s http://127.0.0.1:3847/health

The skill directory is wherever SKILL.md lives (the parent of this file). Use that path as the working directory.

Sending a Task (Async — Returns Immediately)

POST /task accepts the task and returns immediately. The task runs in the background. You must poll /status to know when it's done.

curl.exe -s -X POST http://127.0.0.1:3847/task -H "Content-Type: application/json" -d "{\"task\": \"YOUR_TASK_HERE\"}"

PowerShell:

Invoke-RestMethod -Uri http://127.0.0.1:3847/task -Method POST -ContentType "application/json" -Body '{"task": "YOUR_TASK_HERE"}'

Polling Pattern (Follow This)

1. POST /task → get accepted
2. Wait 2 seconds
3. GET /status
4. If status is "idle" → done
5. If status is "waiting_confirm" → ASK THE USER, then POST /confirm based on their answer
6. If still running → wait 2 more seconds, go to step 3
7. If 60+ seconds → POST /abort and retry with clearer instructions

Checking Status

curl.exe -s http://127.0.0.1:3847/status

Confirming Safety-Gated Actions

Some actions (sending messages, deleting) require approval. 🔴 NEVER self-approve these. Always ask the user for confirmation before POST /confirm. These exist to protect the user — do not bypass them.

curl.exe -s -X POST http://127.0.0.1:3847/confirm -H "Content-Type: application/json" -d "{\"approved\": true}"

Aborting a Task

curl.exe -s -X POST http://127.0.0.1:3847/abort

Reading Logs (Debugging)

curl.exe -s http://127.0.0.1:3847/logs

Returns last 200 log entries. Check for error or warn entries when tasks fail.

Response States

State	Response	What to do
Accepted	`{"accepted": true, "task": "..."}`	Start polling
Running	`{"status": "acting", "currentTask": "...", "stepsCompleted": 2}`	Keep polling
Waiting confirm	`{"status": "waiting_confirm", "currentStep": "..."}`	POST /confirm
Done	`{"status": "idle"}`	Task complete
Busy	`{"error": "Agent is busy", "state": {...}}`	Wait or POST /abort first

CDP Direct Reference

Chrome must be running with --remote-debugging-port=9222.

Quick check:

curl.exe -s http://127.0.0.1:9222/json/version

If this returns JSON, Chrome is ready.

Connecting via Playwright:

const { chromium } = require('playwright');
const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
const context = browser.contexts()[0];
const page = context.pages()[0];

// Read page content
const title = await page.title();
const url = page.url();
const text = await page.textContent('body');

// Click by role
await page.getByRole('button', { name: 'Submit' }).click();

// Fill a field
await page.getByLabel('Email').fill('user@example.com');

// Read specific elements
const buttons = await page.$$eval('button', els => els.map(e => e.textContent));

Task Writing Guidelines

Be specific — include app names, URLs, exact text to type, button names
One task at a time — wait for completion before sending the next
Describe the goal, not the clicks — say "Send an email to john@example.com about the meeting" not "click compose, click to field..."
Check status if a task seems to hang
Don't include credentials in task text — tasks are logged

Task Examples

Goal	Task to send
Simple navigation	`Open Chrome and go to github.com`
Read screen content	`What text is currently displayed in Notepad?`
Cross-app workflow	`Copy the email address from the Chrome tab and paste it into the To field in Outlook`
Form filling	`In the open Chrome tab, fill the contact form: name "John Doe", email "john@example.com"`
App interaction	`Open Spotify and play the Discover Weekly playlist`
Settings change	`Open Windows Settings and turn on Dark Mode`
Data extraction	`Read the stock price shown in the Bloomberg tab in Chrome`
Complex browser	`Open YouTube, search for "Adele Hello", and play the first video result`
Verification	`Check if the deployment succeeded — look at the Vercel dashboard in Chrome`
Send email	`Open Gmail, compose email to john@example.com, subject: Meeting Tomorrow, body: Confirming 2pm. Best regards.`
Take screenshot	`Take a screenshot`

Error Recovery

Problem	Solution
Connection refused on :3847	Start Clawd Cursor: `cd clawd-cursor && npm start`
Connection refused on :9222	Start Chrome with CDP: `Start-Process chrome -ArgumentList "--remote-debugging-port=9222"`
Agent returns "busy"	Poll `/status` — wait for idle, or POST `/abort`
Task fails with no details	Check `/logs` for error entries
Task completes but wrong result	Rephrase with more specifics: exact app name, button text, field labels
Same task fails repeatedly	Break into smaller tasks (one action per task)
Safety confirmation pending	POST `/confirm` with `{"approved": true}` or `{"approved": false}`
Task hangs > 60 seconds	POST `/abort`, then retry with simpler phrasing

How It Works — 4-Layer Pipeline

Layer	What	Speed	Cost
0: Browser Layer	URL detection → direct navigation	Instant	Free
1: Action Router	Regex + UI Automation	Instant	Free
1.5: Smart Interaction	1 LLM plan → CDP/UIDriver executes	~2-5s	1 LLM call
2: Accessibility Reasoner	UI tree → text LLM decides	~1s	Cheap
3: Computer Use	Screenshot → vision LLM	~5-8s	Expensive

80%+ of tasks handled by Layer 0-1 (free, instant). Vision model is last resort only.

Safety Tiers

Tier	Actions	Behavior
🟢 Auto	Navigation, reading, opening apps	Runs immediately
🟡 Preview	Typing, form filling	Logs before executing
🔴 Confirm	Sending messages, deleting	Pauses — ask the user before POST `/confirm`. Never self-approve.

Security & Privacy

Network Isolation

API binds to 127.0.0.1 only — not network accessible. Verify: netstat -an | findstr 3847 should show 127.0.0.1:3847
Screenshots stay in memory, never saved to disk (unless --debug)
No telemetry, no analytics, no phone-home calls

Data Flow

With Ollama (local): 100% offline — zero external network calls. No data leaves the machine.
With cloud providers: screenshots/text are sent to the user's chosen provider API only. No data goes to skill authors, ClawHub, or third parties.
OpenClaw users: credentials auto-discovered from local config files — no keys stored in skill directory.
The user controls data flow by choosing their provider. Ollama = fully private.

Agent Autonomy Controls

🟢 Auto actions (navigation, reading, opening apps) run without prompting
🟡 Preview actions (typing, form filling) are logged before executing
🔴 Confirm actions (sending messages, deleting, purchases) always pause for user approval
Agents must ask the user before accessing sensitive apps (email, banking, messaging, passwords)
Agents must never self-approve 🔴 Confirm actions

Setup (User Reference)

Setup is handled by the user. If Clawd Cursor isn't running, start it yourself using the exec tool:

Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden

Only ask the user if you cannot start it (e.g., node not installed, build missing).

git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install && npm run build
npx clawd-cursor doctor    # auto-detects and configures everything
npm start                  # starts on port 3847

macOS: Grant Accessibility permission to terminal: System Settings → Privacy & Security → Accessibility

Provider	Setup	Cost
Ollama (free)	`ollama pull <model>`	$0 (fully offline)
Any cloud provider	Set `AI_API_KEY=your-key`	Varies by provider
OpenClaw users	Automatic — no setup needed	Uses configured provider

Performance Optimization

Proven optimizations applied to reduce task execution latency and LLM API costs. Reference files in perf/references/patches/.

Applied Optimizations

#	Name	Impact
1	Screenshot hash cache	90% fewer LLM calls on static screens
2	Parallel screenshot+a11y	30-40% per-step latency cut
3	A11y context cache (2s TTL)	Eliminates redundant PS spawns
4	Screenshot compression	52% smaller payload (58KB vs 120KB)
5	Async debug writes	94% less event loop blocking
6	Streaming LLM responses	1-3s faster per LLM call
7	Trimmed system prompts	~60% fewer prompt tokens
8	A11y tree filtering	Interactive elements only, 3000 char cap
9	Combined PS script	1 spawn instead of 3
10	Taskbar cache (30s TTL)	Skip expensive taskbar query
11	Delay reduction	50-150ms vs 200-1500ms

Benchmarks (2560x1440)

Metric	v0.3 (VNC)	v0.4 (Native)	v0.4.1+ (Optimized)
Screenshot capture	~850ms	~50ms	~57ms
Screenshot size	~200KB	~120KB	~58KB
A11y context (uncached)	N/A	~600ms	~462ms
A11y context (cached)	N/A	0ms	0ms (2s TTL)
Delays (per step)	N/A	200-1500ms	50-600ms
System prompt tokens	N/A	~800	~300

Perf Tools

perf/apply-optimizations.ps1 — apply all patches
perf/perf-test.ts — benchmark harness (npx ts-node perf/perf-test.ts)