Install
openclaw skills install clawd-cursor-0-6-0
AI desktop agent: control any app on Windows/macOS from your OpenClaw agent. Send natural language tasks to the Clawd Cursor API and it handles everything:...
If a human can do it on a screen, you can too. No API? No integration? No problem. Every app already has a UI, and Clawd Cursor gives you eyes and hands to use all of them.
No task is impossible. If it's on the screen, it's within reach.
IMPORTANT — READ THIS BEFORE ANYTHING ELSE
You have a desktop AI agent that drives the GUI on the user's machine. This means:
- You CAN open apps, click buttons, type text, fill forms
- You CAN see what's on screen (via Clawd Cursor's screenshot pipeline)
- You CAN interact with applications — browser, desktop apps, settings
- You MUST respect safety tiers — 🟢 Auto actions run freely, 🔴 Confirm actions require user approval
- You MUST ask before accessing sensitive apps (email, banking, private messages, password managers)
- You SHOULD prefer direct tools (file read, web_fetch, CLI) over desktop automation when available
Think of Clawd Cursor as your hands and eyes on the user's computer — used with their permission.
You have a desktop UI agent available. Use it when the user asks you to interact with their desktop, or when no other tool can accomplish the task.
Always ask the user before accessing sensitive apps: email, banking, private messages, password managers.
For quick page reads without a full task, connect to Chrome via Playwright CDP:
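const pw = require('playwright');
(async () => {
  // Attach to the Chrome instance already listening on the CDP port
  const browser = await pw.chromium.connectOverCDP('http://127.0.0.1:9222');
  const pages = browser.contexts()[0].pages();
  const text = await pages[0].innerText('body');
  console.log(text);
})();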
Use this when you just need page content — faster than sending a task.
| Scenario | Use | Why |
|---|---|---|
| Read page content/text | CDP Direct | Instant, free |
| Fill a web form | REST API | Clawd handles multi-step planning |
| Check if a page loaded | CDP Direct | Just read the title/URL |
| Click through a complex UI flow | REST API | Clawd handles planning |
| Get a list of elements on page | CDP Direct | Fast DOM query |
| Interact with a desktop app | REST API | CDP is browser-only |
Base URL: http://127.0.0.1:3847
Note: On Windows PowerShell, use `curl.exe` (with `.exe`) or `Invoke-RestMethod`. Bare `curl` is aliased to `Invoke-WebRequest`, which behaves differently.
Before your first task, verify Clawd Cursor is running:
curl.exe -s http://127.0.0.1:3847/health
Expected: {"status":"ok","version":"0.6.0"}
If connection refused — start it yourself (don't ask the user):
# Find the skill directory and start the server
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<clawd-cursor-directory>" -WindowStyle Hidden
Start-Sleep 3
# Verify it's running
curl.exe -s http://127.0.0.1:3847/health
The skill directory is wherever SKILL.md lives (the parent of this file). Use that path as the working directory.
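The health check above can also be done from Node. The following is a minimal sketch, assuming Node 18+ (for the global `fetch`); the function name `isClawdHealthy` is hypothetical and not part of Clawd Cursor. It only checks the `status` field shown in the expected response.

```javascript
// Sketch: returns true when GET /health answers {"status":"ok", ...}.
// fetchImpl is injectable for testing; it defaults to the Node 18+ global fetch.
async function isClawdHealthy(fetchImpl = fetch, baseUrl = 'http://127.0.0.1:3847') {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    const body = await res.json();
    return body.status === 'ok';
  } catch (err) {
    // Connection refused (server not started) or a non-JSON reply.
    return false;
  }
}
```

A `false` result corresponds to the "connection refused" case, where the server should be started before sending a task.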
POST /task accepts the task and returns immediately. The task runs in the background. You must poll /status to know when it's done.
curl.exe -s -X POST http://127.0.0.1:3847/task -H "Content-Type: application/json" -d "{\"task\": \"YOUR_TASK_HERE\"}"
PowerShell:
Invoke-RestMethod -Uri http://127.0.0.1:3847/task -Method POST -ContentType "application/json" -Body '{"task": "YOUR_TASK_HERE"}'
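Task text often contains double quotes (for example a form value like "John Doe"), which makes hand-escaped JSON on the command line error-prone. A small sketch of building the same request in Node, letting `JSON.stringify` do the escaping; `buildTaskRequest` is a hypothetical helper name, and Node 18+ is assumed for the usage shown in the comment.

```javascript
// Sketch: build the POST /task request described above.
// JSON.stringify escapes quotes and backslashes inside the task text.
function buildTaskRequest(task, baseUrl = 'http://127.0.0.1:3847') {
  return {
    url: `${baseUrl}/task`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ task }),
    },
  };
}

// Usage (Node 18+, global fetch):
//   const { url, init } = buildTaskRequest('Fill the form: name "John Doe"');
//   const reply = await (await fetch(url, init)).json();
```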
1. POST /task → get accepted
2. Wait 2 seconds
3. GET /status
4. If status is "idle" → done
5. If status is "waiting_confirm" → ASK THE USER, then POST /confirm based on their answer
6. If still running → wait 2 more seconds, go to step 3
7. If 60+ seconds → POST /abort and retry with clearer instructions
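The seven steps above can be sketched as a single polling loop. This is an illustration under stated assumptions, not Clawd Cursor's own client: `runClawdTask`, its options, and the returned `outcome` values are hypothetical names, Node 18+ is assumed for the global `fetch`, and the response bodies of `/confirm` and `/abort` are not inspected because the source does not document them.

```javascript
// Sketch of the submit-and-poll workflow (hypothetical helper, not a Clawd Cursor API).
async function runClawdTask(task, opts = {}) {
  const {
    baseUrl = 'http://127.0.0.1:3847',
    fetchImpl = fetch, // Node 18+ global fetch unless a fake is injected
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    onConfirm, // async (state) => boolean; must relay the USER's answer
    pollMs = 2000,
    timeoutMs = 60000,
  } = opts;

  const postJson = (path, body) =>
    fetchImpl(`${baseUrl}${path}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });

  // Step 1: submit the task.
  const accepted = await (await postJson('/task', { task })).json();
  if (!accepted.accepted) return { outcome: 'rejected', detail: accepted };

  // Steps 2-6: wait, poll, and react to the reported status.
  let waited = 0;
  while (waited < timeoutMs) {
    await sleep(pollMs);
    waited += pollMs;
    const state = await (await fetchImpl(`${baseUrl}/status`)).json();
    if (state.status === 'idle') return { outcome: 'done' };
    if (state.status === 'waiting_confirm') {
      // Never self-approve: without a user-facing handler, hand control back.
      if (!onConfirm) return { outcome: 'needs_confirmation', detail: state };
      const approved = await onConfirm(state);
      await postJson('/confirm', { approved: Boolean(approved) });
    }
  }

  // Step 7: give up after the timeout so the caller can rephrase and retry.
  await fetchImpl(`${baseUrl}/abort`, { method: 'POST' });
  return { outcome: 'aborted' };
}
```

The `onConfirm` callback is where the question to the user belongs; the loop only forwards the answer it receives.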
curl.exe -s http://127.0.0.1:3847/status
Some actions (sending messages, deleting) require approval. 🔴 NEVER self-approve these. Always ask the user for confirmation before POST /confirm. These exist to protect the user — do not bypass them.
curl.exe -s -X POST http://127.0.0.1:3847/confirm -H "Content-Type: application/json" -d "{\"approved\": true}"
curl.exe -s -X POST http://127.0.0.1:3847/abort
curl.exe -s http://127.0.0.1:3847/logs
Returns last 200 log entries. Check for error or warn entries when tasks fail.
| State | Response | What to do |
|---|---|---|
| Accepted | {"accepted": true, "task": "..."} | Start polling |
| Running | {"status": "acting", "currentTask": "...", "stepsCompleted": 2} | Keep polling |
| Waiting confirm | {"status": "waiting_confirm", "currentStep": "..."} | Ask the user, then POST /confirm |
| Done | {"status": "idle"} | Task complete |
| Busy | {"error": "Agent is busy", "state": {...}} | Wait or POST /abort first |
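The response table can be encoded as a small pure function that maps a parsed response to the next step. A sketch only: `nextStep` and its return labels are hypothetical, and treating any undocumented status as "still running" is an assumption, since the table lists only `acting`, `waiting_confirm`, and `idle`.

```javascript
// Sketch: map a parsed /task or /status response to the next step.
function nextStep(response) {
  if (response.error) return 'wait_or_abort'; // e.g. {"error": "Agent is busy"}
  if (response.accepted === true) return 'poll'; // task accepted, start polling
  switch (response.status) {
    case 'idle':
      return 'done';
    case 'waiting_confirm':
      return 'ask_user'; // ask first, then POST /confirm with the user's answer
    default:
      return 'poll'; // "acting"; other statuses are ASSUMED to mean in progress
  }
}
```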
Chrome must be running with --remote-debugging-port=9222.
curl.exe -s http://127.0.0.1:9222/json/version
If this returns JSON, Chrome is ready.
const { chromium } = require('playwright');

(async () => {
  // Attach to the Chrome instance already listening on the CDP port
  const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
  const context = browser.contexts()[0];
  const page = context.pages()[0];
  // Read page content
  const title = await page.title();
  const url = page.url();
  const text = await page.textContent('body');
  // Click by role
  await page.getByRole('button', { name: 'Submit' }).click();
  // Fill a field
  await page.getByLabel('Email').fill('user@example.com');
  // Read specific elements
  const buttons = await page.$$eval('button', els => els.map(e => e.textContent));
})();
| Goal | Task to send |
|---|---|
| Simple navigation | Open Chrome and go to github.com |
| Read screen content | What text is currently displayed in Notepad? |
| Cross-app workflow | Copy the email address from the Chrome tab and paste it into the To field in Outlook |
| Form filling | In the open Chrome tab, fill the contact form: name "John Doe", email "john@example.com" |
| App interaction | Open Spotify and play the Discover Weekly playlist |
| Settings change | Open Windows Settings and turn on Dark Mode |
| Data extraction | Read the stock price shown in the Bloomberg tab in Chrome |
| Complex browser | Open YouTube, search for "Adele Hello", and play the first video result |
| Verification | Check if the deployment succeeded — look at the Vercel dashboard in Chrome |
| Send email | Open Gmail, compose email to john@example.com, subject: Meeting Tomorrow, body: Confirming 2pm. Best regards. |
| Take screenshot | Take a screenshot |
| Problem | Solution |
|---|---|
| Connection refused on :3847 | Start Clawd Cursor: cd clawd-cursor && npm start |
| Connection refused on :9222 | Start Chrome with CDP: Start-Process chrome -ArgumentList "--remote-debugging-port=9222" |
| Agent returns "busy" | Poll /status — wait for idle, or POST /abort |
| Task fails with no details | Check /logs for error entries |
| Task completes but wrong result | Rephrase with more specifics: exact app name, button text, field labels |
| Same task fails repeatedly | Break into smaller tasks (one action per task) |
| Safety confirmation pending | Ask the user, then POST /confirm with {"approved": true} or {"approved": false} |
| Task hangs > 60 seconds | POST /abort, then retry with simpler phrasing |
| Layer | What | Speed | Cost |
|---|---|---|---|
| 0: Browser Layer | URL detection → direct navigation | Instant | Free |
| 1: Action Router | Regex + UI Automation | Instant | Free |
| 1.5: Smart Interaction | 1 LLM plan → CDP/UIDriver executes | ~2-5s | 1 LLM call |
| 2: Accessibility Reasoner | UI tree → text LLM decides | ~1s | Cheap |
| 3: Computer Use | Screenshot → vision LLM | ~5-8s | Expensive |
80%+ of tasks are handled by Layers 0-1 (free, instant). The vision model is a last resort only.
| Tier | Actions | Behavior |
|---|---|---|
| 🟢 Auto | Navigation, reading, opening apps | Runs immediately |
| 🟡 Preview | Typing, form filling | Logs before executing |
| 🔴 Confirm | Sending messages, deleting | Pauses — ask the user before POST /confirm. Never self-approve. |
127.0.0.1 only, not network accessible. Verify: `netstat -an | findstr 3847` should show `127.0.0.1:3847`.
Setup is handled by the user. If Clawd Cursor isn't running, start it yourself using the exec tool:
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden
Only ask the user if you cannot start it (e.g., node not installed, build missing).
git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install && npm run build
npx clawd-cursor doctor # auto-detects and configures everything
npm start # starts on port 3847
macOS: Grant Accessibility permission to terminal: System Settings → Privacy & Security → Accessibility
| Provider | Setup | Cost |
|---|---|---|
| Ollama (free) | ollama pull <model> | $0 (fully offline) |
| Any cloud provider | Set AI_API_KEY=your-key | Varies by provider |
| OpenClaw users | Automatic — no setup needed | Uses configured provider |
These optimizations reduce task execution latency and LLM API costs. Reference files are in perf/references/patches/.
| # | Name | Impact |
|---|---|---|
| 1 | Screenshot hash cache | 90% fewer LLM calls on static screens |
| 2 | Parallel screenshot+a11y | 30-40% per-step latency cut |
| 3 | A11y context cache (2s TTL) | Eliminates redundant PS spawns |
| 4 | Screenshot compression | 52% smaller payload (58KB vs 120KB) |
| 5 | Async debug writes | 94% less event loop blocking |
| 6 | Streaming LLM responses | 1-3s faster per LLM call |
| 7 | Trimmed system prompts | ~60% fewer prompt tokens |
| 8 | A11y tree filtering | Interactive elements only, 3000 char cap |
| 9 | Combined PS script | 1 spawn instead of 3 |
| 10 | Taskbar cache (30s TTL) | Skip expensive taskbar query |
| 11 | Delay reduction | 50-150ms vs 200-1500ms |
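To illustrate the idea behind the first row (screenshot hash cache), here is a minimal sketch of skipping an expensive analysis when the screen has not changed. It is not the project's implementation: `createScreenshotCache` and the `analyze` callback are hypothetical, and SHA-256 over the raw image bytes is an assumed choice of hash.

```javascript
const nodeCrypto = require('node:crypto');

// Illustrative only: reuse the previous result when the screenshot bytes are identical.
function createScreenshotCache(analyze) {
  let lastHash = null;
  let lastResult = null;
  return async function analyzeCached(imageBuffer) {
    const hash = nodeCrypto.createHash('sha256').update(imageBuffer).digest('hex');
    if (hash === lastHash) return lastResult; // screen unchanged: skip the expensive call
    lastResult = await analyze(imageBuffer);
    lastHash = hash;
    return lastResult;
  };
}
```

Only the most recent frame is remembered, which is enough for the static-screen case the table describes; a changed frame always triggers a fresh call.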
| Metric | v0.3 (VNC) | v0.4 (Native) | v0.4.1+ (Optimized) |
|---|---|---|---|
| Screenshot capture | ~850ms | ~50ms | ~57ms |
| Screenshot size | ~200KB | ~120KB | ~58KB |
| A11y context (uncached) | N/A | ~600ms | ~462ms |
| A11y context (cached) | N/A | 0ms | 0ms (2s TTL) |
| Delays (per step) | N/A | 200-1500ms | 50-600ms |
| System prompt tokens | N/A | ~800 | ~300 |
- perf/apply-optimizations.ps1: apply all patches
- perf/perf-test.ts: benchmark harness (npx ts-node perf/perf-test.ts)