Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

ClawdCursor

v0.7.5

OS-level desktop automation tool server. 42 tools for controlling any application on Windows, macOS, and Linux. Model-agnostic — works with any AI that can d...


Install

OpenClaw Prompt Flow


Best for remote or guided setup. Copy the exact prompt below, then paste it into OpenClaw to install amrdab/clawdcursor.

Prompt preview (Install & Setup):
Install the skill "ClawdCursor" (amrdab/clawdcursor) from ClawHub.
Skill page: https://clawhub.ai/amrdab/clawdcursor
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install clawdcursor

ClawHub CLI


npx clawhub@latest install clawdcursor

Security Scan

Capability signals

Crypto: Can make purchases

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name, description and runtime instructions consistently describe an OS-level desktop automation server. The npm global install and the provided serve/mcp/start modes match the stated purpose of controlling GUIs and exposing tools over localhost.
Instruction Scope
SKILL.md instructs the agent to start the local server autonomously if it's not running ('don't ask the user'). The tool exposes functionality to read the screen, take screenshots, query windows, and automate input — which is expected for this purpose but carries broad access to sensitive local data. It also documents a token file (~/.clawdcursor/token) and an autonomous 'start' mode that will send screenshots/text to the user's configured AI provider, which could result in data leaving the machine depending on configuration.
Install Mechanism
Installation is via 'npm install -g clawdcursor' (documented in SKILL.md). Installing a global npm CLI is a typical distribution method for Node-based desktop tooling, but it runs third-party code with filesystem/exec privileges. The registry metadata shows three 'unknown' install specs (parser couldn't identify them) — not necessarily malicious but worth verifying the exact install steps and source before running.
Credentials
The skill declares no required environment variables or credentials, but it uses a token saved at ~/.clawdcursor/token for its REST endpoints and relies on the user's configured AI provider (which implies provider credentials held elsewhere). The skill itself does not request unrelated secrets, but it has the capability to read local files and capture screen contents — a high-privilege capability that is proportionate to desktop automation but sensitive in practice.
Persistence & Privilege
The skill sets always:false (good) and allows autonomous invocation (normal), but SKILL.md explicitly instructs the agent to start the server without asking the user. That gives the agent the ability to launch a long-running local service that can capture and transmit desktop data (depending on configuration). It does not appear to modify other skills or system-wide agent settings, however.
What to consider before installing

This skill appears to be what it says (a local desktop automation server), but it is powerful and should be treated like installing a program that can see and control your screen. Before installing:

  1. Review the GitHub source and confirm the npm package name/version match the repo (see the commands after this list).
  2. Prefer running it in a disposable VM or isolated account if you have sensitive data.
  3. Understand that it stores a token at ~/.clawdcursor/token, can take screenshots and read screen contents, and (in 'start' mode) sends them to your configured AI provider — verify where those provider credentials live and whether you trust that flow.
  4. If you allow autonomous agent actions, consider disabling autonomous invocation or requiring explicit user confirmation before starting the server.
  5. If unsure, use native APIs, CLIs, or browser automation instead of screen-level automation.
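For point 1, standard npm commands can preview exactly what would be installed (the package name comes from this page; the commands are plain npm):

npm view clawdcursor version repository.url dist.tarball
npm pack clawdcursor --dry-run    # list the published files without installing anything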

Like a lobster shell, security has layers — review code before you run it.

latest: vk9776e9s4x3zf9389jx53gjx9184947s
138 downloads · 0 stars · 1 version
Updated 3w ago
v0.7.5
MIT-0

If a human can do it on a screen, you can too. No API? No integration? No problem.

USE AS A FALLBACK — NOT FIRST CHOICE

Before reaching for any ClawdCursor tool, ask:

  1. Is there a native API? (Gmail API, GitHub API, Slack API) → use the API
  2. Is there a CLI? (git, npm, curl) → use the CLI
  3. Can you edit the file directly? → do that
  4. Is there a browser automation layer? (Playwright, Puppeteer) → use that

None of the above work? Now use ClawdCursor. It's for the last mile.


Modes at a Glance

| Mode | Command | Brain | Tools available |
| --- | --- | --- | --- |
| serve | clawdcursor serve | You (REST client) | All 42 tools via HTTP |
| mcp | clawdcursor mcp | You (MCP client) | All 42 tools via MCP stdio |
| start | clawdcursor start | Built-in LLM pipeline | All 42 tools + autonomous agent |

In serve and mcp modes: you reason, ClawdCursor acts. There is no built-in LLM. You call tools, interpret results, decide next steps.


Connecting

Option A — REST (clawdcursor serve)

clawdcursor serve        # starts on http://127.0.0.1:3847

All POST endpoints require: Authorization: Bearer <token> (token saved to ~/.clawdcursor/token)

GET  /tools              → all tool schemas (OpenAI function-calling format)
POST /execute/{name}     → run a tool: {"param": "value"}
GET  /health             → {"status":"ok","version":"0.7.5"}
GET  /docs               → full documentation

Example:

POST /execute/get_windows     {}
POST /execute/mouse_click     {"x": 640, "y": 400}
POST /execute/type_text       {"text": "hello world"}

If the server isn't running, start it yourself — don't ask the user:

clawdcursor serve
# wait 2 seconds, then verify: GET /health
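As a concrete sketch, here is a minimal TypeScript client for the endpoints above. It assumes Node 18+ (global fetch) and runs as an ES module; the execute() helper is illustrative, while the base URL, token path, and endpoint shapes are the ones documented here.

// client.ts: minimal REST client sketch (Node 18+, run as an ES module)
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

const BASE = "http://127.0.0.1:3847";
// All POST endpoints require the bearer token saved at ~/.clawdcursor/token.
const token = readFileSync(join(homedir(), ".clawdcursor", "token"), "utf8").trim();

export async function execute(tool: string, params: Record<string, unknown> = {}) {
  const res = await fetch(`${BASE}/execute/${tool}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(params),
  });
  if (!res.ok) throw new Error(`${tool} failed: HTTP ${res.status}`);
  return res.json();
}

// Usage: mirrors the example calls above.
console.log(await execute("get_windows"));
await execute("mouse_click", { x: 640, y: 400 });
await execute("type_text", { text: "hello world" });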

Option B — MCP (clawdcursor mcp)

{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp"]
    }
  }
}

Works with Claude Code, Cursor, Windsurf, Zed, or any MCP-compatible client. All 42 tools are exposed identically.
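If you use Claude Code, the same server can also be registered from its CLI instead of editing JSON by hand (assuming the clawdcursor binary is on your PATH):

claude mcp add clawdcursor -- clawdcursor mcp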

Option C — Autonomous agent (clawdcursor start)

POST /task    {"task": "Open Notepad and write Hello"}   → submit task
GET  /status  → {"status": "acting"} | "idle" | "waiting_confirm"
POST /confirm {"approved": true}                         → approve safety-gated action
POST /abort                                              → stop current task

Use the delegate_to_agent tool to submit tasks from within MCP/REST sessions. Requires clawdcursor start running on port 3847.

Polling pattern:

POST /task  {"task": "...", "returnPartial": true}
→ poll GET /status every 2s:
    "acting"           → still running, keep polling
    "waiting_confirm"  → STOP. Ask user → POST /confirm {"approved": true}
    "idle"             → done, check GET /task-logs for result
→ if 60s+ with no progress: POST /abort, retry with simpler phrasing
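A sketch of that loop in TypeScript, reusing BASE and token from the client sketch above (the progress timeout is a simplification; the endpoint shapes follow the docs):

async function api(method: string, path: string, body?: unknown) {
  const res = await fetch(`${BASE}${path}`, {
    method,
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  return res.json();
}

await api("POST", "/task", { task: "Open Notepad and write Hello", returnPartial: true });

const started = Date.now();
for (;;) {
  await new Promise((r) => setTimeout(r, 2000)); // poll every 2s
  const { status } = await api("GET", "/status");
  if (status === "idle") {
    console.log(await api("GET", "/task-logs")); // done: inspect the result
    break;
  }
  if (status === "waiting_confirm") {
    // STOP here in a real agent: surface the pending action to the user and
    // only send approval after they explicitly say yes. Never self-approve.
    await api("POST", "/confirm", { approved: true }); // stand-in for a real user prompt
  }
  if (Date.now() - started > 60_000) {
    await api("POST", "/abort"); // no progress in 60s: abort, retry with simpler phrasing
    break;
  }
}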

returnPartial mode — send {"returnPartial": true} with POST /task: ClawdCursor skips Stage 3 (expensive vision) and returns control to you if Stage 2 fails:

{"partial": true, "stepsCompleted": [...], "context": "got stuck on dialog"}

You finish the task with MCP tools, then call POST /learn to save what worked.

POST /learn — adaptive learning: After completing a task with your own tool calls, teach ClawdCursor for next time:

POST /learn
{
  "processName": "EXCEL",
  "task": "create table with headers",
  "actions": [
    {"action": "key", "description": "Ctrl+Home to go to A1"},
    {"action": "type", "description": "Type header name"},
    {"action": "key", "description": "Tab to next column"}
  ],
  "shortcuts": {"next_cell": "Tab", "next_row": "Enter"},
  "tips": ["Use Tab between columns, Enter between rows"]
}

This enriches the app's guide JSON. Stage 2 reads it on the next run — no vision fallback needed.


The Universal Loop

Every GUI task follows the same pattern regardless of transport:

1. ORIENT  →  read_screen() or get_windows()          see what's open and focused
2. ACT     →  smart_click() / smart_type() / key_press()   do the thing
3. VERIFY  →  check return value → window state → text check → screenshot
4. REPEAT  →  until done

Verification (cheapest to most expensive)

  1. Tool return value — every tool reports success/failure. Check it first.
  2. Window state — get_active_window(), get_windows() — did a dialog appear? Did the title change?
  3. Text check — read_screen() or smart_read() — is the expected text visible?
  4. Screenshot — desktop_screenshot() — only when text methods fail. Costs the most.
  5. Negative check — look for error dialogs, wrong window, unchanged screen.

Always verify after: sends, saves, deletes, form submissions. Skip verification for: mid-sequence keystrokes, scrolling.
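As an illustration, here is one pass of the loop over REST, using the execute() helper from the client sketch (the smart_click parameter name and the success check are assumptions; GET /tools has the authoritative schemas):

// Orient → act → verify, cheapest verification first.
async function clickAndVerify(label: string, expectedText: string) {
  await execute("read_screen"); // 1. ORIENT: see what's open and focused

  // 2. ACT: prefer semantic tools over raw coordinates.
  const result = await execute("smart_click", { text: label }); // param name assumed
  if (result.success === false) throw new Error(`click on "${label}" failed`);

  // 3. VERIFY: check the tool result, then the cheap text check;
  // escalate to a screenshot only if the text check is inconclusive.
  const after = await execute("read_screen");
  if (!JSON.stringify(after).includes(expectedText)) {
    await execute("desktop_screenshot"); // 4. REPEAT with another approach if needed
  }
}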


Tool Decision Trees

Perception — always start here

read_screen()          → FIRST. Accessibility tree: buttons, inputs, text, with coords.
                          Fast, structured, works on native apps.
ocr_read_screen()      → When a11y tree is empty (canvas UIs, image-based apps).
smart_read()           → Combines OCR + a11y. Good first call when unsure.
desktop_screenshot()   → LAST RESORT. Only when you need pixel-level visual detail.
desktop_screenshot_region(x,y,w,h) → Zoomed crop when you need detail in one area.

Clicking

smart_click("Save")              → FIRST. Finds by label/text via OCR + a11y, clicks.
                                   Pass processId to target the right window.
invoke_element(name="Save")      → When you know the exact automation ID from read_screen.
cdp_click(text="Submit")         → Browser elements. Requires cdp_connect() first.
mouse_click(x, y)                → LAST RESORT. Raw coordinates from a screenshot.

Typing

smart_type("Email", "user@x.com")  → FIRST. Finds field by label, focuses, types.
cdp_type(label="Email", text="…")  → Browser inputs. Requires cdp_connect() first.
type_text("hello")                 → Clipboard paste into whatever is focused.
                                     Use after manually focusing with smart_click.

Browser / CDP

1. navigate_browser(url)     → opens URL, auto-enables CDP
2. cdp_connect()             → connect to browser DevTools Protocol
3. cdp_page_context()        → list interactive elements on page
4. cdp_read_text()           → extract DOM text (returns empty on canvas apps → use OCR)
5. cdp_click(text="…")       → click by visible text
6. cdp_type(label, text)     → fill input by label
7. cdp_evaluate(script)      → run JavaScript in page context
8. cdp_scroll(direction, px) → scroll page via DOM (not mouse wheel)
9. cdp_list_tabs()           → list all open tabs
10. cdp_switch_tab(target)   → switch to a specific tab

If CDP isn't connected, switch tabs with keyboard:

key_press("ctrl+1")          → tab 1
key_press("ctrl+tab")        → next tab
key_press("ctrl+shift+tab")  → previous tab

Window Management

get_windows()                         → list all open windows (use to find PIDs)
get_active_window()                   → what's in the foreground right now
focus_window(processName="Discord")   → bring to front (auto-minimizes phantom off-screen windows)
minimize_window(processName="calc")   → minimize a window — 1 call, cross-platform
                                        also accepts: processId, title

Rule: Always focus_window() before key_press() or type_text(). Keystrokes go to whatever has focus — and that may be your terminal, not the target app.

Canvas apps (Google Docs, Figma, Notion)

DOM has no readable text. Pattern:

ocr_read_screen()          → read content (DOM extraction fails)
mouse_click(x, y)          → click into the canvas area
type_text("your text")     → clipboard paste works even on canvas

Quick Patterns

Open app and type:

open_app("notepad") → wait(2) → smart_read() → type_text("Hello") → smart_read()

Read a webpage:

navigate_browser(url) → wait(3) → cdp_connect() → cdp_read_text()

Fill a web form:

cdp_connect() → cdp_type("Email", "x@x.com") → cdp_type("Password", "…") → cdp_click("Submit")

Cross-app copy/paste:

focus_window("Chrome") → key_press("ctrl+a") → key_press("ctrl+c")
→ read_clipboard() → focus_window("Notepad") → type_text(clipboard)

Send email via Outlook:

open_app("outlook") → wait(2) → smart_click("New Email")
→ mouse_click(to_field_x, to_field_y) → type_text("recipient@x.com") → key_press("Tab")
→ mouse_click(subject_x, subject_y) → type_text("Subject") → key_press("Tab")
→ mouse_click(body_x, body_y) → type_text("Body text")
→ mouse_click(send_x, send_y)

Autonomous complex task (requires clawdcursor start):

delegate_to_agent("Open Gmail, find latest email from Stripe, forward to billing@x.com")
→ poll GET /status every 2s
→ if waiting_confirm: ask user → POST /confirm {"approved": true}
→ if idle: task done

Full Tool Reference (42 tools)

Speed: ⚡ Free/instant · 🔵 Cheap · 🟡 Moderate · 🔴 Vision (expensive)

Perception (6)

| Tool | What it does | When |
| --- | --- | --- |
| read_screen | A11y tree — buttons, inputs, text, coords | ⚡ Default first read |
| smart_read | OCR + a11y combined | 🔵 When unsure which to use |
| ocr_read_screen | Raw OCR text with bounding boxes | 🔵 Canvas UIs, empty a11y trees |
| desktop_screenshot | Full screen image (1280px wide) | ⚡ Last resort visual check |
| desktop_screenshot_region | Zoomed crop of specific area | ⚡ Fine-grained visual detail |
| get_screen_size | Screen dimensions and DPI | ⚡ Coordinate calculations |

Mouse (7)

| Tool | What it does | When |
| --- | --- | --- |
| smart_click | Find element by text/label, click | 🔵 First choice for clicking |
| mouse_click | Left click at (x, y) | ⚡ Last resort |
| mouse_double_click | Double click at (x, y) | ⚡ Open files, select words |
| mouse_right_click | Right click at (x, y) | ⚡ Context menus |
| mouse_hover | Move cursor without clicking | ⚡ Hover menus |
| mouse_scroll | Scroll at position (physical mouse wheel) | ⚡ Scroll content |
| mouse_drag | Drag from start to end — accepts startX/startY/endX/endY or x1/y1/x2/y2 | ⚡ Resize, select ranges |

Keyboard (5)

| Tool | What it does | When |
| --- | --- | --- |
| smart_type | Find input by label, focus it, type | 🔵 First choice for form fields |
| type_text | Clipboard paste into focused element | ⚡ After manually focusing |
| key_press | Send key combo (ctrl+s, Return, alt+tab) | ⚡ After focus_window |
| shortcuts_list | List keyboard shortcuts for current app | ⚡ Before reaching for mouse |
| shortcuts_execute | Run a named shortcut (fuzzy match) | ⚡ Save, copy, paste, undo |

Window Management (5)

| Tool | What it does | When |
| --- | --- | --- |
| get_windows | List all open windows with PIDs and bounds | ⚡ Situational awareness |
| get_active_window | Current foreground window | ⚡ Check current focus |
| get_focused_element | Element with keyboard focus | ⚡ Debug wrong-field typing |
| focus_window | Bring window to front (auto-clears off-screen phantoms) | ⚡ Always before key_press |
| minimize_window | Minimize by processName, processId, or title | ⚡ Clear focus stealers |

UI Elements (2)

| Tool | What it does | When |
| --- | --- | --- |
| find_element | Search UI tree by name or type | ⚡ Find automation IDs |
| invoke_element | Invoke element by automation ID or name | ⚡ When ID known from read_screen |

Clipboard (2)

| Tool | What it does | When |
| --- | --- | --- |
| read_clipboard | Read clipboard text | ⚡ After copy operations |
| write_clipboard | Write text to clipboard | ⚡ Before paste operations |

Browser / CDP (11)

| Tool | What it does | When |
| --- | --- | --- |
| cdp_connect | Connect to browser DevTools Protocol | ⚡ First step for any browser task |
| cdp_page_context | List interactive elements on page | ⚡ After connect |
| cdp_read_text | Extract DOM text | ⚡ Read page content |
| cdp_click | Click by CSS selector or visible text | ⚡ Browser clicks |
| cdp_type | Type into input by label or selector | ⚡ Browser form filling |
| cdp_select_option | Select dropdown option | ⚡ Select elements |
| cdp_evaluate | Run JavaScript in page context | ⚡ Custom queries |
| cdp_scroll | Scroll page via DOM (direction, amount px) | ⚡ DOM-level scroll |
| cdp_wait_for_selector | Wait for element to appear | ⚡ After navigation/AJAX |
| cdp_list_tabs | List all browser tabs | ⚡ When on wrong tab |
| cdp_switch_tab | Switch to a tab by title or index | ⚡ After cdp_list_tabs |

Orchestration (4)

| Tool | What it does | When |
| --- | --- | --- |
| open_app | Launch application by name | ⚡ First step for desktop tasks |
| navigate_browser | Open URL (auto-enables CDP) | ⚡ First step for browser tasks |
| wait | Pause N seconds | ⚡ After opening apps, let UI render |
| delegate_to_agent | Send task to built-in autonomous agent | 🟡 Complex multi-step tasks (requires clawdcursor start) |

Provider Setup (agent mode only)

| Provider | Setup | Cost |
| --- | --- | --- |
| Ollama (local) | ollama pull qwen2.5:7b && ollama serve | $0 — fully offline, no data leaves machine |
| Any cloud | Set env var: ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, MOONSHOT_API_KEY, etc. | Varies |
| OpenClaw users | Auto-detected from ~/.openclaw/agents/main/auth-profiles.json | No extra setup |

Run clawdcursor doctor to auto-detect and validate providers.


Security

  • Network isolation: Binds to 127.0.0.1 only. Verify: netstat -an | findstr 3847 (Windows) or netstat -an | grep 3847 (macOS/Linux) — should show 127.0.0.1:3847, never 0.0.0.0:3847
  • Ollama: 100% offline. Screenshots stay in RAM, never leave the machine.
  • Cloud providers: Screenshots/text sent only to your configured provider. No telemetry, no analytics, no third-party logging.
  • Token auth: All mutating POST endpoints require Authorization: Bearer <token>. Token at ~/.clawdcursor/token.
  • Safety tiers: Auto / Preview / Confirm. Agents must never self-approve Confirm actions.

Coordinate System

All mouse tools use image-space coordinates from a 1280px-wide viewport — matching screenshots from desktop_screenshot. DPI scaling is handled automatically. Do not pre-scale coordinates.
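For example (illustrative numbers, using the execute() helper from the client sketch): a button visible at pixel (640, 400) in the 1280px-wide screenshot is clicked with exactly those values.

await execute("desktop_screenshot");              // returned image is 1280px wide
await execute("mouse_click", { x: 640, y: 400 }); // same image-space coords, no DPI math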


Safety

| Tier | Actions | Behavior |
| --- | --- | --- |
| 🟢 Auto | Navigation, reading, opening apps | Runs immediately |
| 🟡 Preview | Typing, form filling | Logged |
| 🔴 Confirm | Send, delete, purchase | Pauses — always ask user first |

  • Never self-approve Confirm actions.
  • Alt+F4 and Ctrl+Alt+Delete are blocked.
  • Server binds to 127.0.0.1 only.
  • First run requires explicit user consent for desktop control.

Error Recovery

| Problem | Fix |
| --- | --- |
| Port 3847 not responding | clawdcursor serve — wait 2s — GET /health |
| 401 Unauthorized | Token changed — read ~/.clawdcursor/token and use fresh value |
| CDP not available | Chrome must be open. navigate_browser(url) auto-enables it. |
| CDP on wrong tab | cdp_list_tabs() → cdp_switch_tab(target) |
| focus_window fails | get_windows() to confirm title/processName, then retry |
| smart_click can't find element | read_screen() for coords → mouse_click(x, y) |
| key_press goes to wrong window | You skipped focus_window — always focus first |
| cdp_read_text returns empty | Canvas app — use ocr_read_screen() instead |
| Same action fails 3+ times | Try a completely different approach |

Platform Support

| Platform | A11y | OCR | CDP |
| --- | --- | --- | --- |
| Windows (x64/ARM64) | PowerShell + .NET UIA | Windows.Media.Ocr | Chrome/Edge |
| macOS (Intel/Apple Silicon) | JXA + System Events | Apple Vision | Chrome/Edge |
| Linux (x64/ARM64) | AT-SPI | Tesseract | Chrome/Edge |

macOS: Grant Accessibility in System Settings → Privacy → Accessibility. Linux: sudo apt install tesseract-ocr for OCR support.
