Install
openclaw skills install ghosthand-skill
Use this skill when operating Ghosthand, a local Android control runtime exposed over a loopback HTTP API for OpenClaw or another agent. Trigger it for Ghosthand tasks involving runtime or capability checks, structured UI inspection, selector planning, semantic clicks, coordinate taps, text input, scrolling, wait conditions, clipboard transfer, notifications, screenshots, or debugging Ghosthand-specific route behavior such as partial-output warnings, snapshot-scoped node IDs, or text vs content description vs resource-id selection.
Ghosthand is a loopback HTTP server on the Android phone. All interaction goes through HTTP GET and POST requests, plus an occasional DELETE, to http://127.0.0.1:5583.
Always do this first:
| Step | Command | Purpose |
|---|---|---|
| 1 | GET /ping | Is Ghosthand alive? |
| 2 | GET /state | Is the runtime healthy, and is the capability you need usable now? |
| 3 | GET /screen?source=accessibility | What is the current actionable surface? |
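
The three preflight reads in the table can be sketched as a small helper. This is a minimal illustration, not part of Ghosthand: it assumes the routes return JSON bodies, and the names `http_get` and `preflight` are hypothetical. Only the base URL and the three paths come from this document.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://127.0.0.1:5583"  # loopback address given in this document


def http_get(path: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET a Ghosthand route and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def preflight(
    get: Callable[[str], Dict[str, Any]] = http_get,
) -> Dict[str, Dict[str, Any]]:
    """Run the three preflight reads in order and return the raw responses."""
    results: Dict[str, Dict[str, Any]] = {}
    for path in ("/ping", "/state", "/screen?source=accessibility"):
        # Any exception from `get` propagates, so later steps are skipped
        # when an earlier one fails.
        results[path] = get(path)
    return results
```

Passing a different `get` callable makes the sequence easy to exercise without a device attached.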
Use this skill to operate Ghosthand as an Android agent substrate.
Ghosthand is not generic Android advice. It is a local runtime with a route-based control surface. Use this skill only when the task is actually about Ghosthand routes, Ghosthand capability state, or acting through Ghosthand.
Ghosthand exposes a local HTTP API for Android observation and control. The important categories are:
- /ping, /health, /state, /device, /foreground, /commands, /capabilities
- /screen, /tree, /focused, /find
- /click, /tap, /input, /type, /setText, /scroll, /swipe, /longpress, /gesture
- /back, /home, /recents
- /screenshot, /wait, /clipboard, /notify

Treat /commands as the current machine-readable capability catalog when route details matter.
Use it when the task requires any of the following:
- […] text, desc, or id

Do not use it for:
- […] /commands can answer directly

Before acting, establish three things:
Typical order:
1. /ping
2. /state
3. /commands if route shape, selector support, or response fields are uncertain
4. /screen?source=accessibility for the current actionable surface
5. /screen?source=hybrid or /screen?source=ocr
6. /find, /click, or /tap

A capability is usable only when both are true:
Do not confuse “permission granted” with “usable now”. Read /state before diagnosing failures, especially for accessibility and screenshot capture.
/state is the best live summary. /capabilities is the fuller catalog-style view when an agent needs route-capability mapping and availability details.
Treat nodeId as ephemeral. Do not cache it across fresh observations unless the snapshot context is clearly the same. Prefer re-resolving via /screen, /find, or selector-based /click instead of assuming old node IDs remain valid.
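
One way to enforce the rule above on the client side is to store each nodeId together with a marker for the observation it came from. The sketch below is illustrative only: `ScopedNode`, `usable_node_id`, and the idea of a `snapshot_token` are assumptions, and this document does not say how Ghosthand identifies a snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedNode:
    """A nodeId paired with the observation it was read from."""
    node_id: str
    snapshot_token: str  # hypothetical: any value that identifies the observation


def usable_node_id(node: ScopedNode, current_token: str) -> Optional[str]:
    """Return the nodeId only if it belongs to the current observation."""
    if node.snapshot_token == current_token:
        return node.node_id
    return None  # stale: re-resolve via /screen, /find, or selector-based /click
```

A `None` result is the cue to re-resolve the target rather than reuse the old ID.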

/screen

Use /screen first when you need a compact actionable view. The default mode is source=accessibility.
Use it to answer:
- […] /tap

Important details:
- source=accessibility is the default and supports editable, scrollable, clickable, and package filters
- source=hybrid or source=ocr is useful when accessibility is temporarily unavailable or operationally insufficient
- summaryOnly=true is for compact orientation, not detailed targeting
- previewPath is a hint that a lightweight screenshot fetch is available; /screen does not embed image bytes

If /screen reports partialOutput=true, warnings, foreground drift, or fallback hints, do not assume you saw the whole surface. Escalate to /tree, /screenshot, or a non-accessibility screen mode before blaming the app.
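
The escalation check can be sketched as a small predicate over a decoded /screen response. This is a hedged illustration: it assumes `partialOutput` and `warnings` are top-level fields and that `warnings` is a list, none of which this document confirms. The function name is hypothetical, and foreground drift and fallback hints are left out because their field names are not given here.

```python
from typing import Any, Dict, List


def screen_escalation_reasons(screen: Dict[str, Any]) -> List[str]:
    """List reasons a /screen result should not be treated as complete."""
    reasons: List[str] = []
    if screen.get("partialOutput") is True:
        reasons.append("partialOutput=true")
    warnings = screen.get("warnings") or []  # assumed to be a list when present
    if warnings:
        reasons.append(f"{len(warnings)} warning(s)")
    return reasons
```

An empty list means neither signal was found; a non-empty list is the cue to read /tree, /screenshot, or a non-accessibility screen mode next.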

/tree

Use /tree when you need fuller structure, raw hierarchy, or to inspect why /screen may have omitted or shaped output. Use it for diagnosis and structural truth, not as your default first read.

/find

Use /find when you already have a selector hypothesis and want a bounded lookup.
Prefer it when you need:
- […] index
- […] text, contentDesc, resourceId, or only as a focused node

A miss usually means one of four things:
Supported strategies are text, textContains, contentDesc, contentDescContains, resourceId, and focused. text, desc, and id are convenience aliases in the request body; Ghosthand normalizes them internally.
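
The alias rule can be mirrored on the client side when building requests. The sketch below is an illustration, not Ghosthand's own code: the function name is hypothetical, and the mapping of desc to contentDesc and id to resourceId is inferred from the strategy names listed above.

```python
# Canonical strategy names as listed in this document.
SUPPORTED = {
    "text", "textContains",
    "contentDesc", "contentDescContains",
    "resourceId", "focused",
}

# Convenience aliases; the desc and id targets are inferred, not confirmed.
ALIASES = {"text": "text", "desc": "contentDesc", "id": "resourceId"}


def normalize_strategy(name: str) -> str:
    """Map an alias to its canonical strategy and reject unknown names."""
    canonical = ALIASES.get(name, name)
    if canonical not in SUPPORTED:
        raise ValueError(f"unsupported selector strategy: {name!r}")
    return canonical
```

Rejecting unknown names locally keeps a typo from being misread as a selector miss.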

/click

Prefer /click over /tap when you have a plausible semantic target. Ghosthand can resolve wrapper targets, bounded selector fallbacks, and clickable ancestors, then expose how it actually landed on an actionable node.
Use /click first for:
For selector-based /click, Ghosthand treats clickable=true as the default unless you explicitly set clickable=false. That default is optimized for action, not inspection. Use /find or disable clickable resolution when you need to inspect the raw matched node.

/tap

Use /tap only when coordinates come from the current trusted surface. Do not guess coordinates. Coordinate fallback is justified only after semantic targeting has narrowed the uncertainty.

/input and /setText

Use /input for the focused editable field. Prefer it over /type when you need explicit text mutation or Enter dispatch semantics.
Use /type only for simpler focused text entry when the current focus is already correct.
Use /setText only when you have a trusted same-snapshot editable nodeId and need to target that exact node.
When entering text, do not assume the Enter key will successfully submit or confirm the input. If Enter does not work or the field remains uncommitted, use the on-screen IME confirmation action instead, typically the confirm button in the bottom-right corner of the keyboard.

/scroll and /swipe

Use /scroll when the goal is container movement or list advancement.
Use /swipe when the task is truly geometric.
Do not interpret performed=true as proof that content changed. Check returned change fields, then verify with /screen, /tree, or /wait.
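
A minimal sketch of that verification step, assuming the caller has already reduced the before and after observations to JSON-serializable data that holds only stable content (no timestamps or request IDs). The function names are hypothetical; `performed` is the only response field taken from this document.

```python
import hashlib
import json
from typing import Any, Dict


def fingerprint(observation: Any) -> str:
    """Hash a JSON-serializable observation in a key-order-independent way."""
    canonical = json.dumps(observation, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def content_moved(scroll_result: Dict[str, Any], before: Any, after: Any) -> bool:
    """True only if the scroll was performed and the observed content differs."""
    if not scroll_result.get("performed"):
        return False
    return fingerprint(before) != fingerprint(after)
```

A `False` result with `performed=true` often means the list was already at its end, which is different from a failed action.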

/wait

Use /wait after actions that may change UI state.
There are two different uses:
- GET /wait: wait for UI change and inspect final settled state
- POST /wait: wait for a selector condition

Do not confuse changed=false with action failure. It only means a transition was not observed during the wait window. Re-check the final surface before concluding the action failed.
For POST /wait, the supported strategies are bounded and query rules matter: focused takes no query, while text/content-description/resource-id waits require one.
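
The query rule can be checked before a request is sent. This is a small sketch with a hypothetical function name. It encodes only the rule stated above and does not validate which strategy names POST /wait accepts, since the full set is not listed in this document.

```python
from typing import Optional


def check_wait_condition(strategy: str, query: Optional[str]) -> Optional[str]:
    """Return an error message for an invalid strategy/query pair, else None."""
    if strategy == "focused":
        # A focused wait takes no query.
        return "focused takes no query" if query else None
    # Text, content-description, and resource-id waits need a query.
    return None if query else f"{strategy} requires a query"
```

For example, `check_wait_condition("text", None)` returns an error string, while `check_wait_condition("focused", None)` returns `None`.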

/clipboard, /notify, /screenshot

Use /clipboard as a transport primitive for long text or repeated entry.
Use /notify to read or post local notifications only when the task is explicitly notification-related.
Use /screenshot when visual truth is needed and structured UI output is insufficient, ambiguous, or suspected stale.
Important details:
- /screenshot supports GET and POST
- If /screen publishes previewPath, use that exact path before inventing a new screenshot size

Selectors are not interchangeable.

text

Use text when the visible label is likely the actual text field of the node.

desc

Use desc when the control is icon-like, accessibility-labeled, nav-like, or visibly sparse. Many controls that look label-based are actually better matched through content description.

id

Use id when a meaningful resourceId is present. This is often the strongest selector.
Do not over-read exact-match misses.
If the visible phrase may be part of a longer text block, retry with a contains-style strategy where the route supports it. A visible phrase on screen is not proof that exact text lookup should succeed.
/find supports explicit contains strategies. /click can use bounded contains fallback internally and tells you when it did so; do not mistake that for an exact selector hit.
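
The exact-then-contains retry can be sketched as a single bounded fallback. The `find` callable here is a hypothetical wrapper around /find that returns a match or `None`. The request and response shapes are not specified in this document, so the sketch stays independent of them.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Contains-style counterparts of the exact strategies named in this document.
CONTAINS_VARIANT = {"text": "textContains", "contentDesc": "contentDescContains"}

Finder = Callable[[str, str], Optional[Dict[str, Any]]]


def find_with_one_fallback(
    find: Finder, strategy: str, query: str
) -> Tuple[Optional[Dict[str, Any]], str]:
    """Try the exact strategy, then at most one contains-style retry.

    Returns (match, strategy_used) so the caller can tell an exact hit
    from a contains hit.
    """
    match = find(strategy, query)
    if match is not None:
        return match, strategy
    fallback = CONTAINS_VARIANT.get(strategy)
    if fallback is None:
        return None, strategy  # no contains variant, so do not retry
    return find(fallback, query), fallback
```

Returning the strategy that was actually used keeps a contains hit from being reported as an exact selector hit.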
When a Ghosthand action misses, do not branch into random retries. Make one bounded correction:
- […] /screen
- […] /screen?source=hybrid or /screen?source=ocr
- […] text to desc or id
- […] /click to /tap only after trustworthy coordinates exist
- […] /capabilities when the route exists but capability availability is ambiguous
- […] /wait to settle state before the next action

Repeated misses should be classified, not brute-forced.

1. /ping
2. /state
3. /capabilities
4. /commands
5. /screen?source=accessibility
6. […] text, desc, or id
7. /click
8. /wait or re-read /screen
9. /screen?source=hybrid or /screenshot
10. /tap if semantic action remains weak but coordinates are trusted

1. /input for the focused field, /type for simple focused typing, or /setText for a trusted same-snapshot editable nodeId
2. /wait or re-read /screen to confirm the post-input state

1. /state
2. /screen?source=accessibility
3. /screen?source=hybrid, /tree, or /screenshot if accessibility output is partial, unavailable, or misleading

When summarizing a Ghosthand run, report only:
Do not dump logs unless the task is explicitly diagnostic.
Detailed route notes are in resources/references/ghosthand-api-quick-reference.md.