Install
openclaw skills install claw-use-android
Control and interact with real Android phones via HTTP and CLI without ADB or root, supporting screen reading, taps, typing, apps, calls, and voice.
Give your AI agent eyes, hands, and a voice on a real Android phone.
claw-use-android is an Android app + CLI (cua) that exposes HTTP endpoints for full phone control. No ADB, no root, no PC.
# Install the APK on your Android phone, enable Accessibility Service
# Then register the device:
cua add redmi 192.168.0.105 <token>
cua ping
Three new endpoints consolidate the scattered older endpoints for AI agent workflows (the older endpoints remain available):
/screen returns elements with stable integer ref IDs, plus semantic zone and role annotations.
cua screen # full semantic UI tree (JSON)
cua screen -c # compact: only interactive/text elements
Response:
{
"package": "com.android.settings",
"elements": [
{"ref": 1, "text": "设置", "zone": "header"},
{"ref": 2, "text": "搜索", "zone": "header", "role": "button", "click": true},
{"ref": 3, "text": "WLAN", "zone": "content"}
]
}
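To make the response shape concrete, here is a minimal Python sketch that parses the example response above and picks out refs. The helper names `clickable_refs` and `find_ref` are hypothetical and not part of cua; the sketch assumes only the fields visible in the sample (`elements`, `ref`, `text`, `click`).

```python
import json
from typing import List, Optional

# Sample response copied from the example above.
SAMPLE = """{
  "package": "com.android.settings",
  "elements": [
    {"ref": 1, "text": "设置", "zone": "header"},
    {"ref": 2, "text": "搜索", "zone": "header", "role": "button", "click": true},
    {"ref": 3, "text": "WLAN", "zone": "content"}
  ]
}"""


def clickable_refs(screen: dict) -> List[int]:
    """Return the ref IDs of elements flagged with "click": true."""
    return [e["ref"] for e in screen.get("elements", []) if e.get("click")]


def find_ref(screen: dict, text: str) -> Optional[int]:
    """Return the ref of the first element whose text equals `text`, else None."""
    for e in screen.get("elements", []):
        if e.get("text") == text:
            return e["ref"]
    return None


screen = json.loads(SAMPLE)
print(clickable_refs(screen))    # refs that can be clicked
print(find_ref(screen, "WLAN"))  # ref for the WLAN row
```

A ref found this way can then be passed to an action such as `{"click": <ref>}`.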
The snapshot endpoint returns a base64-encoded JPEG screenshot.
cua snapshot # save screenshot, print path
cua snapshot 50 720 out.jpg # quality, maxWidth, output
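If the endpoint is called directly instead of through cua, the base64 payload has to be decoded before it can be saved. The sketch below shows one way to do that in Python. It deliberately takes the base64 string itself as input, because the name of the JSON field that carries it is not documented here; `decode_jpeg` is a hypothetical helper name.

```python
import base64


def decode_jpeg(b64_data: str) -> bytes:
    """Decode a base64 string and check that it starts like a JPEG file."""
    raw = base64.b64decode(b64_data, validate=True)
    # Every JPEG stream begins with the SOI marker 0xFF 0xD8.
    if raw[:2] != b"\xff\xd8":
        raise ValueError("decoded data does not start with a JPEG SOI marker")
    return raw


def save_jpeg(b64_data: str, path: str) -> int:
    """Write the decoded image to `path` and return the number of bytes written."""
    raw = decode_jpeg(b64_data)
    with open(path, "wb") as fh:
        return fh.write(raw)
```

Checking the first two bytes is only a cheap sanity test; it does not validate the rest of the image.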
/act runs all operations through a single entry point, using ref IDs from /screen.
cua act '{"click": 3}' # click ref 3
cua act '{"click": "OK"}' # click by text (fallback)
cua act '{"click": [1, 2, 3]}' # click refs in sequence
cua act '{"tap": {"x": 540, "y": 960}}'
cua act '{"type": "hello"}' # type into focused field
cua act '{"type": {"ref": 3, "text": "hello"}}' # focus ref then type
cua act '{"swipe": "up"}' # directional swipe
cua act '{"scroll": "down"}' # scroll nearest scrollable
cua act '{"back": true}'
cua act '{"home": true}'
cua act '{"recents": true}'
cua act '{"longpress": 3}' # long press ref
cua act '{"launch": "com.duolingo"}'
# Multiple actions in one request:
cua act '{"home": true, "back": true}'
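When an agent builds these payloads programmatically, quoting the JSON for the shell by hand is error-prone. The following Python sketch renders a `cua act` command line from a dict. `act_command` is a hypothetical helper, and the only cua behavior it relies on is the `cua act '<json>'` form shown above.

```python
import json
import shlex


def act_command(actions: dict) -> str:
    """Render a shell-safe `cua act` command for the given action dict."""
    # ensure_ascii=False keeps CJK text readable in the payload.
    payload = json.dumps(actions, ensure_ascii=False)
    return "cua act " + shlex.quote(payload)


print(act_command({"click": 3}))
print(act_command({"type": {"ref": 3, "text": "hello"}}))
```

Python dicts keep insertion order, so a multi-action payload such as `{"home": True, "back": True}` is serialized in the order it was written. Whether the device executes the keys in that order is not stated here and should be verified.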
# 1. Observe
cua screen -c # get refs
# 2. Act
cua act '{"click": 5}' # click ref 5
# 3. Observe again
cua screen -c # see result
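The observe, act, observe cycle above can be sketched as a small Python loop. This is an illustration under assumptions, not the project's implementation: `read_screen` and `act` are injected callables (hypothetical names) so the loop can be exercised without a device, and the screen dict is assumed to have the `elements` / `ref` / `text` shape shown earlier.

```python
from typing import Callable, Dict


def _refs_by_text(screen: dict) -> Dict[str, int]:
    """Map visible text to ref ID for one screen observation."""
    return {e["text"]: e["ref"] for e in screen.get("elements", []) if "text" in e}


def click_until_visible(
    read_screen: Callable[[], dict],
    act: Callable[[dict], None],
    click_text: str,
    goal_text: str,
    max_rounds: int = 5,
) -> bool:
    """Observe, act, and observe again until `goal_text` appears on screen."""
    for _ in range(max_rounds):
        refs = _refs_by_text(read_screen())       # 1. observe
        if goal_text in refs:
            return True
        if click_text in refs:
            act({"click": refs[click_text]})      # 2. act on a ref from this observation
    # 3. observe one last time after the final action
    return goal_text in _refs_by_text(read_screen())
```

In real use, `read_screen` could wrap `cua screen -c` and parse its JSON output, and `act` could wrap `cua act`. Bounding the loop with `max_rounds` keeps a stuck screen from running forever.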
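Before performing any phone operation, read flows.md first (it sits in the same directory as this file).
- Run via /flow or a batch script, skipping step-by-step reasoning.
- {"screen":true} breakpoint → at that step the agent reads the screen, decides, and then the flow continues.
Proactively record flows in flows.md (mandatory): after completing any multi-step operation, immediately review the sequence of steps you just ran. If you spot a reusable pattern (even one that covers only some of the steps), append it to flows.md right away. Do not wait for the user to remind you. Recording flows is the agent's responsibility, not the user's.
Benefits of this approach:
- /flow executes on the device with 100 ms polling and never goes through the LLM.
All legacy endpoints remain supported alongside the new unified API.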
cua add <name> <ip> <token> # register device with alias
cua devices # list all (with live status)
cua use <name> # switch default device
cua rm <name> # remove device
cua -d <name> <command> # target specific device
cua discover # scan LAN for devices (192.168.x.x:7333)
cua screen # full UI tree (JSON)
cua screen -c # compact: only interactive/text elements
cua screenshot # save screenshot, print path
cua screenshot 50 720 out.jpg # quality, maxWidth, output
cua notifications # list all notifications
cua status # health dashboard
cua info # device model, screen size, permissions
cua tap <x> <y> # tap coordinates
cua click <text> # tap element by visible text
cua longpress <x> <y> # long press
cua swipe up|down|left|right
cua scroll up|down|left|right
cua type "text" # type text (CJK supported)
cua back # system back
cua home # go home
cua launch <package> # launch app
cua launch # list all apps
cua open <url> # open URL
cua call <number> # phone call
cua intent '<json>' # fire Android Intent
cua tts "hello" # speak through phone speaker
cua say "你好" # alias
cua clipboard # read clipboard
cua clipboard "text" # write to clipboard
cua camera [front|back] [quality] [output.jpg] # take photo
cua volume # read all volumes
cua volume media 10 # set media volume
cua volume media up # adjust volume
cua battery # battery status
cua wifi # WiFi info
cua location # GPS/network location
cua vibrate [ms] # vibrate (default 200ms)
cua contacts [search] # list/search contacts
cua sms list [limit] # read SMS
cua sms send <number> <message> # send SMS
cua file list [path] # list directory
cua file read <path> # read file
cua file write <path> <content> # write file
cua file delete <path> # delete file
cua wake # wake screen
cua lock / cua unlock # lock/unlock (PIN required)
cua config pin 123456 # remember lock screen PIN for auto-unlock
cua config pattern 256398 # EXPERIMENTAL: pattern unlock (not yet verified)
cua flow '{
"steps": [
{"wait": "继续安装", "then": "tap", "timeout": 10000},
{"wait": "继续更新", "then": "tap", "timeout": 10000},
{"wait": "完成", "then": "tap", "timeout": 60000, "optional": true}
]
}'
Flow runs entirely on the phone with zero LLM calls. The device polls its accessibility tree at 100 ms intervals and acts as soon as the target element appears.
Step fields:
- wait: text to find (case-insensitive partial match)
- waitId: resource ID to find
- waitDesc: content description to find
- waitGone: wait for text to disappear
- then: action to perform (tap, click, longpress, back, home, none)
- timeout: per-step timeout in ms (default 10000)
- optional: if true, a timeout doesn't fail the flow
- pauseMs: pause after the action before the next step (default 500)
# Atomic find-and-tap: retries until element appears
curl -X POST http://<ip>:7333/click -d '{"text":"继续安装","retry":3,"retryMs":2000}'
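The step fields listed above lend themselves to a small builder that catches typos before a flow is sent to the device. The Python sketch below is one possible shape; `step` and `flow` are hypothetical helper names, the allowed `then` values and defaults are taken from the field list, and only the `wait` variant of the matchers is covered.

```python
import json
from typing import Optional

# Actions listed for the `then` field.
ALLOWED_THEN = {"tap", "click", "longpress", "back", "home", "none"}


def step(wait: str, then: str = "tap", timeout: int = 10000,
         optional: bool = False, pause_ms: Optional[int] = None) -> dict:
    """Build one flow step, rejecting unknown actions early."""
    if then not in ALLOWED_THEN:
        raise ValueError(f"unsupported action: {then!r}")
    result = {"wait": wait, "then": then, "timeout": timeout}
    if optional:
        result["optional"] = True
    if pause_ms is not None:
        result["pauseMs"] = pause_ms
    return result


def flow(*steps: dict) -> str:
    """Serialize steps into the JSON document that `cua flow` takes."""
    return json.dumps({"steps": list(steps)}, ensure_ascii=False)


print(flow(
    step("继续安装"),
    step("完成", timeout=60000, optional=True),
))
```

Keeping the button labels in the device's UI language matters here, since `wait` matches against the text actually shown on screen.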
Complete recipe for adding a new Android device from zero to fully operational.
# Scan LAN for devices
cua discover
# Register with a friendly name
cua add <name> <ip> <token>
# Verify connectivity
cua -d <name> ping
cua -d <name> info
# PIN unlock (recommended — proven reliable via a11y button tapping)
cua -d <name> config pin <PIN>
# Verify: lock then unlock
cua -d <name> lock
sleep 3
cua -d <name> unlock
# Should show {"unlocked":true}
Important: Only PIN unlock is verified to work. Pattern unlock is experimental and unreliable — the accessibility gesture dispatch doesn't consistently hit the correct grid coordinates across different devices and screen sizes. If the device uses pattern lock, change it to PIN.
cua -d <name> setup-perms
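This automates granting all 9 app permissions on MIUI devices: Location (位置), Camera (相机), Microphone (麦克风), Photos and videos (照片和视频), Music and audio (音乐和音频), SMS (短信), Phone (电话), Contacts (联系人), Calendar (日历).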
The command navigates through Settings → Apps → Claw Use → Permissions and clicks through each permission grant dialog.
If setup-perms fails (common on tablets with dual-pane layout), grant manually:
These settings prevent MIUI from killing the service:
# Navigate to app settings
cua -d <name> intent '{"action":"android.settings.APPLICATION_DETAILS_SETTINGS","uri":"package:com.clawuse.android"}'
Then via a11y or manually ensure:
cua -d <name> status # check a11y health, uptime, request count
cua -d <name> screen -c # verify a11y tree works
cua -d <name> screenshot 50 720 /tmp/verify.jpg # verify screenshot
# Test auto-unlock end-to-end
cua -d <name> lock
sleep 3
cua -d <name> screen -c # should auto-unlock then return tree
MIUI Tablets (Xiaomi Pad 5, etc.):
- APPLICATION_DETAILS_SETTINGS intent opens the app list, not the specific app
- setup-perms may need manual fallback for the tablet layout
MIUI Phones (Redmi K60 Ultra, etc.):
General Android:
- takeScreenshot() returns a black image on the lock screen (Android security)
- flagRetrieveInteractiveWindows (added in v1.6.2)
Update a device to a new APK version without ADB:
# Serve APK on LAN (from the machine with the APK)
cd /path/to/apk && python3 -m http.server 9090 &
# On the device, open browser to download
cua -d <name> intent '{"action":"android.intent.action.VIEW","uri":"http://<lan-ip>:9090/app.apk"}'
# Or via browser navigation for MIUI browser:
cua -d <name> click "浏览器"
cua -d <name> click "搜索或输入网址"
cua -d <name> type "http://<lan-ip>:9090/app.apk"
# ... then handle download + install prompts
# MIUI install flow (after APK opens in installer)
cua -d <name> flow '{
"steps": [
{"wait": "继续安装", "then": "tap", "timeout": 15000},
{"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
{"wait": "继续更新", "then": "tap", "timeout": 15000}
]
}'
# Verify new version after service restart (~30s)
sleep 30
cua -d <name> ping
UpdateReceiver: The app listens for MY_PACKAGE_REPLACED broadcast and auto-restarts the service after update. No manual intervention needed after install completes.
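Since the service restarts on its own after the update, a fixed `sleep 30` can be replaced by polling until the device answers again. The Python sketch below shows the idea with an injected `ping` callable. `wait_until_up` and its parameters are hypothetical names, and how `ping` talks to the device (for example by running `cua -d <name> ping` and checking the exit status) is left to the caller.

```python
import time
from typing import Callable


def wait_until_up(
    ping: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `ping` until it succeeds or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if ping():
                return True
        except OSError:
            # Connection refused or unreachable while the service restarts.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting; production callers can leave them at their defaults.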
cua act '{"launch": "org.telegram.messenger"}'
cua screen -c
cua act '{"click": "Search Chats"}'
cua act '{"type": "John"}'
cua act '{"click": "John"}'
cua launch org.telegram.messenger
cua screen -c
cua click "Search Chats"
cua type "John"
cua click "John"
cua screen -c # what elements exist (structured, with refs)
cua snapshot 50 720 /tmp/look.jpg # what it looks like (visual)
Prefer screen -c over snapshot for decision-making. Structured a11y data is faster to process, has exact coordinates, and provides ref IDs for /act. Use snapshot only when visual context matters (images, colors, layout).
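When a snapshot is used for a visual decision, a point chosen in the image has to be converted back to device pixels before tapping, because a snapshot limited by maxWidth is scaled while screen bounds are in actual pixels. The Python sketch below assumes the image is scaled uniformly (same factor on both axes); `to_device_pixels` is a hypothetical helper, and the 1080-pixel device width in the example is made up for illustration.

```python
from typing import Tuple


def to_device_pixels(x_img: float, y_img: float,
                     image_width: int, device_width: int) -> Tuple[int, int]:
    """Map a point in a scaled screenshot to device pixel coordinates."""
    if image_width <= 0 or device_width <= 0:
        raise ValueError("widths must be positive")
    scale = device_width / image_width  # assumes uniform scaling
    return round(x_img * scale), round(y_img * scale)


# A point at (360, 640) in a 720 px wide snapshot of a 1080 px wide screen.
x, y = to_device_pixels(360, 640, image_width=720, device_width=1080)
print({"tap": {"x": x, "y": y}})
```

The real device width can be read from `cua info`, which reports the screen size.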
Automatic — any command auto-unlocks if PIN is configured. No special handling needed.
cua flow '{
"steps": [
{"wait": "继续安装", "then": "tap", "timeout": 15000},
{"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
{"wait": "继续更新", "then": "tap", "timeout": 10000}
]
}'
cua add phone1 192.168.0.101 <token>
cua add tablet 192.168.0.102 <token>
cua -d phone1 say "hello from phone 1"
cua -d tablet screenshot
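For running the same command across several registered devices, a script can generate one `cua -d <name> ...` invocation per device. The Python sketch below only builds the argument lists; `per_device` is a hypothetical helper, and it relies on nothing beyond the `-d <name>` flag shown above.

```python
from typing import Iterable, List


def per_device(devices: Iterable[str], *args: str) -> List[List[str]]:
    """Build one `cua -d <name> <args...>` argument list per device."""
    return [["cua", "-d", name, *args] for name in devices]


for argv in per_device(["phone1", "tablet"], "ping"):
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run(argv)` as-is, which avoids shell quoting issues for arguments such as JSON payloads.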
- Prefer click by text over tap by coordinates whenever text is visible
- Use screen -c as the primary perception tool: compact mode filters noise
- Use /flow for multi-step mechanical sequences: it saves tokens and is 100x faster than LLM-per-step
- Use intent deep links for app navigation (e.g., https://t.me/c/{id}/{topic}/{msg})
- screenshot?maxWidth=720 is scaled, while screen bounds are actual pixels
- Avoid tap when click can work: text-based targeting is resolution-independent
┌───────────────────────────────────────────────────┐
│  Android Device                                   │
│                                                   │
│  :http process           main process             │
│  ┌───────────────┐       ┌─────────────────────┐  │
│  │ BridgeService │ HTTP  │ AccessibilityBridge │  │
│  │ NanoHTTPD     │──────→│ A11yInternalServer  │  │
│  │ 0.0.0.0:7333  │ proxy │ 127.0.0.1:7334      │  │
│  └───────────────┘       └─────────────────────┘  │
│    ↑ auth+CORS             ↑ a11y service         │
│    ↑ auto-unlock           ↑ gesture dispatch     │
│    ↑ config/status         ↑ tree traversal       │
└───────────────────────────────────────────────────┘
          ↑ HTTP
    ┌────────────┐
    │ Agent/CLI  │  cua commands / curl
    └────────────┘
| Platform | Package | CLI | Status |
|---|---|---|---|
| Android | claw-use-android | cua | ✅ Available |
| iOS | claw-use-ios | cui | 🔮 Planned |
| Windows | claw-use-windows | cuw | 🔮 Planned |
| Linux | claw-use-linux | cul | 🔮 Planned |
| macOS | claw-use-mac | cum | 🔮 Planned |