{"skill":{"slug":"claw-use-android","displayName":"Claw Use Android","summary":"Control and interact with real Android phones via HTTP and CLI without ADB or root, supporting screen reading, taps, typing, apps, calls, and voice.","description":"# Claw Use Android — Phone Control for AI Agents\n\nGive your AI agent eyes, hands, and a voice on a real Android phone.\n\n`claw-use-android` is an Android app + CLI (`cua`) that exposes HTTP endpoints for full phone control. No ADB, no root, no PC.\n\n## Setup\n\n```bash\n# Install the APK on your Android phone, enable Accessibility Service\n# Then register the device:\ncua add redmi 192.168.0.105 <token>\ncua ping\n```\n\n## New in v2.0.0: Unified API\n\nThree new endpoints replace the scattered old endpoints for AI agent workflows:\n\n### GET /screen — Semantic UI Tree\nReturns elements with stable integer `ref` IDs, semantic `zone`, and `role` annotations.\n\n```bash\ncua screen              # full semantic UI tree (JSON)\ncua screen -c           # compact: only interactive/text elements\n```\n\nResponse:\n```json\n{\n  \"package\": \"com.android.settings\",\n  \"elements\": [\n    {\"ref\": 1, \"text\": \"设置\", \"zone\": \"header\"},\n    {\"ref\": 2, \"text\": \"搜索\", \"zone\": \"header\", \"role\": \"button\", \"click\": true},\n    {\"ref\": 3, \"text\": \"WLAN\", \"zone\": \"content\"}\n  ]\n}\n```\n\n### GET /snapshot — JPEG Screenshot\nReturns a base64-encoded JPEG screenshot.\n\n```bash\ncua snapshot              # save screenshot, print path\ncua snapshot 50 720 out.jpg  # quality, maxWidth, output\n```\n\n### POST /act — Unified Action Endpoint\nAll operations through a single entry point, using `ref` IDs from `/screen`.\n\n```bash\ncua act '{\"click\": 3}'              # click ref 3\ncua act '{\"click\": \"OK\"}'           # click by text (fallback)\ncua act '{\"click\": [1, 2, 3]}'      # click refs in sequence\ncua act '{\"tap\": {\"x\": 540, \"y\": 960}}'\ncua act '{\"type\": \"hello\"}'          # type 
into focused field\ncua act '{\"type\": {\"ref\": 3, \"text\": \"hello\"}}'  # focus ref then type\ncua act '{\"swipe\": \"up\"}'            # directional swipe\ncua act '{\"scroll\": \"down\"}'         # scroll nearest scrollable\ncua act '{\"back\": true}'\ncua act '{\"home\": true}'\ncua act '{\"recents\": true}'\ncua act '{\"longpress\": 3}'           # long press ref\ncua act '{\"launch\": \"com.duolingo\"}'\n\n# Multiple actions in one request:\ncua act '{\"home\": true, \"back\": true}'\n```\n\n### Agent Workflow Pattern (screen → act loop)\n```bash\n# 1. Observe\ncua screen -c          # get refs\n# 2. Act\ncua act '{\"click\": 5}' # click ref 5\n# 3. Observe again\ncua screen -c          # see result\n```\n\n### Flow-First Principle\n\n**Before performing any phone operation, read `flows.md` (in the same directory as this file).**\n\n- If a matching flow exists → run it directly with `/flow` or a batch script and skip step-by-step reasoning\n- If the flow contains a `{\"screen\":true}` breakpoint → read the screen at that step, let the agent decide, then continue\n- If no flow matches → use the screen→act loop, then **record the new flow in `flows.md`**\n- If a flow fails (timeout, element not found, etc.) → **fall back to the screen→act loop** to finish the task, then fix `flows.md` afterwards\n\n**Record flows proactively (mandatory):** After completing any multi-step operation, immediately review the sequence of steps you just ran. If you find a reusable pattern (even one that covers only some of the steps), append it to `flows.md` right away. Do not wait for the user to remind you. Recording flows is the agent's responsibility, not the user's.\n\nWhy this helps:\n1. **Fast**: `/flow` runs on the device with 100ms polling and never goes through the LLM\n2. **Saves tokens**: one flow replaces 5-10 rounds of agent reasoning\n3. 
**Cumulative**: every new scenario gets recorded, so the agent gets faster the more it is used\n\n## Legacy CLI Reference (`cua`)\n\nAll legacy endpoints remain supported alongside the new unified API.\n\n### Device Management\n```bash\ncua add <name> <ip> <token>    # register device with alias\ncua devices                     # list all (with live status)\ncua use <name>                  # switch default device\ncua rm <name>                   # remove device\ncua -d <name> <command>         # target specific device\ncua discover                    # scan LAN for devices (192.168.x.x:7333)\n```\n\n### Perception — read the phone\n```bash\ncua screen              # full UI tree (JSON)\ncua screen -c           # compact: only interactive/text elements\ncua screenshot          # save screenshot, print path\ncua screenshot 50 720 out.jpg  # quality, maxWidth, output\ncua notifications       # list all notifications\ncua status              # health dashboard\ncua info                # device model, screen size, permissions\n```\n\n### Action — control the phone\n```bash\ncua tap <x> <y>         # tap coordinates\ncua click <text>        # tap element by visible text\ncua longpress <x> <y>   # long press\ncua swipe up|down|left|right\ncua scroll up|down|left|right\ncua type \"text\"         # type text (CJK supported)\ncua back                # system back\ncua home                # go home\ncua launch <package>    # launch app\ncua launch              # list all apps\ncua open <url>          # open URL\ncua call <number>       # phone call\ncua intent '<json>'     # fire Android Intent\n```\n\n### Audio\n```bash\ncua tts \"hello\"         # speak through phone speaker\ncua say \"你好\"          # alias\n```\n\n### Device I/O (v1.7.0+)\n```bash\ncua clipboard           # read clipboard\ncua clipboard \"text\"    # write to clipboard\ncua camera [front|back] [quality] [output.jpg]  # take photo\ncua volume              # read all volumes\ncua volume media 10     # set media volume\ncua volume media up     # adjust volume\ncua battery    
         # battery status\ncua wifi                # WiFi info\ncua location            # GPS/network location\ncua vibrate [ms]        # vibrate (default 200ms)\ncua contacts [search]   # list/search contacts\ncua sms list [limit]    # read SMS\ncua sms send <number> <message>  # send SMS\ncua file list [path]    # list directory\ncua file read <path>    # read file\ncua file write <path> <content>  # write file\ncua file delete <path>  # delete file\n```\n\n### Device State\n```bash\ncua wake                # wake screen\ncua lock / cua unlock   # lock/unlock (PIN required)\ncua config pin 123456   # remember lock screen PIN for auto-unlock\ncua config pattern 256398  # EXPERIMENTAL: pattern unlock (not yet verified)\n```\n\n### Flow Engine — phone-side scripted automation\n```bash\ncua flow '{\n  \"steps\": [\n    {\"wait\": \"继续安装\", \"then\": \"tap\", \"timeout\": 10000},\n    {\"wait\": \"继续更新\", \"then\": \"tap\", \"timeout\": 10000},\n    {\"wait\": \"完成\",     \"then\": \"tap\", \"timeout\": 60000, \"optional\": true}\n  ]\n}'\n```\n\nFlow runs entirely on the phone with zero LLM calls. 
The device polls its accessibility tree at 100ms intervals and reacts instantly when the target element appears.\n\n**Step fields:**\n- `wait` — text to find (case-insensitive partial match)\n- `waitId` — resource ID to find\n- `waitDesc` — content description to find  \n- `waitGone` — wait for text to DISAPPEAR\n- `then` — action: `tap`, `click`, `longpress`, `back`, `home`, `none`\n- `timeout` — per-step timeout in ms (default 10000)\n- `optional` — if true, timeout doesn't fail the flow\n- `pauseMs` — pause after action before next step (default 500)\n\n### Click with Retry\n```bash\n# Atomic find-and-tap: retries until the element appears\n# <ip> is the device address; 7333 is the public port shown under Architecture. Auth is omitted here.\ncurl -X POST http://<ip>:7333/click -d '{\"text\":\"继续安装\",\"retry\":3,\"retryMs\":2000}'\n```\n\n---\n\n## Device Onboarding (New Device Setup)\n\nComplete recipe for adding a new Android device from zero to fully operational.\n\n### Prerequisites (human must do once)\n1. Install APK on the device (download from GitHub Releases or LAN HTTP)\n2. Enable Accessibility Service: Settings → Accessibility → Claw Use → ON\n3. Note the auth token from the app notification or main screen\n\n### Step 1: Discover & Register\n```bash\n# Scan LAN for devices\ncua discover\n\n# Register with a friendly name\ncua add <name> <ip> <token>\n\n# Verify connectivity\ncua -d <name> ping\ncua -d <name> info\n```\n\n### Step 2: Configure Auto-Unlock\n```bash\n# PIN unlock (recommended — proven reliable via a11y button tapping)\ncua -d <name> config pin <PIN>\n\n# Verify: lock then unlock\ncua -d <name> lock\nsleep 3\ncua -d <name> unlock\n# Should show {\"unlocked\":true}\n```\n\n**Important**: Only PIN unlock is verified to work. Pattern unlock is experimental and unreliable — the accessibility gesture dispatch doesn't consistently hit the correct grid coordinates across different devices and screen sizes. 
If the device uses pattern lock, change it to PIN.\n\n### Step 3: MIUI/HyperOS Permissions (automated)\n```bash\ncua -d <name> setup-perms\n```\n\nThis automates granting all 9 app permissions on MIUI devices:\n位置 (Location), 相机 (Camera), 麦克风 (Microphone), 照片和视频 (Photos and videos), 音乐和音频 (Music and audio), 短信 (SMS), 电话 (Phone), 联系人 (Contacts), 日历 (Calendar)\n\nThe command navigates through Settings → Apps → Claw Use → Permissions and clicks through each permission grant dialog.\n\n**If `setup-perms` fails** (common on tablets with dual-pane layout), grant manually:\n1. Open Settings → Apps → Manage Apps → search \"Claw Use\"\n2. Tap \"App permissions\" (应用权限)\n3. Enable each permission: prefer \"始终允许\" (Always allow) > \"仅在使用中允许\" (Allow only while in use) > \"允许\" (Allow)\n\n### Step 4: Background Survival (MIUI)\nThese settings prevent MIUI from killing the service:\n\n```bash\n# Navigate to app settings\ncua -d <name> intent '{\"action\":\"android.settings.APPLICATION_DETAILS_SETTINGS\",\"uri\":\"package:com.clawuse.android\"}'\n```\n\nThen via a11y or manually ensure:\n- **自启动 (Autostart)**: ON\n- **省电策略 (Battery saver)**: 无限制 (No restrictions)\n- **通知 (Notifications)**: 允许 (Allow)\n- **WLAN联网 (WiFi access)**: ON (if available)\n\n### Step 5: Verify Everything\n```bash\ncua -d <name> status    # check a11y health, uptime, request count\ncua -d <name> screen -c # verify a11y tree works\ncua -d <name> screenshot 50 720 /tmp/verify.jpg  # verify screenshot\n\n# Test auto-unlock end-to-end\ncua -d <name> lock\nsleep 3\ncua -d <name> screen -c  # should auto-unlock then return tree\n```\n\n### Known Device-Specific Issues\n\n**MIUI Tablets (Xiaomi Pad 5, etc.)**:\n- Settings uses dual-pane layout — left panel items NOT visible in a11y tree\n- Must navigate through full Settings → Apps path instead of direct Intent\n- `APPLICATION_DETAILS_SETTINGS` intent opens app LIST, not specific app\n- `setup-perms` may need manual fallback for tablet layout\n\n**MIUI Phones (Redmi K60 Ultra, etc.)**:\n- ICP 备案 (ICP filing) dialog may appear during APK install — click \"继续安装\" (Continue installing)\n- \"仍然下载\" (Download anyway) confirmation in Chrome for HTTP downloads\n- Chrome 
downloads don't auto-open APK — go to Downloads → tap the file icon (left side)\n\n**General Android**:\n- Notification Listener requires manual enable: Settings → 通知 → 设备和应用通知 → Claw Use\n- `takeScreenshot()` returns black image on lock screen (Android security)\n- Lock screen a11y tree requires `flagRetrieveInteractiveWindows` (added in v1.6.2)\n\n---\n\n## Self-Update (OTA via LAN)\n\nUpdate a device to a new APK version without ADB:\n\n```bash\n# Serve APK on LAN (from the machine with the APK)\ncd /path/to/apk && python3 -m http.server 9090 &\n\n# On the device, open browser to download\ncua -d <name> intent '{\"action\":\"android.intent.action.VIEW\",\"uri\":\"http://<lan-ip>:9090/app.apk\"}'\n\n# Or via browser navigation for MIUI browser:\ncua -d <name> click \"浏览器\"\ncua -d <name> click \"搜索或输入网址\"\ncua -d <name> type \"http://<lan-ip>:9090/app.apk\"\n# ... then handle download + install prompts\n\n# MIUI install flow (after APK opens in installer)\ncua -d <name> flow '{\n  \"steps\": [\n    {\"wait\": \"继续安装\", \"then\": \"tap\", \"timeout\": 15000},\n    {\"wait\": \"已了解此应用未经安全检测\", \"then\": \"tap\", \"timeout\": 10000, \"optional\": true},\n    {\"wait\": \"继续更新\", \"then\": \"tap\", \"timeout\": 15000}\n  ]\n}'\n\n# Verify new version after service restart (~30s)\nsleep 30\ncua -d <name> ping\n```\n\n**UpdateReceiver**: The app listens for `MY_PACKAGE_REPLACED` broadcast and auto-restarts the service after update. 
No manual intervention needed after install completes.\n\n---\n\n## Workflow Patterns\n\n### Navigate and interact (v2.0+ recommended)\n```bash\ncua act '{\"launch\": \"org.telegram.messenger\"}'\ncua screen -c\ncua act '{\"click\": \"Search Chats\"}'\ncua act '{\"type\": \"John\"}'\ncua act '{\"click\": \"John\"}'\n```\n\n### Navigate and interact (legacy)\n```bash\ncua launch org.telegram.messenger\ncua screen -c\ncua click \"Search Chats\"\ncua type \"John\"\ncua click \"John\"\n```\n\n### Visual + semantic perception\n```bash\ncua screen -c                          # what elements exist (structured, with refs)\ncua snapshot 50 720 /tmp/look.jpg      # what it looks like (visual)\n```\n\n**Prefer `screen -c` over `snapshot`** for decision-making. Structured a11y data is faster to process, has exact coordinates, and provides ref IDs for `/act`. Use snapshot only when visual context matters (images, colors, layout).\n\n### Handle locked device\nAutomatic — any command auto-unlocks if PIN is configured. 
No special handling needed.\n\n### MIUI APK Install (via /flow)\n```bash\ncua flow '{\n  \"steps\": [\n    {\"wait\": \"继续安装\", \"then\": \"tap\", \"timeout\": 15000},\n    {\"wait\": \"已了解此应用未经安全检测\", \"then\": \"tap\", \"timeout\": 10000, \"optional\": true},\n    {\"wait\": \"继续更新\", \"then\": \"tap\", \"timeout\": 10000}\n  ]\n}'\n```\n\n### Multi-device\n```bash\ncua add phone1 192.168.0.101 <token>\ncua add tablet 192.168.0.102 <token>\ncua -d phone1 say \"hello from phone 1\"\ncua -d tablet screenshot\n```\n\n## Operational Lessons\n\n### DO\n- **Use `click` by text** instead of `tap` by coordinates whenever text is visible\n- **Use `screen -c`** as the primary perception tool — compact filters noise\n- **Use `/flow`** for multi-step mechanical sequences — saves tokens, 100x faster than LLM-per-step\n- **Use `intent` deep links** for app navigation (e.g., `https://t.me/c/{id}/{topic}/{msg}`)\n- **Use PIN unlock** — proven 100% reliable via a11y button tapping\n\n### DON'T\n- **Don't use screenshot coordinates for tapping** — `screenshot?maxWidth=720` is scaled, `screen` bounds are actual pixels\n- **Don't try pattern unlock** — coordinates vary by device/OS, no reliable way to locate the grid\n- **Don't rely on `tap` when `click` can work** — text-based is resolution-independent\n- **Don't manually navigate app UIs when deep links exist** — error-prone and slow\n- **Don't rapid-fire requests** — allow 0.5-1s between actions for UI to settle\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────┐\n│              Android Device                  │\n│                                              │\n│  :http process          main process         │\n│  ┌──────────────┐      ┌──────────────────┐ │\n│  │ BridgeService│ HTTP │ AccessibilityBridge│ │\n│  │ NanoHTTPD    │─────→│ A11yInternalServer│ │\n│  │ 0.0.0.0:7333│proxy │ 127.0.0.1:7334   │ │\n│  └──────────────┘      └──────────────────┘ │\n│    ↑ auth+CORS           ↑ a11y service      │\n│    
↑ auto-unlock         ↑ gesture dispatch  │\n│    ↑ config/status       ↑ tree traversal    │\n└────────────────────────────────────────────── ┘\n         ↑ HTTP\n    ┌────────────┐\n    │  Agent/CLI │  cua commands / curl\n    └────────────┘\n```\n\n## Family\n\n| Platform | Package | CLI | Status |\n|----------|---------|-----|--------|\n| Android | claw-use-android | `cua` | ✅ Available |\n| iOS | claw-use-ios | `cui` | 🔮 Planned |\n| Windows | claw-use-windows | `cuw` | 🔮 Planned |\n| Linux | claw-use-linux | `cul` | 🔮 Planned |\n| macOS | claw-use-mac | `cum` | 🔮 Planned |\n","tags":{"latest":"2.0.0"},"stats":{"comments":0,"downloads":924,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":2},"createdAt":1773642931993,"updatedAt":1779078304058},"latestVersion":{"version":"2.0.0","createdAt":1773842434127,"changelog":"v2.0.0: Unified /screen + /snapshot + /act API, flow-first agent pattern, flows.md knowledge base, device I/O (camera/clipboard/SMS/contacts/location), OTA self-update, multi-device support.","license":"MIT-0"},"metadata":null,"owner":{"handle":"4ier","userId":"s1764h4cgqpc90zs9r68tavz35842d1f","displayName":"傅洋","image":"https://avatars.githubusercontent.com/u/5648066?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1780089961130}}