Install
openclaw skills install flexarm-robot-arm
Give AI agents physical eyes and a hand to operate any smartphone. The skill has 50+ built-in atomic actions (click, swipe, OCR, screenshot) that AI can call directly, and it also supports YAML scripts. Script Store: once one AI learns an app, every AI agent can reuse that knowledge immediately. Strong-reasoning AI models can explore new apps on their own through the camera and robot arm: see, reason, act, verify. This closed loop lets AI use almost any phone function.
An AI Agent MUST use this skill when the user wants to physically interact with a real smartphone screen using a robot arm. This includes:
robot arm, FlexArm, calibration, camera view, physical click, or real phone.
Example user utterances that should trigger this skill:
Do NOT use this skill if:
Phone screen automation via robot arm + camera vision. Uses a camera to detect the phone screen area, maps pixel coordinates to physical arm coordinates, and performs precise clicks and swipes.
All API calls in this skill depend on the RobotArmServer.exe service.
Before using this skill, the following conditions must be met:
Calibration Tool: RobotArmCalibration.exe
Download: Official Release Page
Latest version: v2.0.0
Size: ~160 MB (compressed)
Server Program: RobotArmServer.exe (included in RobotArmServer.zip)
Download: FlexArm v2.0.1 Release
Latest version: v2.0.0
Size: ~231 MB (compressed)
Installation:
1. Download RobotArmServer.zip from the link above.
2. Extract it to a fixed directory (e.g., D:\FlexArm). This directory becomes the project root.
3. Run RobotArmServer.exe as Administrator. The first run needs admin privileges to install the driver.
4. Verify by opening http://127.0.0.1:7826/api/health, which should return {"ok":true}.

Directory Convention:
All relative paths (e.g., scripts/, icons/) are relative to the project root above.
Do not modify files inside the _internal/ directory.
⚠️ If the service is not running, this skill cannot perform any operations. Before starting a task, always check the /api/health status.
All API requests must use port 7826, not 5000.
http://127.0.0.1:7826/api/*
The port is fixed at 7826 and cannot be changed. Flask's default port 5000 does not apply.
Do NOT use curl to send Chinese characters in JSON. On the Windows command line, curl can corrupt the UTF-8 encoding of the request body. The server then fails to recognize the Chinese keywords and the lookup fails.
# ❌ Wrong: curl corrupts Chinese characters in JSON
curl -X POST http://127.0.0.1:7826/api/find_text -d '{"text_keyword":"领取"}'
# ✅ Correct: use Python requests for Chinese parameters
python -c "import requests; r = requests.post('http://127.0.0.1:7826/api/find_text', json={'text_keyword': '领取'}); print(r.text)"
APIs with English-only parameters (e.g., detect_desktop, click_icon, run_script, click_at) may use curl. APIs involving Chinese keywords (find_text, click_text, detect_page page names) must use Python.
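If you need this in a reusable form rather than a `python -c` one-liner, the sketch below builds the request with only the standard library. It serializes the payload with `json.dumps` and encodes the body explicitly as UTF-8, so Chinese keywords reach the server intact. `build_post` and `post_json` are hypothetical helper names, not part of FlexArm. The long default timeout is also an assumption, chosen because arm actions block until the arm has finished moving.

```python
import json
import urllib.request

# Base URL and fixed port as documented above.
BASE_URL = "http://127.0.0.1:7826"


def build_post(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body is explicitly UTF-8 encoded."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply (requires RobotArmServer.exe running)."""
    with urllib.request.urlopen(build_post(endpoint, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `post_json("/api/find_text", {"text_keyword": "领取"})`. Only `post_json` touches the network, so `build_post` can be checked offline.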
All endpoints below are accessible via HTTP POST/GET at http://127.0.0.1:7826.
| # | Method | Endpoint | Description |
|---|---|---|---|
| 1 | GET | / | Service root |
| 2 | GET | /api/health | Health check (service, arm, camera status) |
| 3 | GET | /api/arm_status | Arm status (COM port, service, calibration) |
| 4 | GET | /api/get_frame_info | Get frame dimensions |
| 5 | GET | /api/get_overlay | Get current overlay (vision match result) |
| 6 | GET | /api/get_phone_corners | Get phone screen 4-corner coordinates |
| 7 | GET | /api/is_phone_present | Check if phone is in frame |
| 8 | GET | /api/is_screen_on | Check if screen is on |
| # | Method | Endpoint | Description |
|---|---|---|---|
| 9 | GET/POST | /api/show_window | Open camera display window |
| 10 | GET/POST | /api/hide_window | Close camera display window |
| 11 | POST | /api/toggle_phone_corners | Toggle phone screen border overlay |
| 12 | POST | /api/change_focus | Adjust camera focus (delta value) |
| 13 | GET | /api/screenshots | List historical screenshot files |
| # | Method | Endpoint | Description |
|---|---|---|---|
| 14 | POST | /api/go_home | Home — return to desktop |
| 15 | POST | /api/go_back | Back navigation |
| 16 | POST | /api/go_forward | Forward navigation |
| 17 | POST | /api/reset | Reset robot arm to origin |
| 18 | POST | /api/clear_overlay | Clear vision match overlay boxes |
| 19 | GET/POST | /api/run_app | Launch a specified app |
| 20 | POST | /api/swipe_up | Swipe up (large/small) |
| 21 | POST | /api/swipe_down | Swipe down (large/small) |
| 22 | POST | /api/swipe_up_normal | Standard swipe up (~80% success) |
| 23 | POST | /api/swipe_down_normal | Standard swipe down |
| 24 | POST | /api/swipe | Custom swipe (start/end percentages) |
| 25 | POST | /api/close_all_apps | Close all background apps |
| 26 | POST | /api/click_icon | Template-matching icon click |
| 27 | POST | /api/click_icons | Click multiple icons sequentially |
| 28 | POST | /api/click_icon_many_times | Click same icon multiple times |
| 29 | POST | /api/click_text | OCR text search and click |
| 30 | POST | /api/click_at | Click at frame pixel coordinates |
| 31 | POST | /api/click | Click at phone percentage coordinates |
| 32 | POST | /api/click_roi | Click center of an ROI area |
| 33 | POST | /api/screenshot | Screenshot (save file / return base64) |
| 34 | POST | /api/reload_gestures | Reload gesture config (hot-reload) |
| # | Method | Endpoint | Description |
|---|---|---|---|
| 35 | POST | /api/find_template | Full-screen template matching |
| 36 | POST | /api/find_template_roi | ROI-based template matching |
| 37 | POST | /api/find_text | OCR text search |
| 38 | POST | /api/find_text_roi | ROI-based OCR text search |
| 39 | POST | /api/find_all_text | Recognize all text on screen |
| 40 | POST | /api/find_all_templates | All templates must match |
| 41 | POST | /api/find_any_template | Any template match is sufficient |
| 42 | POST | /api/count_template | Count template occurrences |
| 43 | POST | /api/detect_desktop | Detect current desktop page |
| 44 | POST | /api/detect_page | Detect current app page |
| 45 | POST | /api/wait_for_template | Poll until template appears |
| 46 | POST | /api/wait_for_page | Poll until target page appears |
| # | Method | Endpoint | Description |
|---|---|---|---|
| 47 | POST | /api/run_script | Execute YAML script (async) |
| 48 | GET | /api/script_status | Check if a script is running |
| 49 | GET | /api/script_progress | Get script execution progress |
| 50 | POST | /api/stop_script | Force-stop a running script |
| # | Method | Endpoint | Description |
|---|---|---|---|
| 51 | GET/PUT | /api/config/daily | Read/update daily automation config |
| 52 | GET/PUT | /api/config/app/<name> | Read/update app page config |
| 53 | GET/PUT | /api/config/gesture | Read/update gesture config |
| # | Method | Endpoint | Description |
|---|---|---|---|
| 54 | POST | /api/shutdown | Graceful service shutdown |
User / AI Agent
│
├── HTTP API (POST/GET http://127.0.0.1:7826/api/*)
│ Controls robot arm, camera, script execution
│
└── YAML Scripts (scripts/*.yaml)
Define automation workflows (click icons, find text, loops, conditions)
For AI Agents: You cannot see the camera feed directly. Use these APIs to understand the phone screen state:
- GET /api/get_frame_info: frame metadata
- GET /api/is_phone_present: detect if phone is in frame
- GET /api/is_screen_on: detect if screen is lit
- POST /api/screenshot {"return_base64": true}: get base64 image data
- POST /api/detect_page: detect current page name

Core Principles:
Except for /api/run_script, all API commands are synchronous and blocking. Wait for the HTTP response ("ok": true indicates completion) before sending the next command. run_script starts a background thread and returns immediately. Monitor it with script_status and script_progress, and stop it with stop_script.

| ✅ Safe to call while script runs | ❌ Rejected while script runs |
|---|---|
| health, arm_status, script_status, script_progress | run_script (only one at a time) |
| get_frame_info, get_overlay, get_phone_corners | click, click_icon, click_text, click_at, click_roi |
| is_phone_present, is_screen_on | swipe, swipe_up, swipe_down, swipe_up_normal, swipe_down_normal |
| find_template, find_all_templates, find_any_template, count_template (cv2, thread-safe) | go_home, go_back, go_forward |
| screenshot (base64 or file) | find_text, find_all_text, find_text_roi (PaddleOCR singleton, not thread-safe) |
| wait_for_template (uses cv2 internally) | detect_desktop, detect_page, wait_for_page (may call OCR) |
| | reset, clear_overlay, close_all_apps, run_app, reload_gestures |
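An agent can encode the table above on the client side and check a call before sending it. This avoids a round trip that would only come back with "Script is running". Below is a minimal sketch: the endpoint names are copied from the table, while `is_safe_during_script` and `check_call` are hypothetical helpers. Any endpoint not listed as safe is treated as rejected, which is a conservative assumption.

```python
# Endpoints listed as safe to call while run_script is active (from the table above).
SAFE_DURING_SCRIPT = frozenset({
    "health", "arm_status", "script_status", "script_progress",
    "get_frame_info", "get_overlay", "get_phone_corners",
    "is_phone_present", "is_screen_on",
    "find_template", "find_all_templates", "find_any_template", "count_template",
    "screenshot", "wait_for_template",
})


def is_safe_during_script(endpoint: str) -> bool:
    """Accept "/api/health" or "health"; unknown endpoints count as unsafe."""
    name = endpoint.rstrip("/").rsplit("/", 1)[-1]
    return name in SAFE_DURING_SCRIPT


def check_call(endpoint: str, script_running: bool) -> None:
    """Raise before sending a request the server would reject."""
    if script_running and not is_safe_during_script(endpoint):
        raise RuntimeError(
            f"{endpoint} is rejected while a script runs; wait until "
            "script_status reports running=false, or call stop_script"
        )
```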
# Start the service
RobotArmServer.exe
# Port: 7826 (fixed)
On startup the program auto-detects and initializes:
Note: First-time use requires Administrator privileges to install the Windows service, so run RobotArmServer.exe as Administrator. In daily use, administrator rights are not needed once the service is installed.
If service installation fails, the program still starts, but the arm is unavailable. To install the service manually, run robot-arm-service\安装.bat ("安装" means "install") as Administrator.
curl http://127.0.0.1:7826/api/health
# Returns: {"ok": true, "data": {"status": "running", ...}}
curl -X POST http://127.0.0.1:7826/api/show_window -H "Content-Type: application/json" -d '{}'
After opening the window, you should see the phone screen. Press ESC to close.
For AI Agents: Check screen state without seeing the window
AI Agents cannot see the window. Use these APIs instead:
# Get frame metadata
curl http://127.0.0.1:7826/api/get_frame_info
# Returns: {"ok":true,"data":{"width":960,"height":540}}

# Detect if phone is in frame (brightness check)
curl http://127.0.0.1:7826/api/is_phone_present
# Returns: {"ok":true,"data":{"present":true}}

# Detect if screen is on
curl http://127.0.0.1:7826/api/is_screen_on
# Returns: {"ok":true,"data":{"screen_on":true}}

# Get current frame (base64, parseable by AI)
curl -X POST http://127.0.0.1:7826/api/screenshot -H "Content-Type: application/json" -d '{"return_base64":true,"phone_only":true}'
# Returns: {"ok":true,"data":{"base64":"iVBORw0KGgoAAAANSUhEUg..."}}

# Save screenshot to file
curl -X POST http://127.0.0.1:7826/api/screenshot -H "Content-Type: application/json" -d '{"filename":"screenshots/test.png"}'
# Returns: {"ok":true,"data":{"path":"E:\\robot_arm\\screenshots\\test.png"}}

# List historical screenshots
curl http://127.0.0.1:7826/api/screenshots?limit=5
# Returns: [{"filename":"...","size":123456,"time":"2026-05-24 18:00:12"},...]
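The base64 screenshot reply can be decoded and saved on the agent side, which makes it easy to pass to a vision model or keep for debugging. Below is a minimal sketch. It assumes the reply has the shape shown above, `{"ok": true, "data": {"base64": "..."}}`, and that the image is PNG. The PNG part is inferred from the `iVBORw0KGgo` prefix in the sample, which is how the PNG file signature looks once base64-encoded. `save_screenshot` is a hypothetical helper name.

```python
import base64
from pathlib import Path

# Every PNG file starts with these 8 bytes ("iVBORw0KGgo" once base64-encoded).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(response: dict, out_path: str) -> Path:
    """Decode the base64 field of a /api/screenshot reply and write it to out_path."""
    if not response.get("ok"):
        raise RuntimeError(response.get("error", "screenshot failed"))
    raw = base64.b64decode(response["data"]["base64"])
    if not raw.startswith(PNG_SIGNATURE):
        # Assumption: the server returns PNG data, as the sample reply suggests.
        raise ValueError("payload does not look like a PNG image")
    path = Path(out_path)
    path.write_bytes(raw)
    return path
```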
curl -X POST http://127.0.0.1:7826/api/click \
-H "Content-Type: application/json" \
-d '{"x": 0.5, "y": 0.5}'
The robot arm automatically moves to and clicks the center of the phone screen (x: 0.5, y: 0.5 are percentage coordinates, range 0-1).
No icon templates or configuration needed — just calibrate and go.
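An agent often finds a target in a phone-only screenshot and then needs the 0-1 percentages that /api/click expects. The sketch below does that conversion under one assumption: that a `phone_only` screenshot is cropped and rectified to the phone screen, so pixel position maps linearly to percentage. `pixel_to_percent` is a hypothetical helper. Keep in mind that the click action adds a random offset of about ±30px, so very small targets may need click_at instead.

```python
from __future__ import annotations


def pixel_to_percent(px: float, py: float, width: int, height: int) -> tuple[float, float]:
    """Map a pixel in a phone_only screenshot to /api/click coordinates (0-1)."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    # Clamp so points on (or just outside) the border stay on the screen.
    x = min(max(px / width, 0.0), 1.0)
    y = min(max(py / height, 0.0), 1.0)
    return round(x, 4), round(y, 4)
```

For example, the center pixel of a 540x960 phone image maps to `{"x": 0.5, "y": 0.5}`, which is the same request as the curl call above.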
⚠️ Important: All YAML script files must be saved as UTF-8 without a BOM. A file saved as UTF-8 with a BOM causes parse failures or garbled Chinese parameters.
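Some Windows editors add a BOM when saving as UTF-8. A small check like the one below can strip it before a script is sent to run_script. It relies only on the standard `codecs.BOM_UTF8` constant (bytes `EF BB BF`). `strip_utf8_bom` is a hypothetical helper name, not a FlexArm tool.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Rewrite the file without a leading UTF-8 BOM; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        p.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```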
Create scripts/hello_flexarm.yaml:
name: hello_flexarm
description: "First FlexArm script — experience clicking, swiping, waiting"
steps:
# 1. Click center of screen
- action: click
x: 0.5
y: 0.5
# 2. Wait 1 second
- action: wait
seconds: 1
# 3. Click bottom of screen (back navigation)
- action: click
x: 0.5
y: 0.95
# 4. Wait 2 seconds
- action: wait
seconds: 2
# 5. Large swipe up (page turn)
- action: swipe_up
large: true
# 6. Wait 1 second
- action: wait
seconds: 1
# 7. Small swipe down
- action: swipe_down
large: false
# 8. Click top-right corner
- action: click
x: 0.85
y: 0.1
Run it:
curl -X POST http://127.0.0.1:7826/api/run_script \
-H "Content-Type: application/json" \
-d '{"path": "scripts/hello_flexarm.yaml"}'
The example above only needs calibration — no icon templates or page config required.
To learn more advanced actions (icon clicks, OCR text clicks, page switching, conditional branches), continue reading to understand icon templates and page definitions.
scripts/configs/app_desktop.yaml
Required. The filename is fixed as app_desktop.yaml, and the program uses it to identify pages on the phone desktop.
app_name: Phone Desktop
pages:
- name: desktop_page0 # Page name, arbitrary string
min_match: 2 # At least 2 features must match
must_features: # Required (all must pass)
- name: Phone
type: image
path: icons/app_phone.png
mask: false # false=4-corner sampling, loose matching
- name: Camera
type: image
path: icons/app_camera.png
mask: false
features: # Optional (need min_match to pass)
- name: Messages
type: image
path: icons/app_message.png
mask: false
- name: Settings
type: image
path: icons/app_settings.png
mask: false
- name: desktop_page1
min_match: 3
must_features:
- name: Phone
type: image
path: icons/app_phone.png
mask: false
- name: Camera
type: image
path: icons/app_camera.png
mask: false
features:
- name: Qishui Music
type: image
path: icons/app_qishui.png
mask: false
- name: WeChat
type: image
path: icons/app_wechat.png
mask: false
- name: TaskSwitcher # Multi-task switching view
min_match: 0
must_features:
- name: RecentApps
type: image
path: icons/task_show.png
mask: false
- name: Trash
type: image
path: icons/task_delete.png
mask: false
features: []
Key Fields:
- must_features: all must match, or the page is skipped
- features: optional; the page passes if the matched count >= min_match
- mask: false: uses 4-corner background sampling and tolerates icon size/position variation
- mask: true (or omitted): strict template matching, suitable for fixed UI elements

scripts/configs/app_xxx.yaml
One config file per app, defining all recognizable pages within that app.
app_name: Qishui Music
pages:
- name: Music
min_match: 1
must_features: []
features:
- name: BottomPlayerBar
type: image
path: icons/qishui/music_playing.png
mask: false
- name: Rewards
min_match: 1
must_features:
- name: RewardsTitle
type: text
text: "福利"
features: []
When switch_page in a script doesn't match any page, it automatically executes the default branch. No separate config is needed.
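Put together, the page rules above (must_features, features/min_match, first match wins, default when nothing matches) can be sketched as follows. This is a reading of the documented rules, not the server's code. `is_matched` is a hypothetical callback standing in for the real template or OCR check. Treating a missing min_match as 0 is an assumption. The sketch also counts only the optional features toward min_match; if the server includes must_features in that count, adjust `hits` accordingly.

```python
from typing import Callable, Optional


def page_matches(page: dict, is_matched: Callable[[dict], bool]) -> bool:
    """All must_features must match; at least min_match optional features must match."""
    if not all(is_matched(f) for f in page.get("must_features", [])):
        return False
    hits = sum(1 for f in page.get("features", []) if is_matched(f))
    return hits >= page.get("min_match", 0)  # assumption: missing min_match means 0


def detect_first_page(pages: list, is_matched: Callable[[dict], bool]) -> Optional[str]:
    """Return the first matching page name, or None (switch_page runs `default`)."""
    for page in pages:
        if page_matches(page, is_matched):
            return page["name"]
    return None
```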
Or manually: take a phone screenshot → crop the icon → place it in the icons/ directory.
Icon Requirements:
app_qishui.png
name: script_name
description: "Script description"
steps:
- action: action_type
param1: value1
param2: value2
| action | Parameters | Description |
|---|---|---|
click_icon | path, threshold, roi, mask | Template-matching icon click |
click_icons | paths, interval | Click multiple icons sequentially (arm exits frame between clicks) |
click_icon_many_times | path, count, interval | Click same spot multiple times without reset |
dial_number | number, interval | Smart dialing (maps number to digit icons, supports # and *) |
click_text | text, roi, min_score | OCR text search and click |
click | x, y | Click phone percentage coordinates (0-1), ±30px random offset |
click_at | cam_x, cam_y | Click frame pixel coordinates (precise, no offset) |
click_roi | roi, label | Click center of ROI area (phone screen percentage) |
find_all_text | roi, min_score | Recognize all text, return list + positions + confidence |
swipe | sx, sy, ex, ey, steps, step_wait_ms | Custom swipe (start/end percentages) |
swipe_up | large: true/false | Swipe up (large/small) |
swipe_down | large: true/false | Swipe down (large/small) |
swipe_up_normal | none | Standard swipe up (~80% success) |
swipe_down_normal | none | Standard swipe down |
go_home | max_retries | Return to desktop (detect → swipe up → detect loop) |
go_back | none | Back navigation |
go_forward | none | Forward navigation |
reset | none | Reset robot arm |
clear_overlay | none | Clear vision overlay boxes |
run_app | app_name | Launch app (go_home → detect page → swipe → click icon) |
close_all_apps | max_swipes | Close all background apps |
screenshot | filename, phone_only, show_board, return_base64 | Screenshot |
reload_gesture | none | Reload gesture config (hot-reload) |
set_video_to_coin | value | Set video-to-coin earning mode |
wait | seconds | Wait (supports ranges like 2-5) |
loop | count, steps | Loop sub-steps (supports random ranges like count: 3-5) |
if_found | type, path/text, then, else | Conditional: if target found, run then; else run else |
if_found_roi | type, path/text, roi, then, else | Same as above but with ROI-limited search |
if_progress_stop | template, roi, then, else | Progress bar stall detection |
if_video_to_coin | then, else | Branch based on video-to-coin mode state |
if_random | chance, then, else | Random probability branch |
detect_desktop | config | Detect if currently on desktop (no assertion) |
detect_page | config | Detect current page name (no assertion) |
is_screen_on | none | Check if screen is lit |
assert_desktop | config | Must be on desktop, error if not |
switch_page | config, cases | Detect page → match cases → default if no match |
run_script | path | Execute sub-script (synchronous, returns on completion) |
stop_loop | none | Break current loop |
stop_script | none | Stop current script level (sub-script only stops itself) |
log | message | Print log message |
- action: click_icon
path: icons/app_qishui.png # Icon path (relative to project root)
threshold: 0.75 # Match threshold (default 0.75)
roi: [0.1, 0.2, 0.5, 0.6] # Search area [sx, sy, ex, ey] (phone percentage, 0-1)
mask: false # false=loose, true=strict (default true)
- action: click_text
text: "领取" # Text to find
roi: [0.3, 0.5, 0.7, 0.8] # Optional search area
min_score: 0.5 # OCR minimum confidence (default 0.3)
- action: click
x: 0.5 # X percentage (0=left, 1=right)
y: 0.96 # Y percentage (0=top, 1=bottom)
- action: loop
count: 10 # Fixed count
# count: 3-5 # Random range also supported
steps:
- action: click_text
text: "领取"
- action: wait
seconds: 2
- action: if_found
  type: image                        # image=template matching, text=OCR
  path: icons/qishui/cross.png       # Used when type=image
  # text: "继续观看"                 # Use instead of path when type=text ("Continue watching")
  roi: [0.7, 0.0, 1.0, 0.15]         # Optional search area
then:
- action: click_icon
path: icons/qishui/cross.png
roi: [0.7, 0.0, 1.0, 0.15]
else:
- action: wait
seconds: 2
- action: if_random
chance: 0.4 # 40% chance to take then branch
then:
- action: log
message: "Took then branch"
else:
- action: log
message: "Took else branch"
- action: switch_page
config: scripts/configs/app_qishui.yaml # Page config file
cases:
Music: # When "Music" page matches
- action: click
x: 0.5
y: 0.96
Rewards:
- action: click_text
text: "福利"
default: # When no page matches
- action: swipe_up
large: true
steps:
- action: run_script
path: qishui/run_ad_card.yaml # Execute sub-script
- action: wait
seconds: 5
After a sub-script finishes, execution returns to the parent script.
- action: wait
seconds: 2-5 # Random wait between 2~5 seconds
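Range values such as `seconds: 2-5` and `count: 3-5` arrive from YAML as the string `"2-5"`, while plain numbers arrive as numbers. The sketch below shows one way to resolve both forms into a concrete value. Sampling waits uniformly as floats and counts as inclusive integers is an assumption about how the server randomizes. `resolve_range` is a hypothetical helper.

```python
import random
from typing import Union

Number = Union[int, float]


def resolve_range(value: Union[Number, str], as_int: bool = False, rng=random) -> Number:
    """Turn 2, 2.5 or "2-5" into a concrete value; "a-b" is sampled uniformly."""
    if isinstance(value, (int, float)):
        return int(value) if as_int else float(value)
    text = str(value).strip()
    sep = text.find("-", 1)  # start at index 1 so a leading minus sign is not a separator
    if sep == -1:
        return int(float(text)) if as_int else float(text)
    low, high = float(text[:sep]), float(text[sep + 1:])
    if low > high:
        raise ValueError(f"invalid range: {value!r}")
    if as_int:
        return rng.randint(int(low), int(high))  # inclusive on both ends
    return rng.uniform(low, high)
```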
- loop repeats its sub-steps for the specified count
- switch_page iterates through all page definitions until a match is found
- run_script is a sub-call; it returns to the parent when done
- stop_loop breaks the current loop
- stop_script stops the current level (in a sub-script, it only stops that sub-script)

Basic Info:
- Base URL: http://127.0.0.1:7826
- Success response: {"ok": true, "data": {...}}
- Error response: {"ok": false, "error": "error message"}

GET /api/health
Returns service status, port, uptime, etc.
GET /api/arm_status
Returns COM port, connection status, movement range, etc.
GET /api/get_frame_info
Returns:
{"ok": true, "data": {"width": 540, "height": 960, "fps": 29.5}}
POST /api/show_window
POST /api/hide_window
GET /api/is_phone_present
GET /api/is_phone_present?bright_threshold=60&bright_ratio=0.08
Returns:
{"ok": true, "data": {"present": true}}
GET /api/is_screen_on
GET /api/is_screen_on?dark_threshold=30&dark_ratio=0.7
Returns:
{"ok": true, "data": {"screen_on": true}}
POST /api/toggle_phone_corners
Overlays a green phone screen border on the display window.
POST /api/screenshot {"path": "screenshots/test.png"} # Save to file
POST /api/screenshot {"return_base64": true} # Return base64 (recommended for AI Agents)
POST /api/screenshot {"phone_only": true} # Crop to phone area only
POST /api/screenshot {"show_board": true} # Full view with ruler
GET /api/screenshots
GET /api/screenshots?limit=10
Returns:
{"ok":true,"data":[{"filename":"20260524_1800_phone.png","path":"E:\\robot_arm\\screenshots\\...","size":123456,"time":"2026-05-24 18:00:12"},...]}
POST /api/detect_desktop {"desktop_config": "scripts/configs/app_desktop.yaml"}
Returns:
{"ok": true, "data": {"matched": true, "page_name": "desktop_page1", "score": 0.84}}
POST /api/detect_page {"config_path": "scripts/configs/app_qishui.yaml", "threshold": 0.75}
Returns:
{"ok": true, "data": {"matched": true, "page_name": "Rewards", "score": 0.82}}
POST /api/find_template
{"path": "icons/app_qishui.png", "threshold": 0.75, "roi": [0.1, 0.2, 0.5, 0.6], "auto_mask": false}
Returns:
{"ok": true, "data": {"x": 257, "y": 453, "w": 52, "h": 53, "score": 0.9446}}
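If you want to inspect the score before clicking, you can call find_template first and then turn the match into a precise click_at target. The sketch below makes two assumptions. First, that x/y in the reply is the top-left corner of the match in frame pixels. Second, that the HTTP click_at endpoint takes the same `cam_x`/`cam_y` names as the YAML action. Both helper names are hypothetical. In the common case, click_icon already does the search and the click in one call.

```python
from __future__ import annotations


def match_center(match: dict) -> tuple[int, int]:
    """Center of a find_template hit, ASSUMING x/y is the top-left corner (frame pixels)."""
    return match["x"] + match["w"] // 2, match["y"] + match["h"] // 2


def click_at_payload(response: dict) -> dict:
    """Build a click_at body from a find_template reply; raise if nothing matched."""
    data = response.get("data")
    if not response.get("ok") or not data:
        raise LookupError("template not found; lower threshold or check the icon file")
    cam_x, cam_y = match_center(data)
    # Assumption: the HTTP endpoint uses the YAML action's parameter names.
    return {"cam_x": cam_x, "cam_y": cam_y}
```

With the sample reply above, this yields `{"cam_x": 283, "cam_y": 479}`.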
POST /api/find_text {"text_keyword": "领取", "roi": [0.3, 0.5, 0.7, 0.8], "min_score": 0.5}
Returns:
{"ok": true, "data": {"x": 300, "y": 600, "w": 40, "h": 20, "text": "领取奖励", "score": 0.91}}
POST /api/find_text_roi {"roi": [0.0, 0.6, 1.0, 1.0], "text_keyword": "夸克", "min_score": 0.3}
Similar to find_text but requires roi (array format [sx, sy, ex, ey], 0-1).
POST /api/find_template_roi {"path": "icons/app_qishui.png", "roi": [0.1, 0.2, 0.5, 0.6], "threshold": 0.75}
Similar to find_template but requires a roi region.
POST /api/change_focus {"value": 2} # Focus near +2
POST /api/change_focus {"value": -2} # Focus far -2
Incremental adjustment (range 0~500). Returns {"ok": true, "data": {"focus": 310.0}}.
POST /api/find_all_text {"min_score": 0.5}
Returns all recognized text on screen.
Performance Warning: find_all_text does a full-screen OCR scan using CPU inference. Time varies by text density:
Best Practice: Use find_text with a specific roi whenever possible. Searching a small region is much faster than a full-screen scan.
POST /api/find_all_templates {"template_paths": ["icons/a.png", "icons/b.png"], "threshold": 0.75}
Returns true only if all templates are found.
POST /api/find_any_template {"template_paths": ["icons/a.png", "icons/b.png"], "threshold": 0.75}
Returns the first matched icon.
POST /api/go_home {"max_retries": 5}
max_retries: maximum retry attempts, default 5 (detect desktop → swipe up → re-detect).
POST /api/go_back
POST /api/go_forward
POST /api/reset
POST /api/swipe_up {"large": true} # Large swipe up
POST /api/swipe_down {"large": false} # Small swipe down
POST /api/click_icon
{"path": "icons/app_qishui.png", "threshold": 0.75, "roi": [0.1, 0.2, 0.5, 0.6], "mask": false, "reset": true}
POST /api/click_icons
{"paths": ["icons/phone/num1.png", "icons/phone/num3.png", "icons/phone/num2.png"], "interval": 1}
Clicks each icon sequentially. After each click the arm exits the frame, then resets after all clicks. Returns {"ok": true, "data": {"clicked": true, "success_count": N, "failed": []}}.
POST /api/click_icon_many_times
{"path": "icons/qishui/like.png", "count": 3, "interval": 0.5}
Searches for the icon once, then clicks the same position multiple times without moving or resetting. Resets only after all clicks. Returns {"ok": true, "data": {"clicked": true, "clicks": 3}}.
POST /api/click_text
{"text": "领取", "roi": [0.3, 0.5, 0.7, 0.8], "min_score": 0.5}
POST /api/click {"x": 0.5, "y": 0.96}
POST /api/click_roi {"roi": [0.3, 0.5, 0.7, 0.8]}
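Because click_roi clicks the center of the region, the click point is simply the midpoint of [sx, sy, ex, ey]. The sketch below validates an roi in that documented format and computes the point that will be clicked. Rejecting inverted or zero-size regions is an assumption about what the server treats as invalid. `validate_roi` and `roi_center` are hypothetical helpers.

```python
from __future__ import annotations


def validate_roi(roi) -> list[float]:
    """Check an roi is [sx, sy, ex, ey] with 0 <= start < end <= 1 on both axes."""
    if len(roi) != 4:
        raise ValueError("roi must be [sx, sy, ex, ey]")
    sx, sy, ex, ey = (float(v) for v in roi)
    if not (0.0 <= sx < ex <= 1.0 and 0.0 <= sy < ey <= 1.0):
        raise ValueError(f"roi out of range or inverted: {roi}")
    return [sx, sy, ex, ey]


def roi_center(roi) -> tuple[float, float]:
    """Phone-percentage point that click_roi targets (center of the region)."""
    sx, sy, ex, ey = validate_roi(roi)
    return (sx + ex) / 2, (sy + ey) / 2
```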
POST /api/swipe
{"sx": 0.5, "sy": 0.8, "ex": 0.5, "ey": 0.1, "steps": 5, "duration": 0.3}
sx/sy/ex/ey are phone percentage coordinates, steps is the number of steps, duration is total swipe time (seconds).
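To make the swipe parameters concrete, the sketch below splits a swipe into `steps` evenly spaced segments and spreads `duration` evenly across them. It is only an illustration of what the parameters mean. The server's actual gesture code (easing, arm timing) is not documented here and may differ. `swipe_path` is a hypothetical helper.

```python
from __future__ import annotations


def swipe_path(sx: float, sy: float, ex: float, ey: float,
               steps: int, duration: float) -> tuple[list[tuple[float, float]], float]:
    """Return steps+1 waypoints from start to end, plus the pause per step (seconds)."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    points = [
        (sx + (ex - sx) * i / steps, sy + (ey - sy) * i / steps)
        for i in range(steps + 1)
    ]
    return points, duration / steps
```

With the example body above (0.5, 0.8 → 0.5, 0.1, 5 steps, 0.3 s), this gives six waypoints 0.14 apart vertically and about 0.06 s per step.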
POST /api/close_all_apps {"max_swipes": 15}
GET /api/run_app?app_name=汽水音乐
# or
POST /api/run_app {"app_name": "汽水音乐"}
Looks up the app icon in app_desktop.yaml and clicks it. (汽水音乐 is the Chinese name of Qishui Music.)
POST /api/run_script {"path": "scripts/qishui_daily.yaml"}
Returns:
{"ok": true, "data": {"script": "D:\\...\\scripts/qishui_daily.yaml", "status": "started"}}
GET /api/script_status
Returns:
{"ok": true, "data": {"running": false, "current_script": null}}
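run_script returns immediately, so an agent has to poll script_status to learn when the script has finished. Below is a minimal polling sketch. `post` and `get` are whatever HTTP helpers you already use, for example thin wrappers around `requests.post` and `requests.get` that return the decoded JSON. The sleep and clock functions are injectable so the loop can be tested without a server. The sketch assumes script_status already reports `running: true` by the time run_script has returned. The timeout and interval defaults are illustrative.

```python
import time
from typing import Callable


def run_and_wait(post: Callable[[str, dict], dict], get: Callable[[str], dict],
                 script_path: str, timeout: float = 600.0, interval: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> bool:
    """Start a script, then poll script_status; True if it finished within timeout."""
    started = post("/api/run_script", {"path": script_path})
    if not started.get("ok"):
        raise RuntimeError(started.get("error", "run_script failed"))
    deadline = clock() + timeout
    while clock() < deadline:
        status = get("/api/script_status")
        if status.get("ok") and not status["data"]["running"]:
            return True
        sleep(interval)
    return False  # still running: inspect script_progress or call stop_script
```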
GET /api/script_progress
Returns full execution log + stats:
{
"ok": true,
"data": {
"running": true,
"script": "scripts/xxx.yaml",
"current_step": {"step_index": 5, "action": "switch_page", "target": "拨号页", "status": "ok", "detail": "Matched branch: 拨号页", "timestamp": 1780329450.91},
"steps_log": [
{"step_index": 0, "action": "script_start", "target": "test_66", "status": "ok", "detail": "5 top-level steps", "timestamp": ...},
...
],
"stats": {
"total_steps": 24,
"completed_steps": 23,
"failed_steps": 0,
"elapsed": 61.6,
"status": "running"
}
}
}
- steps_log: complete step history with timestamps
- stats: progress stats, including elapsed time and completed/failed counts
- Idle response: {"running": false, "script": null, "current_step": null, "steps_log": [], "stats": {}}

POST /api/stop_script
POST /api/wait_for_template {"path": "icons/qishui/reward_popup.png", "timeout": 10, "interval": 0.5}
Checks every interval seconds until the template appears or timeout seconds have elapsed.
POST /api/wait_for_page {"config_path": "scripts/configs/app_xxx.yaml", "target_name": "RewardsPage", "timeout": 15}
Polls until the specified page appears or the timeout expires.
POST /api/clear_overlay
GET /api/config/daily
PUT /api/config/daily {"windows": [...]}
GET /api/config/app/qishui # Returns app_qishui.yaml content
PUT /api/config/app/qishui # Update config (body is YAML text)
GET /api/config/gesture
PUT /api/config/gesture {...}
POST /api/shutdown
Graceful shutdown: detect running script → safe terminate → arm reset → release resources → process exit.
Note: Do not kill the process directly — the COM port won't be released, and you'll need to reinstall the driver on next startup.
RobotArmServer.exe minimizes to the system tray on launch:
/api/shutdown (same graceful shutdown flow)

RobotArmServer/
├── RobotArmServer.exe ← Main program
├── _internal/ ← Program libraries (do not touch)
├── scripts/ ← YAML scripts directory
│ ├── hello_flexarm.yaml ← Your first script
│ ├── daily_config.yaml ← Daily scheduled task config
│ ├── configs/
│ │ ├── app_desktop.yaml ← Phone desktop config (required)
│ │ ├── app_qishui.yaml ← Qishui Music page config
│ │ └── app_kuaishou.yaml ← Kuaishou page config
│ └── qishui/ ← Sub-scripts directory
│ ├── run_*.yaml
│ └── music_actions.yaml
├── icons/ ← Icon templates directory
│ ├── app_phone.png
│ ├── app_camera.png
│ ├── app_qishui.png
│ └── qishui/
│ ├── cross.png
│ └── ...
├── calibrations/ ← Calibration results (auto-generated)
├── screenshots/ ← Screenshot save directory
├── camera_config.json ← Camera focus config
├── device_config.json ← Device config
├── gesture_config.json ← Swipe gesture config
└── robot-arm-service/ ← Windows service driver
Arm not connected: check that
- robot-arm-service\安装.bat has been run as Administrator
- GET /api/arm_status returns "connected": true
- Calibration has been done (results are saved in calibrations/)

Icon not found:
- Check that the icon file exists in icons/
- Lower threshold (e.g., 0.65)
- Use mask: false (4-corner background sampling, more tolerant)
- Use roi to narrow the search area
- Adjust camera focus (POST /api/change_focus {"value": 5})

OCR text issues:
- Raise min_score to 0.5+ to reduce false matches
- Use roi to narrow the search area
- Use find_all_text to see what text the OCR actually recognizes

Script stuck:
- Use GET /api/script_progress to see which step is stuck

Adding a new app:
1. Save the app icon to icons/
2. Update scripts/configs/app_desktop.yaml (add the new icon to features)
3. Create scripts/configs/app_xxx.yaml (define each page inside the app)
4. Write scripts/run_xxx.yaml (define the workflow)
5. Run it: POST /api/run_script {"path": "scripts/run_xxx.yaml"}

All APIs return a uniform format: {"ok": true/false, "data": {...}, "error": "..."}
| Error | Cause | Resolution |
|---|---|---|
"error": "Script is running" | A script is executing in the background | Check script_status, wait for completion, or call stop_script |
"error": "RobotActions not initialized" | Arm not connected / service not started | Guide user to check robot-arm-service\安装.bat |
"error": "Missing parameter: path" | Incomplete request parameters | Check API call parameters |
"ok": false, "data": null (find_template) | Icon not found | Lower threshold or check icon file, do not retry indefinitely |
"ok": false, "data": null (find_text) | OCR text not found | Widen roi or lower min_score, try at most 2-3 times then report |
"error": "Unauthorized" | License check failed | Guide user to activate |
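The uniform reply format and the "do not retry indefinitely" guidance in the table above can be combined into two small helpers. `unwrap` returns `data` or raises, and `call_with_limit` retries a lookup a bounded number of times and then gives up so the agent can report to the user. Both names are hypothetical, and the default of 3 attempts follows the table's "try at most 2-3 times".

```python
from typing import Any, Callable


class ApiError(RuntimeError):
    """Raised when a reply has "ok": false."""


def unwrap(response: dict) -> Any:
    """Return data from {"ok": true, "data": ...}; raise ApiError otherwise."""
    if response.get("ok"):
        return response.get("data")
    raise ApiError(response.get("error") or "not found")


def call_with_limit(call: Callable[[], dict], attempts: int = 3) -> Any:
    """Retry a lookup a bounded number of times, then give up and report."""
    last = None
    for _ in range(attempts):
        try:
            return unwrap(call())
        except ApiError as exc:
            last = exc
    raise ApiError(f"failed after {attempts} attempts: {last}")
```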
- go_back or go_home → re-detect → still no match → report to user
- script_status → if running, wait or report to user

Below is a complete example showing how an AI Agent combines APIs to "find and open the Qishui Music app on the desktop":
Step 1: Check service status
curl http://127.0.0.1:7826/api/health
# Returns: {"ok":true,"data":{"status":"running","arm_connected":true,...}}
Step 2: Detect current desktop
curl -X POST http://127.0.0.1:7826/api/detect_desktop -H "Content-Type: application/json" -d '{}'
# Returns: {"ok":true,"data":{"page_name":"desktop_page1","score":0.94,"matched":true}}
Step 3: Find the Qishui Music icon
curl -X POST http://127.0.0.1:7826/api/find_template -H "Content-Type: application/json" -d '{"path":"icons/app_qishui.png","threshold":0.75}'
# Returns: {"ok":true,"data":{"x":242,"y":516,"w":55,"h":55,"score":0.92}}
Step 4: Click the icon
curl -X POST http://127.0.0.1:7826/api/click_icon -H "Content-Type: application/json" -d '{"path":"icons/app_qishui.png"}'
# Returns: {"ok":true,"data":{"clicked":true,"score":0.92,...}}
Step 5: Wait for app launch, detect page
python -c "import time; time.sleep(2)"
python -c "import requests; r=requests.post('http://127.0.0.1:7826/api/detect_page',json={'config_path':'scripts/configs/app_qishui.yaml'}); print(r.text)"
# Returns: {"ok":true,"data":{"page_name":"Music","score":0.85,"matched":true}}
Step 6: Confirm phone is in frame
curl http://127.0.0.1:7826/api/is_phone_present
# Returns: {"ok":true,"data":{"present":true}}
✅ Task complete: Qishui Music is open, currently on the Music page.
Alternatively, if app_desktop.yaml is properly configured and the Qishui Music icon exists, you can call the run_app endpoint instead. It navigates to the correct desktop page automatically and clicks the icon.
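Steps 1-6 can also be run from a single Python function, which is convenient when an agent drives FlexArm from its own process. In the sketch below, `call(method, endpoint, body)` is a hypothetical transport that returns the decoded JSON reply, for example a wrapper around `requests.request`. Any reply with `"ok": false` stops the sequence. The endpoints, paths and the 2-second wait are taken from the steps above, and `open_qishui` is an illustrative name.

```python
import time
from typing import Callable, Optional


def open_qishui(call: Callable[[str, str, Optional[dict]], dict],
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Run Steps 1-6 in order and return the detected page name."""
    def ok(method: str, endpoint: str, body: Optional[dict] = None) -> dict:
        resp = call(method, endpoint, body)
        if not resp.get("ok"):
            raise RuntimeError(f"{endpoint}: {resp.get('error', 'failed')}")
        return resp.get("data") or {}

    ok("GET", "/api/health")                                    # Step 1: service up
    ok("POST", "/api/detect_desktop", {})                       # Step 2: on a desktop page
    ok("POST", "/api/find_template",
       {"path": "icons/app_qishui.png", "threshold": 0.75})     # Step 3: icon visible
    ok("POST", "/api/click_icon", {"path": "icons/app_qishui.png"})  # Step 4: click it
    sleep(2)                                                    # Step 5: let the app launch
    page = ok("POST", "/api/detect_page",
              {"config_path": "scripts/configs/app_qishui.yaml"})
    if not ok("GET", "/api/is_phone_present").get("present"):   # Step 6: phone in frame
        raise RuntimeError("phone not in frame")
    return page.get("page_name", "")
```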