Install
openclaw skills install visual-rpa-skill
ClawHub Security found sensitive or high-impact capabilities. Review the scan results before using.
Visual RPA desktop automation skill. Use when user asks to operate desktop apps, click icons, open applications, type text in input fields, click buttons, scroll pages, send messages via WeChat or other apps. Uses screen capture and Qwen vision model for pure visual positioning without DOM or accessibility APIs.
Auto-execute all steps without waiting for user confirmation between steps.
Desktop automation via screen capture + Qwen vision model (Qwen-VL). No DOM or accessibility API needed.
Use the exec tool to run commands. Script path: $env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py
Requires the DASHSCOPE_API_KEY environment variable to be set.
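Both environment variables can be checked before the script is invoked. The following is a minimal sketch of such a check; the helper name `missing_env` and the `REQUIRED_VARS` tuple are hypothetical and not part of the skill.

```python
import os

# Variables that the commands in this document rely on.
REQUIRED_VARS = ("TAXBOT_ROOT", "DASHSCOPE_API_KEY")


def missing_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks ready.")
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test without touching the real environment.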
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "click to open WeChat"
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "open WeChat, open File Transfer chat, type hello in input box, click send"
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "click Chrome browser" "type baidu.com in address bar and press enter" "type weather in search box" "click search button"
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --no-verify --task "click to open Calculator"
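The commands above can also be assembled programmatically, which avoids shell quoting problems when a step contains spaces or quotes. This is a rough sketch, not part of the skill: `build_command` and `run_steps` are hypothetical helpers, and using `sys.executable` in place of `python` is an assumption about which interpreter should run the script.

```python
import os
import subprocess
import sys


def build_command(steps, verify=True, root=None):
    """Build the argument list for a batch run of visual_rpa.py."""
    if not steps:
        raise ValueError("at least one step is required")
    # Fall back to the TAXBOT_ROOT environment variable, as in the commands above.
    root = root if root is not None else os.environ["TAXBOT_ROOT"]
    script = f"{root}/skills/visual-rpa/scripts/visual_rpa.py"
    cmd = [sys.executable, script, "--mode", "task"]
    if not verify:
        cmd.append("--no-verify")
    # Each step becomes its own argument after --task.
    cmd += ["--task", *steps]
    return cmd


def run_steps(steps, verify=True):
    """Run the steps and return the completed process with captured output."""
    return subprocess.run(
        build_command(steps, verify), capture_output=True, text=True
    )
```

Because the command is passed as a list, each step reaches the script as one argument regardless of the characters it contains.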
| Parameter | Description |
|---|---|
| `--mode task` | Batch task mode (required) |
| `--mode interactive` | Interactive mode (default) |
| `--task "step1" "step2"` | Task instructions, supports multiple |
| `--no-verify` | Skip post-action verification |
| `--model MODEL` | Vision model name (default: qwen-vl-max-latest) |
| `--api-key KEY` | API Key (defaults to DASHSCOPE_API_KEY env var) |
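To make the flag semantics concrete, here is an `argparse` sketch of a command line with the same options and defaults as the table. How the real script parses its arguments is not shown in this document, so `make_parser` and `resolve_api_key` are assumed equivalents for illustration only.

```python
import argparse
import os


def make_parser():
    """Hypothetical parser mirroring the parameter table."""
    parser = argparse.ArgumentParser(prog="visual_rpa.py")
    parser.add_argument("--mode", choices=["task", "interactive"],
                        default="interactive")
    # nargs="+" lets --task take several quoted steps in one invocation.
    parser.add_argument("--task", nargs="+", default=[], metavar="STEP")
    parser.add_argument("--no-verify", action="store_true")
    parser.add_argument("--model", default="qwen-vl-max-latest")
    parser.add_argument("--api-key", default=None)
    return parser


def resolve_api_key(args, env=None):
    """Prefer --api-key, otherwise fall back to DASHSCOPE_API_KEY."""
    env = os.environ if env is None else env
    return args.api_key or env.get("DASHSCOPE_API_KEY")
```

In this sketch, omitting `--mode` selects interactive mode, which is why the batch commands in this document always pass `--mode task` explicitly.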
| Action | Example instructions |
|---|---|
| Click | "click start menu", "click Chrome icon" |
| Double click | "double click Recycle Bin on desktop" |
| Right click | "right click on desktop blank area" |
| Type text | "type weather in search box", "type hello in input box" |
| Hotkey | "press Ctrl+C" |
| Scroll | "scroll down the page" |
| Wait | "wait for page to load" |
[OK] Step 0: click to open WeChat
click @ (375,1591)
[OK] Step 1: click File Transfer Assistant in WeChat
click @ (154,97)
[FAIL] Step 2: type hello in input box
type @ (300,1364)
2/3 succeeded
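A report like the one above can be parsed to find which steps failed and where each action landed. The sketch below infers the line format solely from that sample, so the regular expressions, `StepResult`, and `parse_report` are assumptions that may need adjusting for other output.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Patterns inferred from the sample report; the real format may vary.
STEP_RE = re.compile(r"^\[(OK|FAIL)\] Step (\d+): (.+)$")
ACTION_RE = re.compile(r"^(.+?) @ \((\d+),\s*(\d+)\)$")


@dataclass
class StepResult:
    index: int
    ok: bool
    instruction: str
    action: Optional[str] = None
    position: Optional[Tuple[int, int]] = None


def parse_report(text):
    """Turn report text into a list of StepResult objects."""
    results = []
    for raw in text.splitlines():
        line = raw.strip()
        step = STEP_RE.match(line)
        if step:
            results.append(
                StepResult(int(step.group(2)), step.group(1) == "OK", step.group(3))
            )
            continue
        action = ACTION_RE.match(line)
        if action and results:
            # An action line belongs to the most recent step line.
            results[-1].action = action.group(1)
            results[-1].position = (int(action.group(2)), int(action.group(3)))
    return results
```

Filtering the result with `[r for r in parse_report(text) if not r.ok]` gives the steps worth retrying or inspecting.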
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "open WeChat, open File Transfer Assistant chat, type hello in input box, click send"
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "click Chrome browser" "type https://www.baidu.com in address bar and press enter"
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "right click on desktop blank area" "click New Folder"
Logs are saved to the ./rpa_logs/ directory for debugging.