## Install

    openclaw skills install desktop-automation-ultra

Automate comprehensive desktop tasks on Windows/macOS/Linux with safe, logged mouse, keyboard, OCR, image recognition, macro recording, and replay features. Zero-error edition.
CRITICAL: This skill captures ALL keyboard and mouse events.
Recorded macros are saved to the `recorded_macro/` directory.

Automate desktop interactions without APIs:
Safe mode blocks dangerous actions when enabled:

- `type`, `press_key`, `click`, and `drag` are monitored
- Dangerous patterns are blocked: `rm `, `del `, `C:\Windows\`, `/etc/`, `sudo`, etc.
- All actions support `dry_run: true`

Every action is logged to `~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log`.
All modules use locks to prevent race conditions.
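The lock-based serialization described above might look like this minimal sketch (hypothetical class shape; the skill's real `ActionManager` lives in `lib/actions.py`):

```python
import threading

class ActionManager:
    """Toy sketch: serialize all actions behind one lock so concurrent
    callers cannot interleave mouse/keyboard events."""

    def __init__(self):
        self._lock = threading.Lock()
        self.history = []  # stands in for the real event log

    def click(self, x, y, button="left", dry_run=False):
        with self._lock:  # only one action runs at a time
            if dry_run:
                return {"status": "dry_run", "x": x, "y": y}
            self.history.append(("click", x, y, button))
            return {"status": "ok", "x": x, "y": y}
```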
Place `desktop-automation-ultra-local/` in:

- Windows: `C:\Users\<User>\.openclaw\workspace\skills\`
- macOS/Linux: `~/.openclaw/workspace/skills/`

Then install the Python dependencies:

    pip install -r requirements.txt

For `find_text_on_screen` and the other OCR actions, install Tesseract:

- Linux: `sudo apt install tesseract-ocr`
- macOS: `brew install tesseract`

Finally, restart the gateway:

    openclaw gateway restart
action: click
params:
  x: 100
  y: 100
  dry_run: true  # Test first!

action: type
params:
  text: "Hello World"
  interval: 0.05  # Delay between keys
  dry_run: false

action: find_image
params:
  template_path: "templates/button.png"
  confidence: 0.95

action: read_text_ocr
params:
  lang: "fra"  # French
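The request shape above maps naturally onto a small dispatcher. A sketch with a hypothetical `run_action` helper and toy handlers (not the skill's actual API):

```python
def run_action(request, handlers):
    """Route a {action, params} request to its handler, honoring dry_run."""
    action = request["action"]
    params = dict(request.get("params", {}))  # copy so pop() is side-effect free
    if params.pop("dry_run", False):
        return {"status": "dry_run", "action": action, "params": params}
    return handlers[action](**params)

# Toy handler table standing in for the real implementations
handlers = {
    "click": lambda x, y, button="left": {"status": "ok", "x": x, "y": y},
    "type": lambda text, interval=0.05: {"status": "ok", "text": text},
}
```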
| Action | Parameters | Returns |
|---|---|---|
| `click` | x, y, button="left", dry_run | {status, x, y} |
| `type` | text, interval=0.05, dry_run | {status, text} |
| `press_key` | key, dry_run | {status, key} |
| `move_mouse` | x, y, duration=0.5, dry_run | {status, x, y} |
| `scroll` | amount=5, dry_run | {status, amount} |
| `drag` | start_x, start_y, end_x, end_y, duration=0.5, dry_run | {status} |
| `copy_to_clipboard` | text, dry_run | {status} |
| `paste_from_clipboard` | dry_run | {status, length} |
| Action | Parameters | Returns |
|---|---|---|
| `screenshot` | path="~/Desktop/screenshot.png", dry_run | {status, path} |
| `get_active_window` | dry_run | {status, title, x, y, width, height} |
| `list_windows` | dry_run | {status, windows[], count} |
| `activate_window` | title_substring, dry_run | {status, title} |
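`activate_window` matches on a title substring; the matching step presumably works something like this sketch (case-insensitivity is an assumption):

```python
def match_window(windows, title_substring):
    """Return the first window whose title contains the substring, else None."""
    needle = title_substring.lower()
    for win in windows:
        if needle in win["title"].lower():
            return win
    return None
```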
| Action | Parameters | Returns |
|---|---|---|
| `find_image` | template_path, confidence=0.9, dry_run | {status, x, y, confidence} |
| `find_image_multiscale` | template_path, confidence, scale_factors, dry_run | {status, x, y, confidence, scale} |
| `wait_for_image` | template_path, timeout=30.0, interval=0.5, confidence=0.9, dry_run | {status, x, y, confidence} |
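`lib/image_recognition.py` uses OpenCV template matching. The core idea — slide the template over the screen and score each candidate position — can be sketched dependency-free on grayscale pixel grids (a toy illustration, not the actual OpenCV code path):

```python
def match_template(screen, template):
    """Slide `template` over `screen` (2D lists of 0-255 ints) and return the
    position with the highest confidence, where confidence = 1 - normalized SSD."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best = (0.0, 0, 0)
    max_ssd = 255.0 ** 2 * th * tw  # worst possible mismatch over the window
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            ssd = sum(
                (screen[y + j][x + i] - template[j][i]) ** 2
                for j in range(th) for i in range(tw)
            )
            conf = 1.0 - ssd / max_ssd
            if conf > best[0]:
                best = (conf, x, y)
    conf, x, y = best
    return {"status": "found" if conf >= 0.9 else "not_found",
            "x": x, "y": y, "confidence": conf}
```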
| Action | Parameters | Returns |
|---|---|---|
| `find_text_on_screen` | text, lang="fra", dry_run | {status, locations[], count} |
| `find_all_text_on_screen` | text, lang="fra", dry_run | {status, data[], count} |
| `read_text_ocr` | lang="fra", dry_run | {status, text, length} |
| `read_text_region` | x, y, width, height, lang="fra", dry_run | {status, text, length} |
| `extract_screen_data` | region={}, output_format="json", lang="fra", dry_run | {status, data[], count} |
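`find_text_on_screen` returns center points for matched text. Given per-word OCR boxes, its result shape could be assembled like this (the box format and the substring matching rule are assumptions):

```python
def find_text(words, query):
    """Given OCR word boxes [{'text', 'x', 'y', 'w', 'h'}], return center points
    of matches, shaped like find_text_on_screen's result."""
    locations = [
        {"x": w["x"] + w["w"] // 2, "y": w["y"] + w["h"] // 2}
        for w in words
        if query.lower() in w["text"].lower()
    ]
    return {"status": "found" if locations else "not_found",
            "locations": locations, "count": len(locations)}
```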
| Action | Parameters | Returns |
|---|---|---|
| `play_macro` | macro_path, speed=1.0, dry_run | {status, executed, total, errors[]} |
| `stop_macro` | — | {status} |
| `play_macro_with_subroutines` | macro_path, speed=1.0, sub_macros_dir, dry_run | {status, executed, total, errors[]} |
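`play_macro`'s replay loop — run each event, then wait the recorded delay scaled by `speed` — might look roughly like this sketch (`dispatch` stands in for the skill's action runner; error handling is an assumption):

```python
import json
import time

def play_macro(macro_path, dispatch, speed=1.0, dry_run=False):
    """Replay a recorded macro: execute each event, then sleep event['wait']
    milliseconds divided by `speed` (speed=2.0 plays back twice as fast)."""
    with open(macro_path) as f:
        events = json.load(f)["events"]
    executed, errors = 0, []
    for ev in events:
        try:
            dispatch(ev["action"], ev.get("params", {}), dry_run)
            executed += 1
        except Exception as exc:
            errors.append(f"{ev['action']}: {exc}")
        time.sleep(ev.get("wait", 0) / 1000.0 / speed)
    return {"status": "ok" if not errors else "error",
            "executed": executed, "total": len(events), "errors": errors}
```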
| Action | Parameters | Returns |
|---|---|---|
| `set_safe_mode` | enabled=true | {status, safe_mode} |
| `get_safety_status` | — | {status, safe_mode_enabled, dangerous_patterns, dangerous_actions[]} |
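The safe-mode check presumably scans action payloads for the dangerous patterns listed earlier; a minimal sketch (the exact matching rules are an assumption):

```python
# Patterns taken from the safety description above
DANGEROUS_PATTERNS = ["rm ", "del ", "C:\\Windows\\", "/etc/", "sudo"]

def is_dangerous(text):
    """Return the first dangerous pattern found in `text`, or None if it is safe."""
    lowered = text.lower()
    for pattern in DANGEROUS_PATTERNS:
        if pattern.lower() in lowered:
            return pattern
    return None
```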
Recorded macros are JSON with this structure:
    {
      "events": [
        {
          "action": "click",
          "params": {"x": 100, "y": 50},
          "wait": 500
        },
        {
          "action": "type",
          "params": {"text": "Hello"},
          "wait": 200
        },
        {
          "action": "press_key",
          "params": {"key": "return"},
          "wait": 100
        }
      ]
    }
Each event has three fields:

- `action` — action name
- `params` — action parameters
- `wait` — milliseconds to wait before the next action

To avoid recording hundreds of `move_mouse` events during a smooth drag, the recorder uses debouncing: only after the mouse has been still for N seconds (default: 1 second) is the final position recorded. A continuous movement therefore yields a single `move_mouse` event (the end coordinates), while a movement with pauses yields multiple `move_mouse` events (one per stop).

Run the unit test suite:
    python scripts/test_automation.py

Output:

    test_dry_run_click ... ok
    test_get_active_window ... ok
    test_safe_mode_blocks_dangerous ... ok
    ...
    Ran 13 tests
    OK
All actions are logged to `~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log`.
Example:
    [2026-03-15 10:23:45] [INFO] ActionManager: ActionManager initialized with safe_mode=True
    [2026-03-15 10:23:46] [INFO] ActionManager: Clicked at (100, 50) with left button
    [2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World
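Parsing these lines back into fields is straightforward; a sketch with the format inferred from the example above:

```python
import re

# [timestamp] [LEVEL] Source: message  — format inferred from the sample log
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<source>[^:]+): (?P<msg>.*)$")

def parse_log_line(line):
    """Split one automation log line into ts/level/source/msg, or None on mismatch."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None
```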
    # Override log directory
    export AUTOMATION_LOG_DIR=~/my_logs

    # Disable safe mode globally (NOT recommended)
    export AUTOMATION_SAFE_MODE=false
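Resolving these overrides against their documented defaults might look like this (the accepted falsy spellings are an assumption):

```python
import os

def load_config(env=None):
    """Read the two environment overrides, falling back to documented defaults."""
    env = os.environ if env is None else env
    return {
        "log_dir": env.get(
            "AUTOMATION_LOG_DIR",
            os.path.expanduser("~/.openclaw/skills/desktop-automation-logs"),
        ),
        "safe_mode": env.get("AUTOMATION_SAFE_MODE", "true").lower()
        not in ("false", "0", "no"),
    }
```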
Emergency stop: move the mouse to a corner of the screen to halt all automation.
Troubleshooting tips:

- If text is not detected, try `read_text_ocr` instead of `find_text_on_screen`
- If a template image is not found, use `find_image_multiscale` to detect it at different scales

Safe mode blocks dangerous actions by default. This is intentional. To run dangerous actions:
    action: set_safe_mode
    params:
      enabled: false

Then execute your action. Re-enable safe mode immediately after:

    action: set_safe_mode
    params:
      enabled: true
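Since forgetting the re-enable step is the obvious failure mode here, a caller-side guard can make it automatic (a sketch; `set_safe_mode` is any callable that toggles the skill's safe mode):

```python
from contextlib import contextmanager

@contextmanager
def safe_mode_disabled(set_safe_mode):
    """Temporarily disable safe mode and guarantee it is re-enabled,
    even if the wrapped action raises."""
    set_safe_mode(False)
    try:
        yield
    finally:
        set_safe_mode(True)  # always restore safe mode
```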
MIT License. See LICENSE file.
    desktop-automation-ultra-local/
    ├── SKILL.md (This file)
    ├── requirements.txt (Python dependencies)
    ├── lib/
    │   ├── actions.py (Core click/type/drag actions)
    │   ├── image_recognition.py (OpenCV template matching)
    │   ├── ocr_engine.py (Tesseract OCR)
    │   ├── macro_player.py (Record/playback macros)
    │   ├── safety_manager.py (Safe mode, blocking)
    │   └── utils.py (Logging, helpers)
    ├── scripts/
    │   └── test_automation.py (Unit tests)
    └── recorded_macro/ (Output: saved macros)
Status: PRODUCTION READY ✅
Last updated: 2026-03-15
Version: 2.0.0