Install
openclaw skills install @igallta/mac-gui-control
Control this macOS desktop via screenshots, AppleScript, mouse/keyboard automation, OCR/template matching, and verify loops.
Use this skill to operate this Mac's GUI deliberately: perceive the screen, choose the least fragile control channel, act once, verify, and repeat. Prefer app-native APIs, shell commands, or browser automation when they solve the task cleanly; use GUI control for real desktop surfaces, native dialogs, canvas apps, Electron apps, and visual workflows.
This skill is tailored for the current host:
Installed: python3, osascript, screencapture, Pillow, psutil, numpy, cliclick, PyAutoGUI, PyObjC Vision/Quartz/AppKit.
OpenCV (cv2) and ImageMagick (magick) may be missing. locate-template falls back to a Pillow/numpy matcher when OpenCV is unavailable.
System Events deep window inspection and all pointer/keyboard injection require Accessibility permission for the runtime process.
Use logical coordinates for actions. The bundled helper captures logical screenshots by default, so screenshot coordinates map directly to mouse coordinates.
Tier 0 works immediately:
python3 {baseDir}/scripts/mac_gui.py env
python3 {baseDir}/scripts/mac_gui.py capture --output /tmp/mac_gui.png
python3 {baseDir}/scripts/mac_gui.py front-app
python3 {baseDir}/scripts/mac_gui.py activate --app "Google Chrome"
python3 {baseDir}/scripts/mac_gui.py key --combo command+l
python3 {baseDir}/scripts/mac_gui.py paste --text "hello"
Tier 1 adds reliable mouse control:
brew install cliclick
This host already has cliclick installed at /usr/local/bin/cliclick.
Then use:
python3 {baseDir}/scripts/mac_gui.py mouse click --x 500 --y 300
python3 {baseDir}/scripts/mac_gui.py mouse drag --x 400 --y 400 --to-x 900 --to-y 650
Tier 2 adds computer-vision helpers:
python3 -m pip install --user --break-system-packages pyautogui pyobjc-framework-Vision pyobjc-framework-Quartz pyobjc-framework-Cocoa numpy
This host already has PyAutoGUI, PyObjC Vision/Quartz/AppKit, Pillow, psutil, and numpy installed for the default python3.
OpenCV enables faster template matching when a compatible wheel exists. On this macOS 11 + Python 3.14 setup, opencv-python falls back to a long source build, so do not install it casually during interactive work. The helper's locate-template command uses OpenCV when present and otherwise falls back to a Pillow/numpy matcher.
Tier 3 is optional and heavier: install an MCP Accessibility/UI-tree server such as mcp-server-macos-use only when visual workflows are not enough and the task needs native accessibility-tree state.
If screenshots are blank, grant Screen Recording to the host app or terminal process.
If clicks, keyboard shortcuts, or System Events window inspection fail with "not allowed assistive access" (error -1719), grant Accessibility to the host app or terminal process, then restart that process.
Use these shortcuts:
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Automation"
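The two failure modes above each point at a specific privacy pane. A minimal sketch of that triage, assuming the error text is available as a string. The function name and the matching heuristic are hypothetical; only the URLs and the -1719 / "assistive access" markers come from this document.

```python
from typing import Optional

# Privacy pane URLs as listed in this document.
PANES = {
    "screen_recording": "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture",
    "accessibility": "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility",
}


def pane_for_failure(error_text: str = "", screenshot_blank: bool = False) -> Optional[str]:
    """Return the settings URL to open for a failure, or None if it does not look like a permission issue."""
    lowered = error_text.lower()
    # Accessibility failures are reported as "not allowed assistive access" / -1719.
    if "-1719" in lowered or "assistive access" in lowered:
        return PANES["accessibility"]
    # Blank screenshots point at a missing Screen Recording grant.
    if screenshot_blank:
        return PANES["screen_recording"]
    return None
```

The returned URL can be passed to the open command shown above. After granting Accessibility, the host process still needs a restart.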
Choose the highest reliable layer available:
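Model vision on a screenshot or crop, returning compact JSON {found,x,y,confidence,reason}.
Keyboard navigation (tab, shift+tab, return, escape) when mouse injection is ignored, especially in OAuth/login/protected dialogs.
Always follow this loop for GUI tasks: perceive the screen, choose the least fragile control channel, act once, verify, and repeat.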
For externally visible actions, such as sending messages, submitting forms, payments, deletions, public posts, or account changes, stop before the final submit/send/delete unless the user explicitly authorized that exact action.
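The act-verify-repeat loop and the authorization stop can be sketched as one small function. This is an illustration under stated assumptions, not the helper's implementation: act and verify are caller-supplied callables, and every name here is hypothetical.

```python
from typing import Callable


def act_and_verify(
    act: Callable[[], None],
    verify: Callable[[], bool],
    max_attempts: int = 3,
    externally_visible: bool = False,
    authorized: bool = False,
) -> bool:
    """Act once, verify, and repeat until verified or attempts run out."""
    if externally_visible and not authorized:
        # Stop before the final submit/send/delete without explicit authorization.
        raise PermissionError("externally visible action needs explicit user authorization")
    for _ in range(max_attempts):
        act()  # exactly one action per iteration
        if verify():  # e.g. re-capture the screen and check for the expected state
            return True
    return False
```

For actions that are not safe to repeat, such as a click that toggles state, max_attempts=1 keeps the loop to a single act-and-check.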
Use the bundled helper from the skill directory.
Environment and coordinates:
python3 {baseDir}/scripts/mac_gui.py env
python3 {baseDir}/scripts/mac_gui.py capture --output /tmp/mac_gui.png
python3 {baseDir}/scripts/mac_gui.py capture --raw --output /tmp/mac_gui_raw.png
python3 {baseDir}/scripts/mac_gui.py front-app
python3 {baseDir}/scripts/mac_gui.py bounds --app "Google Chrome"
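A position read from a --raw capture has to be scaled to logical coordinates before it is passed to a mouse command. A minimal sketch, assuming a single display whose origin is the top-left corner. The default sizes are the 5120x2880 raw and 2560x1440 logical values this document gives for this iMac, and the function name is hypothetical.

```python
from typing import Tuple


def raw_to_logical(
    x: int,
    y: int,
    raw_size: Tuple[int, int] = (5120, 2880),
    logical_size: Tuple[int, int] = (2560, 1440),
) -> Tuple[int, int]:
    """Scale a pixel position in a raw capture to logical desktop coordinates."""
    scale_x = logical_size[0] / raw_size[0]
    scale_y = logical_size[1] / raw_size[1]
    return round(x * scale_x), round(y * scale_y)
```

Positions taken from the default logical capture need no conversion.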
App and keyboard control:
python3 {baseDir}/scripts/mac_gui.py activate --app "Google Chrome"
python3 {baseDir}/scripts/mac_gui.py key --combo command+l
python3 {baseDir}/scripts/mac_gui.py key --combo return
python3 {baseDir}/scripts/mac_gui.py paste --text "text to paste"
printf 'multi\nline' | python3 {baseDir}/scripts/mac_gui.py paste --stdin
Mouse control after cliclick or pyautogui is available:
python3 {baseDir}/scripts/mac_gui.py mouse position
python3 {baseDir}/scripts/mac_gui.py mouse move --x 500 --y 300
python3 {baseDir}/scripts/mac_gui.py mouse click --x 500 --y 300
python3 {baseDir}/scripts/mac_gui.py mouse double --x 500 --y 300
python3 {baseDir}/scripts/mac_gui.py mouse right --x 500 --y 300
python3 {baseDir}/scripts/mac_gui.py mouse drag --x 500 --y 300 --to-x 900 --to-y 500
Template matching:
python3 {baseDir}/scripts/mac_gui.py locate-template --image /tmp/mac_gui.png --template /tmp/button.png --threshold 0.86
If OpenCV is absent, the command uses the Pillow/numpy fallback. Use a smaller crop and a larger --step for speed, for example --step 4, when searching a large screenshot.
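To illustrate the speed trade-off of a larger step, here is a pure-Python sketch of a strided template search over grayscale grids. It is not the helper's Pillow/numpy matcher. It assumes --step means the stride between candidate positions, and the scoring (1 minus mean absolute difference) and the returned field names are hypothetical.

```python
from typing import Dict, List

Grid = List[List[int]]  # grayscale rows, values 0..255


def locate_template(image: Grid, template: Grid, step: int = 1, threshold: float = 0.86) -> Dict[str, object]:
    """Slide the template over the image every `step` pixels and return the best match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_score, best_x, best_y = -1.0, 0, 0
    for y in range(0, ih - th + 1, step):
        for x in range(0, iw - tw + 1, step):
            diff = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(th)
                for i in range(tw)
            )
            score = 1.0 - diff / (255.0 * th * tw)
            if score > best_score:
                best_score, best_x, best_y = score, x, y
    # Report the centre of the matched region, which is the point to click.
    return {
        "found": best_score >= threshold,
        "x": best_x + tw // 2,
        "y": best_y + th // 2,
        "score": best_score,
    }
```

In this sketch a step of 4 visits roughly one sixteenth of the candidate positions, but it can skip the exact offset of a small template. A coarse pass may therefore need a finer pass around the best candidate.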
The helper prints JSON for machine-readable chaining.
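Chaining on that JSON output could look like the sketch below. The wrapper names are hypothetical. It assumes the helper writes a single JSON document to stdout and signals failure with a non-zero exit code, neither of which this document states.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List


def helper_argv(base_dir: str, *args: str) -> List[str]:
    """Build the command line for the bundled helper under base_dir."""
    return ["python3", str(Path(base_dir) / "scripts" / "mac_gui.py"), *args]


def run_json(argv: List[str], timeout: float = 30.0) -> Any:
    """Run a command and decode its stdout as JSON, raising on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # Assumption: a non-zero exit code means the helper failed.
        raise RuntimeError(proc.stderr.strip() or f"exit code {proc.returncode}")
    return json.loads(proc.stdout)
```

For example, run_json(helper_argv(base_dir, "front-app")) would return the parsed result, where base_dir stands for the skill directory written as {baseDir} in the commands above.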
When using model vision on a screenshot or crop, require compact JSON before clicking:
{"found": true, "x": 742, "y": 681, "confidence": "high", "reason": "lower-right Queue button"}
Only click when found=true and confidence is high enough for the risk of the action. If confidence is medium, move, crop, or verify first. If confidence is low or found is false, do not click.
Prefer cropping to a small relevant region before asking for a coordinate. Smaller screenshots reduce ambiguity and are faster to inspect.
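The gating rule above can be sketched as a small decision function over the model's JSON reply. The function name, the returned action labels, and the required parameter are hypothetical. The field names follow the example JSON in this document.

```python
import json
from typing import Optional, Tuple

# Ordering of the confidence labels used in the JSON reply.
LEVELS = {"low": 0, "medium": 1, "high": 2}

Decision = Tuple[str, Optional[float], Optional[float]]
SKIP: Decision = ("skip", None, None)


def click_decision(raw_json: str, required: str = "high") -> Decision:
    """Return ("click", x, y), ("verify", x, y), or ("skip", None, None)."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return SKIP  # malformed reply: never click
    if not isinstance(data, dict) or data.get("found") is not True:
        return SKIP
    x, y = data.get("x"), data.get("y")
    if not isinstance(x, (int, float)) or not isinstance(y, (int, float)):
        return SKIP
    confidence = data.get("confidence")
    level = LEVELS.get(confidence, 0) if isinstance(confidence, str) else 0
    if level == LEVELS["low"]:
        return SKIP  # low or unknown confidence: do not click
    if level >= LEVELS[required]:
        return ("click", x, y)
    return ("verify", x, y)  # medium confidence: move, crop, or verify first
```

Raising required to "high" for risky actions and lowering it to "medium" for harmless ones is one way to scale the threshold with the risk of the action.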
Use clipboard paste for normal text, especially Chinese, Japanese, mixed-language prompts, filenames, and long prompts. Per-key simulated typing is slower and can break under IME input methods.
Default paste flow:
Paste the text with mac_gui.py paste.
Do not type passwords, tokens, or secrets through GUI automation unless the user explicitly asks and the target context is trusted.
Default coordinate convention:
capture writes logical screenshots sized to the desktop coordinate system, normally 2560x1440 on this iMac.
Raw screencapture output is 5120x2880 on this 5K Retina display; use --raw only when preserving raw pixels matters.
If an action fails:
Run bounds and front-app to confirm target state.
Grant Accessibility if System Events or mouse control reports Accessibility errors.
Read references/source-notes.md when revising this skill or deciding whether to add heavier dependencies. It summarizes which ClawHub skills informed this design and what was intentionally rejected.