Install
openclaw skills install screen-vision
macOS screen OCR & click automation CLI powered by Apple Vision + ScreenCaptureKit. Capture any window or screen region, extract text with coordinates, find text, and c...
Before running any screen-vision command, check if the binary exists. If not, run the setup script:
command -v screen-vision &>/dev/null || bash "${CLAUDE_SKILL_DIR}/setup.sh"
This installs screen-vision (via Homebrew or source build) and cliclick automatically.
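The same check can be wrapped in a function so a missing setup script fails with a clear message. This is a minimal sketch: `ensure_tool` is a hypothetical helper name, not part of the skill, and it assumes the setup script is a plain bash script as in the one-liner above.

```bash
# ensure_tool TOOL SETUP_SCRIPT
# Runs SETUP_SCRIPT only when TOOL is not already on PATH.
# (ensure_tool is a hypothetical helper, not shipped with screen-vision.)
ensure_tool() {
  tool=$1
  setup_script=$2

  # Nothing to do when the binary is already installed.
  if command -v "$tool" >/dev/null 2>&1; then
    return 0
  fi

  # Fail loudly instead of letting bash report a missing file.
  if [ ! -f "$setup_script" ]; then
    echo "setup script not found: $setup_script" >&2
    return 1
  fi

  bash "$setup_script"
}
```

It would be called as `ensure_tool screen-vision "${CLAUDE_SKILL_DIR}/setup.sh"` before the first screen-vision command.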
| Command | Description | Output |
|---|---|---|
| screen-vision ocr [--app NAME] | Full OCR | JSON array [{text, x, y, w, h, confidence}] |
| screen-vision list [--app NAME] | OCR list | Human-readable text with coordinates |
| screen-vision find "text" [--app NAME] | Find text | JSON {text, x, y, found} |
| screen-vision has "text" [--app NAME] | Check text exists | Exit code 0 (found) / 1 (not found) |
| screen-vision tap "text" [--app NAME] [--retry N] | Find + click | JSON {text, x, y, tapped} |
| screen-vision wait "text" [--timeout SEC] | Poll until text appears | JSON {text, x, y, found} |
Capture target priority: --region x,y,w,h > --app "AppName" > full screen (default)
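A wrapper script can apply the same ordering before it calls the CLI, so only one targeting flag is ever passed. This is a minimal sketch under stated assumptions: `sv_run` and the `SCREEN_VISION` override variable are hypothetical names, and it assumes every subcommand accepts `--region` and `--app` after its positional arguments.

```bash
# sv_run SUBCOMMAND REGION APP [extra args...]
# Passes exactly one targeting flag: region first, then app, else full screen.
# (sv_run and SCREEN_VISION are hypothetical; pass "" for an unused REGION or APP.)
sv_run() {
  sub=$1 region=$2 app=$3
  shift 3

  if [ -n "$region" ]; then
    # A region is the most specific target, so it wins.
    "${SCREEN_VISION:-screen-vision}" "$sub" "$@" --region "$region"
  elif [ -n "$app" ]; then
    "${SCREEN_VISION:-screen-vision}" "$sub" "$@" --app "$app"
  else
    # No target given: fall back to the full screen default.
    "${SCREEN_VISION:-screen-vision}" "$sub" "$@"
  fi
}
```

For example, `sv_run find "" "Safari" "Submit"` would search only the Safari window, while `sv_run find "0,0,800,600" "Safari" "Submit"` would ignore the app name and search the region.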
screen-vision list --app "Safari"
screen-vision has "Submit" --app "MyApp" && echo "Found" || echo "Not found"
screen-vision tap "OK" --app "MyApp" --retry 3
screen-vision wait "Complete" --timeout 30
screen-vision ocr | jq '.[].text'
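The commands above can be chained into a check, click, wait sequence. This is a minimal sketch: `click_then_wait` and the `SCREEN_VISION` override variable are hypothetical names. Only `has` is documented to report through its exit code; the sketch assumes `tap` and `wait` also exit non-zero on failure, which should be verified before relying on it.

```bash
# click_then_wait BUTTON_TEXT DONE_TEXT APP [TIMEOUT_SEC]
# Returns 1 if the button is not visible, 2 if the tap fails,
# otherwise the exit status of the final wait.
# (Hypothetical helper; assumes tap and wait exit non-zero on failure.)
click_then_wait() {
  button=$1 done_text=$2 app=$3 timeout=${4:-30}
  sv=${SCREEN_VISION:-screen-vision}

  # has is documented to exit 0 when found and 1 when not found.
  if ! "$sv" has "$button" --app "$app"; then
    echo "'$button' is not visible in $app" >&2
    return 1
  fi

  # Discard the JSON result; only the exit status matters here.
  "$sv" tap "$button" --app "$app" --retry 3 >/dev/null || return 2

  "$sv" wait "$done_text" --timeout "$timeout" >/dev/null
}
```

A call such as `click_then_wait "OK" "Complete" "MyApp" 30` mirrors the `tap` and `wait` examples above.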
Parse the user's request to determine which command to run:
screen-vision list
screen-vision find "text"
screen-vision tap "text"
screen-vision has "text"
screen-vision wait "text"
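One way to picture the mapping from a request to a command is a small dispatcher. This is a sketch only: `pick_command` and the intent keywords (`read`, `locate`, `click`, `check`) are hypothetical and do not come from the skill. They stand in for whatever phrasing the user's request contains.

```bash
# pick_command INTENT
# Prints the screen-vision subcommand for a hypothetical intent keyword.
# (The keyword set is an illustration, not defined by the skill.)
pick_command() {
  case "$1" in
    read|list)   echo list ;;   # show what is on screen
    locate|find) echo find ;;   # get coordinates of some text
    click|tap)   echo tap ;;    # find the text and click it
    check|has)   echo has ;;    # yes/no test through the exit code
    wait)        echo wait ;;   # block until the text appears
    *)
      echo "unknown intent: $1" >&2
      return 1
      ;;
  esac
}
```

The result can be passed straight to the CLI, for example `screen-vision "$(pick_command click)" "OK"`.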