Screen Vision
Analysis
The skill's screen OCR and clicking features are disclosed, but it auto-runs an installer that downloads or builds unpinned remote executables and installs persistent system tools, so it should be reviewed before use.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.
Before running any screen-vision command, the skill instructs the agent to check whether the binary exists and, if not, to run the setup script: `command -v screen-vision &>/dev/null || bash "${CLAUDE_SKILL_DIR}/setup.sh"`. Normal skill use can therefore trigger a local shell installer automatically whenever the binary is missing, rather than going through a separate, reviewed install specification.
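A fail-closed variant of that guard is one possible mitigation. The sketch below is hypothetical and not part of the skill; it reports the missing binary instead of auto-running the installer:

```shell
# Sketch: fail closed when the binary is missing, instead of auto-running setup.sh.
require_screen_vision() {
  command -v screen-vision >/dev/null 2>&1 || {
    echo "screen-vision is not installed; review and run setup.sh manually" >&2
    return 1
  }
}
```

With this guard, a missing binary stops the task with an actionable message rather than fetching and executing remote code mid-session.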
`brew install jackyun1024/tap/screen-vision` ... `curl -sL https://github.com/jackyun1024/mac-screen-vision/releases/download/v1.0.0/screen-vision-1.0.0-arm64-macos.tar.gz | tar xz -C /usr/local/bin/` ... `git clone --depth 1 https://github.com/jackyun1024/mac-screen-vision.git "$TMPDIR/sv" && swift build -c release`
The installer pulls executable code from external package, release, and source-build paths without checksum verification or pinned commit provenance, then installs it system-wide.
`screen-vision tap "text" [--app NAME] [--retry N]` | Find + click
The skill exposes a text-based click automation command through Bash; this is purpose-aligned but can cause unintended UI actions if the OCR target is ambiguous or the wrong app is in focus.
Checks whether tool use, credentials, dependencies, identity, account access, or inter-agent boundaries are broader than the stated purpose.
Capture any window or screen region, extract text with coordinates, find text, and click on it ... Screen Recording permission
The skill requires broad macOS screen-reading authority and can interact with visible UI elements, which is expected for its purpose but high-impact.
Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.
`screen-vision ocr [--app NAME]` | Full OCR | JSON array `[{text, x, y, w, h, confidence}]` ... `--region x,y,w,h` > `--app "AppName"` > full screen (default). Full-screen OCR can bring arbitrary visible text, including sensitive data or untrusted instructions from webpages/apps, into the agent's working context.
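Because the output is structured JSON, a caller can trim what reaches the agent's context. The sketch below is illustrative post-processing, not a skill feature; only the `[{text, x, y, w, h, confidence}]` shape comes from the skill's own spec, and the sample entries are invented:

```shell
# Sample of the skill's documented output shape [{text, x, y, w, h, confidence}].
ocr_json='[{"text":"Submit","x":120,"y":340,"w":64,"h":18,"confidence":0.97},
           {"text":"hunter2","x":400,"y":80,"w":70,"h":14,"confidence":0.41}]'

# Keep only confident matches and only the fields a click actually needs,
# so incidental low-confidence screen text is less likely to enter context.
echo "$ocr_json" | python3 -c '
import json, sys
entries = json.load(sys.stdin)
kept = [{"text": e["text"], "x": e["x"], "y": e["y"]}
        for e in entries if e["confidence"] >= 0.8]
print(json.dumps(kept))
'
```

Filtering like this reduces incidental capture but does not reliably exclude sensitive text (a clearly rendered password would pass with high confidence), so scoping with `--region` or `--app` remains the primary control.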
