Back to skill
v1.2.0

Screen Vision

ReviewClawScan verdict for this skill. Analyzed May 1, 2026, 7:48 AM.

Analysis

The skill's screen OCR and clicking features are disclosed, but it auto-runs an installer that downloads or builds unpinned remote executables and installs persistent system tools, so it should be reviewed before use.

GuidanceBefore installing, review and run setup.sh manually rather than allowing automatic setup, and prefer verified or pinned installation sources. Only grant Screen Recording permission if you are comfortable with the agent reading visible screen text. Use --app or --region whenever possible, keep secrets off-screen, and supervise tap/click commands.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Abnormal behavior control

Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.

Unexpected Code Execution
SeverityMediumConfidenceHighStatusConcern
SKILL.md
Before running any screen-vision command, check if the binary exists. If not, run the setup script: command -v screen-vision &>/dev/null || bash "${CLAUDE_SKILL_DIR}/setup.sh"

Normal skill use can trigger a local shell installer automatically when the binary is missing, rather than using a separate reviewed install specification.

User impactUsing the skill may execute installation commands on the user's Mac before the user has separately reviewed the installer.
RecommendationRun setup manually only after reviewing it, or require an explicit user approval step before executing setup.sh.
Agentic Supply Chain Vulnerabilities
SeverityHighConfidenceHighStatusConcern
setup.sh
brew install jackyun1024/tap/screen-vision ... curl -sL https://github.com/jackyun1024/mac-screen-vision/releases/download/v1.0.0/screen-vision-1.0.0-arm64-macos.tar.gz | tar xz -C /usr/local/bin/ ... git clone --depth 1 https://github.com/jackyun1024/mac-screen-vision.git "$TMPDIR/sv" && swift build -c release

The installer pulls executable code from external package, release, and source-build paths without checksum verification or pinned commit provenance, then installs it system-wide.

User impactIf the remote tap, release artifact, or repository is compromised or changes unexpectedly, the user could install and run untrusted code.
RecommendationUse a declared install spec with pinned versions and checksums, avoid mutable source builds for automatic setup, and let users verify the downloaded artifact before installation.
Tool Misuse and Exploitation
SeverityMediumConfidenceHighStatusNote
SKILL.md
`screen-vision tap "text" [--app NAME] [--retry N]` | Find + click

The skill exposes a text-based click automation command through Bash; this is purpose-aligned but can cause unintended UI actions if the OCR target is ambiguous or the wrong app is in focus.

User impactA mistaken or ambiguous text match could click the wrong button or change data in an open application.
RecommendationUse app or region limits for tap commands and confirm high-impact clicks manually.
Permission boundary

Checks whether tool use, credentials, dependencies, identity, account access, or inter-agent boundaries are broader than the stated purpose.

Identity and Privilege Abuse
SeverityMediumConfidenceHighStatusNote
SKILL.md
Capture any window or screen region, extract text with coordinates, find text, and click on it ... Screen Recording permission

The skill requires broad macOS screen-reading authority and can interact with visible UI elements, which is expected for its purpose but high-impact.

User impactThe agent may be able to read sensitive text visible on screen and click controls in other applications.
RecommendationGrant Screen Recording permission only if comfortable with this access, prefer --app or --region scoping, and supervise click automation.
Sensitive data protection

Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.

Memory and Context Poisoning
SeverityMediumConfidenceHighStatusNote
SKILL.md
`screen-vision ocr [--app NAME]` | Full OCR | JSON array `[{text, x, y, w, h, confidence}]` ... `--region x,y,w,h  >  --app "AppName"  >  full screen (default)`

Full-screen OCR can bring arbitrary visible text, including sensitive data or untrusted instructions from webpages/apps, into the agent's working context.

User impactPrivate information visible on screen could be read by the agent, and hostile on-screen text could influence the agent if treated as trustworthy.
RecommendationAvoid displaying secrets while using the skill, scope capture to a specific app or region, and treat OCR output as untrusted context.