UI Element Ops

Review: Audited by ClawScan on May 10, 2026.

Overview

The skill is purpose-aligned, but review is recommended because it can automate your desktop and its setup pulls unpinned external parser code and model files.

Install only if you intend to allow this skill to parse screenshots and automate your desktop. Review and preferably pin the external OmniParser/PyPI/Hugging Face dependencies before bootstrapping, confirm UI actions before running clicks or hotkeys, and clean up generated screenshot/OCR files that may contain private information.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern (High Confidence)

ASI04: Agentic Supply Chain Vulnerabilities

What this means

The behavior of the parser could change if upstream code or model files change, and a compromised or tampered dependency could affect the local machine.

Why it was flagged

The setup fetches external code and model artifacts into a default /tmp location without pinning a commit/revision or verifying checksums, which creates a provenance and tamper-resistance gap for code the parser later relies on.

Skill content

OMNIPARSER_DIR="${3:-/tmp/OmniParser}" ... git clone --depth 1 https://github.com/microsoft/OmniParser "$OMNIPARSER_DIR" ... "$VENV_PATH/bin/hf" download microsoft/OmniParser-v2.0 ... --local-dir "$OMNIPARSER_DIR/weights"

Recommendation

Pin package versions and the OmniParser commit/model revision, verify checksums where possible, and prefer a user-owned non-/tmp install directory.
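
The checksum part of this recommendation could be sketched roughly as follows. This is a minimal illustration, not part of the skill: the helper names `sha256_of` and `verify_artifacts` are hypothetical, and the `expected` mapping of relative paths to SHA-256 digests is assumed to be recorded by the user from a trusted download, since the skill does not ship one.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs.

    `expected` maps a path relative to `root` to its known-good SHA-256 hex
    digest (hypothetical data the user records after a trusted download).
    """
    failures = []
    for rel_path, want in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != want.lower():
            failures.append(rel_path)
    return failures
```

An empty return value means every listed file matched its recorded digest. A check like this would run after the bootstrap step and before the parser is first used, so a changed or tampered file is noticed before it is loaded.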

Note (High Confidence)

ASI02: Tool Misuse and Exploitation

What this means

A mistaken element match or coordinate calibration issue could click, type, or send hotkeys in the wrong application.

Why it was flagged

The skill intentionally exposes desktop automation actions that can interact with any currently visible GUI. This is central to the skill's purpose and is disclosed, but it is still high-impact.

Skill content

desktop actions via `scripts/operate_ui.py` (click/type/key/hotkey/screenshot) ... Execute desktop actions when requested

Recommendation

Use dry runs or list/find first, visually confirm targets before clicks/hotkeys, and keep PyAutoGUI failsafe enabled.
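
One way to apply this recommendation is to separate planning a click from executing it. The sketch below is illustrative only and does not reflect how `scripts/operate_ui.py` works: `PlannedClick`, `plan_click`, and `run_click` are hypothetical names, and `bbox_px` is assumed to be `[x1, y1, x2, y2]` in screen pixels, which the review does not confirm. The only PyAutoGUI features used are `pyautogui.FAILSAFE` and `pyautogui.click(x, y)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedClick:
    element_id: object
    x: int
    y: int
    label: str


def plan_click(elements: list[dict], element_id) -> PlannedClick:
    """Resolve an element id to a click point without touching the desktop."""
    matches = [e for e in elements if e.get("id") == element_id]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one element with id={element_id!r}, found {len(matches)}"
        )
    element = matches[0]
    if not element.get("clickable", False):
        raise ValueError(f"element {element_id!r} is not marked clickable")
    # Assumption: bbox_px is [x1, y1, x2, y2]; the click lands on the box centre.
    x1, y1, x2, y2 = element["bbox_px"]
    return PlannedClick(element_id, int((x1 + x2) / 2), int((y1 + y2) / 2),
                        element.get("text", ""))


def run_click(plan: PlannedClick, *, confirmed: bool = False) -> bool:
    """Print the plan by default; click only when the caller passes confirmed=True."""
    if not confirmed:
        print(f"[dry run] would click element {plan.element_id!r} "
              f"({plan.label!r}) at ({plan.x}, {plan.y})")
        return False
    import pyautogui  # imported lazily so a dry run needs no GUI access

    pyautogui.FAILSAFE = True  # moving the mouse to a screen corner aborts
    pyautogui.click(plan.x, plan.y)
    return True
```

With this split, the default path only reports what would happen, and a real click requires an explicit `confirmed=True` after the target has been checked visually.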

Note (High Confidence)

ASI06: Memory and Context Poisoning

What this means

Private information visible on screen may remain in `.elements.json`, overlay images, or screenshots after the task is done.

Why it was flagged

The skill persists OCR text and annotated screenshots as local artifacts. This is expected for UI parsing, but those files can contain sensitive screen contents.

Skill content

Main JSON output ... each element has `id`, `type`, `bbox_px`, `bbox_norm`, `text`, `clickable` ... Overlay PNG output

Recommendation

Avoid capturing sensitive screens, choose a protected output directory, and delete generated screenshots/JSON/overlays when no longer needed.
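
The cleanup step could look roughly like the sketch below. It is an assumption-laden illustration: `purge_artifacts` is a hypothetical helper, and the suffix list assumes that parser output ends in `.elements.json` and that screenshots and overlays are saved as `.png`, so it should be adjusted to the names the skill actually writes.

```python
from pathlib import Path

# Assumption: generated files end with one of these suffixes.
ARTIFACT_SUFFIXES = (".elements.json", ".png")


def purge_artifacts(output_dir: Path,
                    suffixes: tuple[str, ...] = ARTIFACT_SUFFIXES) -> list[Path]:
    """Delete generated parser artifacts directly inside output_dir.

    Only regular files whose names end with one of `suffixes` are removed;
    subdirectories and unrelated files are left alone.
    """
    removed = []
    for path in sorted(output_dir.iterdir()):
        if path.is_file() and path.name.endswith(suffixes):
            path.unlink()
            removed.append(path)
    return removed
```

Using a dedicated output directory for the skill keeps a suffix-based purge like this from deleting unrelated images that happen to share the same folder.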