UI Element Ops

ReviewAudited by ClawScan on May 10, 2026.

Overview

The skill is purpose-aligned, but review is recommended because it can automate your desktop and its setup pulls unpinned external parser code and model files.

Install only if you intend to allow this skill to parse screenshots and automate your desktop. Review and preferably pin the external OmniParser/PyPI/Hugging Face dependencies before bootstrapping, confirm UI actions before running clicks or hotkeys, and clean up generated screenshot/OCR files that may contain private information.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The behavior of the parser could change if upstream code or model files change, and a compromised or tampered dependency could affect the local machine.

Why it was flagged

The setup fetches external code and model artifacts into a default /tmp location without pinning a commit/revision or verifying checksums, which creates a provenance and tamper-resistance gap for code the parser later relies on.

Skill content
OMNIPARSER_DIR="${3:-/tmp/OmniParser}" ... git clone --depth 1 https://github.com/microsoft/OmniParser "$OMNIPARSER_DIR" ... "$VENV_PATH/bin/hf" download microsoft/OmniParser-v2.0 ... --local-dir "$OMNIPARSER_DIR/weights"
Recommendation

Pin package versions and the OmniParser commit/model revision, verify checksums where possible, and prefer a user-owned non-/tmp install directory.

What this means

A mistaken element match or coordinate calibration issue could click, type, or send hotkeys in the wrong application.

Why it was flagged

The skill intentionally exposes desktop automation actions that can interact with any currently visible GUI. This is central to the skill purpose and disclosed, but it is still high-impact.

Skill content
desktop actions via `scripts/operate_ui.py` (click/type/key/hotkey/screenshot) ... Execute desktop actions when requested
Recommendation

Use dry runs or list/find first, visually confirm targets before clicks/hotkeys, and keep PyAutoGUI failsafe enabled.

What this means

Private information visible on screen may remain in `.elements.json`, overlay images, or screenshots after the task is done.

Why it was flagged

The skill persists OCR text and annotated screenshots as local artifacts. This is expected for UI parsing, but those files can contain sensitive screen contents.

Skill content
Main JSON output ... each element has `id`, `type`, `bbox_px`, `bbox_norm`, `text`, `clickable` ... Overlay PNG output
Recommendation

Avoid capturing sensitive screens, choose a protected output directory, and delete generated screenshots/JSON/overlays when no longer needed.