UI Element Ops
ReviewAudited by ClawScan on May 10, 2026.
Overview
The skill is purpose-aligned, but review is recommended because it can automate your desktop and its setup pulls unpinned external parser code and model files.
Install only if you intend to allow this skill to parse screenshots and automate your desktop. Review and preferably pin the external OmniParser/PyPI/Hugging Face dependencies before bootstrapping, confirm UI actions before running clicks or hotkeys, and clean up generated screenshot/OCR files that may contain private information.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The behavior of the parser could change if upstream code or model files change, and a compromised or tampered dependency could affect the local machine.
The setup fetches external code and model artifacts into a default /tmp location without pinning a commit/revision or verifying checksums, which creates a provenance and tamper-resistance gap for code the parser later relies on.
OMNIPARSER_DIR="${3:-/tmp/OmniParser}" ... git clone --depth 1 https://github.com/microsoft/OmniParser "$OMNIPARSER_DIR" ... "$VENV_PATH/bin/hf" download microsoft/OmniParser-v2.0 ... --local-dir "$OMNIPARSER_DIR/weights"Pin package versions and the OmniParser commit/model revision, verify checksums where possible, and prefer a user-owned non-/tmp install directory.
A mistaken element match or coordinate calibration issue could click, type, or send hotkeys in the wrong application.
The skill intentionally exposes desktop automation actions that can interact with any currently visible GUI. This is central to the skill purpose and disclosed, but it is still high-impact.
desktop actions via `scripts/operate_ui.py` (click/type/key/hotkey/screenshot) ... Execute desktop actions when requested
Use dry runs or list/find first, visually confirm targets before clicks/hotkeys, and keep PyAutoGUI failsafe enabled.
Private information visible on screen may remain in `.elements.json`, overlay images, or screenshots after the task is done.
The skill persists OCR text and annotated screenshots as local artifacts. This is expected for UI parsing, but those files can contain sensitive screen contents.
Main JSON output ... each element has `id`, `type`, `bbox_px`, `bbox_norm`, `text`, `clickable` ... Overlay PNG output
Avoid capturing sensitive screens, choose a protected output directory, and delete generated screenshots/JSON/overlays when no longer needed.
