{"skill":{"slug":"ui-element-ops","displayName":"UI Element Ops","summary":"Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from parsed elements. Use when a user asks to detect/locate U...","description":"---\nname: ui-element-ops\ndescription: Parse UI screenshots into structured element JSON (type, OCR text, bbox) and operate desktop UI from parsed elements. Use when a user asks to detect/locate UI elements, return coordinates, find elements by text/type, wait for element appearance or disappearance, click/type/press keys/hotkeys, take screenshots, or calibrate coordinates for multi-display/DPI/window offsets.\n---\n\n# UI Element Ops\n\nParse one or more screenshots into a machine-readable JSON schema with:\n\n- `type` (normalized UI element type)\n- `bbox_px` and `bbox_norm`\n- `text` (OCR/caption content when available)\n- `clickable` flag\n- optional overlay image with labeled boxes\n- desktop actions via `scripts/operate_ui.py` (click/type/key/hotkey/screenshot)\n- element query and orchestration via `scripts/operate_ui.py` (`find`, `wait`)\n- coordinate calibration profile for multi-display/DPI/window offset (`calibrate`)\n\n## Quick Start\n\n1. Prepare runtime once per machine:\n```bash\nskills/ui-element-ops/scripts/bootstrap_omniparser_env.sh \"$PWD\"\n```\n\n2. Parse one screenshot:\n```bash\nskills/ui-element-ops/scripts/run_parse_ui.sh /abs/path/to/1.jpeg\n```\n\n3. Read outputs:\n- `<image>.elements.json`\n- `<image>.overlay.png`\n\n4. One-step capture + parse with randomized names:\n```bash\nskills/ui-element-ops/scripts/capture_and_parse.sh\n```\n\n## Workflow\n\n1. Confirm screenshot path and desired output path.\n2. Run `scripts/bootstrap_omniparser_env.sh` when `.venv` or OmniParser weights are missing.\n3. Run `scripts/run_parse_ui.sh` for standard parsing.\n4. Report absolute output paths and summary counts: `total`, `clickable`, `by_type`.\n5. Call out obvious quality risks for tiny text or dense icon layouts.\n6. Execute desktop actions when requested:\n   - list elements: `python3 skills/ui-element-ops/scripts/operate_ui.py list --elements <json>`\n   - find elements: `python3 skills/ui-element-ops/scripts/operate_ui.py find --elements <json> --type button --text-contains login`\n   - wait for appear/disappear: `python3 skills/ui-element-ops/scripts/operate_ui.py wait --elements <json> --state appear --text-contains continue`\n   - click by id: `python3 skills/ui-element-ops/scripts/operate_ui.py click --elements <json> --id e_0001`\n   - screenshot: `python3 skills/ui-element-ops/scripts/operate_ui.py screenshot` (defaults to user tmp dir)\n   - calibrate coordinates: `python3 skills/ui-element-ops/scripts/operate_ui.py calibrate --parsed-size <w> <h> --actual-size <w> <h>`\n\n## Tunables\n\n- Edit type mapping keywords in `references/type_rules.example.json`.\n- Use advanced parser args via `scripts/parse_ui.py --help`.\n- Use `--use-paddleocr` only when `paddleocr`/`paddlepaddle` are installed.\n\n## Outputs\n\n- Main JSON output:\n  - `schema_version`, `pipeline`, `image`, `counts`, `elements`\n  - each element has `id`, `type`, `bbox_px`, `bbox_norm`, `text`, `clickable`\n- Overlay PNG output:\n  - same screenshot with labeled detection boxes\n\n## Failure Handling\n\n- Missing dependencies or weights: run bootstrap script again.\n- Permission/cache errors under `$HOME`: keep temporary caches under `/tmp` (handled by run script).\n- CPU-only machine: expect slower inference.\n- Performance note: parse/capture-and-parse commands are heavy; avoid very tight loops and reuse recent `elements.json` when possible.\n- Headless environment limitation:\n  - usable without GUI: parse/list/find/wait/calibrate on existing files.\n  - requires GUI session: click/click-xy/type/key/hotkey/screenshot/screen-info.\n","tags":{"latest":"1.0.2"},"stats":{"comments":0,"downloads":914,"installsAllTime":4,"installsCurrent":4,"stars":0,"versions":3},"createdAt":1772180553422,"updatedAt":1778993620134},"latestVersion":{"version":"1.0.2","createdAt":1772194272301,"changelog":"- Add performance note advising not to use parse/capture-and-parse commands in tight loops and to reuse recent elements.json outputs when possible.\n- No code changes; documentation update only.","license":null},"metadata":null,"owner":{"handle":"murongg","userId":"s171py5d828pf4dakhn5phqand885nwt","displayName":"MuRong","image":"https://avatars.githubusercontent.com/u/38655613?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1779963885431}}