Desktop Gui
Warn
Audited by ClawScan on May 10, 2026.
Overview
This is a coherent desktop automation skill, but it can upload full-screen screenshots to a remote AI service and execute that service’s click instructions on your real desktop without clear per-action approval.
Install only if you are comfortable with powerful desktop-wide automation. Use it first in a VM or test account, keep failsafe and pauses enabled, avoid sensitive screens, verify the model endpoint, require confirmation before actions, and do not send screenshots or API tokens over untrusted HTTP connections.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A mistaken or manipulated model response could click, type, submit forms, close windows, or change real account/business data in whatever application is visible.
The skill instructs the agent to execute model-produced GUI actions with xdotool on the live desktop.
Screenshot (scrot) → Qwen3.5-27b analysis → return coordinates/action → xdotool execution ... if data['action'] == 'click': ... subprocess.run(['xdotool', 'click', '1'])
Use only in a VM or isolated desktop, restrict automation to a chosen window/app, and require explicit user confirmation before clicks, typing, submissions, or window-closing actions.
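The confirmation step recommended here could be sketched roughly as follows. This is a minimal illustration, not the skill's own code: `ALLOWED_ACTIONS`, `build_xdotool_commands`, and `confirm_and_run` are hypothetical names, and the `x`/`y` fields in the model response are an assumption, since the excerpt above only shows `data['action']`.

```python
import subprocess

# Hypothetical allowlist: only these model-proposed actions are ever considered.
ALLOWED_ACTIONS = {"click"}


def build_xdotool_commands(data, screen_w, screen_h):
    """Validate a model response and return xdotool argv lists, or raise ValueError."""
    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    # Assumed response fields; the real response format may differ.
    x, y = data.get("x"), data.get("y")
    if type(x) is not int or type(y) is not int:
        raise ValueError("coordinates must be integers")
    if not (0 <= x < screen_w and 0 <= y < screen_h):
        raise ValueError("coordinates outside the screen")
    return [["xdotool", "mousemove", str(x), str(y)], ["xdotool", "click", "1"]]


def confirm_and_run(data, screen_w, screen_h, ask=input, run=subprocess.run):
    """Ask the user before executing anything; return True only if the action ran."""
    commands = build_xdotool_commands(data, screen_w, screen_h)
    answer = ask(f"Execute {data['action']} at ({data['x']}, {data['y']})? [y/N] ")
    if answer.strip().lower() != "y":
        return False
    for argv in commands:
        run(argv, check=True)
    return True
```

Defaulting to "no" on anything other than an explicit `y` means a stray keypress or an empty response never triggers a click. The `ask` and `run` parameters exist only so the gate can be exercised without a real desktop.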
Anything visible on the desktop, including private messages, documents, secrets, or account pages, could be sent to that model service.
The visual mode captures a full screenshot, base64-encodes it, and sends it to a model endpoint over HTTP with no stated data minimization, retention, or trust boundary.
subprocess.run(['scrot', '/tmp/screen.png']) ... requests.post("http://10.6.207.56:8000/v1/chat/completions", ... "image_url": {"url": f"data:image/png;base64,{img_b64}"})
Do not use this mode on sensitive screens unless you fully trust the endpoint; prefer HTTPS, crop screenshots to the minimum region, and add explicit user approval before each upload.
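One way to enforce the transport part of this advice is to check the endpoint URL before any screenshot leaves the machine. The sketch below is an assumption-laden illustration: `is_acceptable_endpoint` is a hypothetical helper, and treating loopback addresses as the only acceptable plain-HTTP targets is a policy chosen for the example, not something the skill specifies.

```python
import ipaddress
from urllib.parse import urlparse


def is_acceptable_endpoint(url):
    """Return True for HTTPS URLs, or for plain HTTP only to a loopback host."""
    parts = urlparse(url)
    if not parts.hostname:
        return False
    if parts.scheme == "https":
        return True
    if parts.scheme != "http":
        return False
    if parts.hostname == "localhost":
        return True
    try:
        # Plain HTTP is tolerated only when traffic never leaves this machine.
        return ipaddress.ip_address(parts.hostname).is_loopback
    except ValueError:
        # Hostname is a DNS name other than localhost: reject over plain HTTP.
        return False
```

Under this policy the endpoint quoted in the evidence, `http://10.6.207.56:8000/v1/chat/completions`, would be rejected because it is plain HTTP to a non-loopback address.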
A model API token may be mishandled or exposed on the network, and users may not realize a credential is needed for the recommended mode.
The example sends a bearer credential to the model endpoint over a plain HTTP URL, while the registry declares no required credential or environment variable.
requests.post("http://10.6.207.56:8000/v1/chat/completions", headers={"Authorization": "Bearer VLLM_API_KEY"}, ...)
Declare the required credential clearly, store it in a scoped environment variable or secret manager, use HTTPS or a trusted local endpoint, and avoid sending bearer tokens over plain HTTP.
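Reading the token from the environment, rather than embedding it in the request code, might look like the sketch below. The helper name `auth_headers` is hypothetical, and using `VLLM_API_KEY` as the environment variable name is an assumption based on the placeholder in the excerpt above.

```python
import os


def auth_headers(env=os.environ, var="VLLM_API_KEY"):
    """Build the Authorization header from an environment variable.

    Raises RuntimeError when the variable is missing or blank, so a
    misconfigured setup fails loudly instead of sending an empty token.
    """
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"set the {var} environment variable before using visual mode")
    return {"Authorization": f"Bearer {token}"}
```

Failing early when the variable is unset also makes the credential requirement visible to users, which addresses the gap between the example and the registry metadata.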
Installing these dependencies changes the local system and trusts upstream package sources.
The setup relies on live package repositories and privileged system package installation; this is purpose-aligned for GUI automation but still affects the local environment.
pip3 install pyautogui pygetwindow pymsgbox opencv-python-headless pillow pytesseract ... sudo apt-get install -y xdotool scrot tesseract-ocr ...
Install from trusted repositories, consider pinning Python package versions, and review the packages before using the skill on an important workstation.
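As a small aid for the pinning advice, a requirements list can be checked for entries that lack an exact version. This is a simplified sketch: `unpinned_requirements` is a hypothetical helper, and the pattern only recognizes plain `name==version` lines, not extras, environment markers, or hashes.

```python
import re

# Matches simple "name==version" lines only (an intentional simplification).
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.*+!_-]+$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to an exact version."""
    result = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line and not PINNED.match(line):
            result.append(line)
    return result
```

Run against the packages named in the install command above, every entry would be reported, since the command installs them without version constraints.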
