Desktop Agent Ops

v1.0.3

Execute cross-platform desktop tasks through a packaged desktop automation skill that guides the main agent to observe the screen, focus apps and windows, ca...

by TRIP @appergb
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Pending
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name/description (cross‑platform desktop GUI automation) aligns with the requested binaries (python3, cliclick/xdotool) and with the included scripts (capture, OCR, window bounds, click/type helpers). The references and workflows focus on chat apps and other desktop targets, which justifies OCR and input tooling.
Instruction Scope
SKILL.md instructs the agent to run the bundled scripts to capture screenshots, run OCR, focus windows, and generate input events, all of which are necessary for the stated purpose. It mandates an initial auto-setup step that can install tooling, create a venv, open OS permission dialogs, and run smoke tests. Those actions are within scope but invasive (screen recording, accessibility, automatic installs), and they let the skill control arbitrary desktop apps, including sending messages if the agent follows the chat workflows. The instructions do not direct the agent to contact external endpoints.
Install Mechanism
The registry shows no formal installer spec, but SKILL.md and the included scripts perform a local auto-setup (first_run_setup.py) that installs system binaries (cliclick, tesseract via brew on macOS), Python dependencies (venv + pip/uv installs), and OCR language packs. The auto-install behavior is implemented by the package itself (the scripts are included and non-empty). This is coherent with the skill's goals but carries the usual risks of running a bundled installer script without auditing it first: it will write to disk, may run shell commands, and may open system settings.
Credentials
The skill does not request credentials, environment variables, or external API tokens. That matches expectations for a purely local desktop automation tool. No unrelated secrets or config paths are declared.
Persistence & Privilege
The skill is not marked always:true and does not request special platform-level config paths in the manifest. However, the scripts are designed to create an external venv and temporary task directories and to prompt for OS permissions (Accessibility, Screen Recording, Automation). These permissions are necessary for GUI automation, but they are high-impact and require an explicit user grant at the OS level.
Assessment
This skill appears to do what it says (screen capture, OCR, focus apps, mouse/keyboard actions). Before installing or running it:
- Audit the installer script (scripts/first_run_setup.py) and any bootstrap scripts for shell commands or network calls you don't expect. The package will run installs and create a venv on first run.
- Expect and approve OS permission prompts (Accessibility, Screen Recording, Automation). Granting these allows the skill to observe the screen and drive input; treat that as granting powerful local access.
- Because the skill can operate chat apps, do not run it on machines containing sensitive accounts or private conversations until you’ve tested it in a safe sandbox account.
- Run first_run_setup.py and the smoke tests in a controlled environment (VM or throwaway account) first to confirm exactly what is installed and what the smoke test does.
- Search the bundled scripts for any outbound network activity or telemetry (HTTP, sockets, uploads). If present, verify endpoints and purpose before proceeding.
- If you are uncomfortable running an automatic installer, create and point the skill at an explicit, user-created Python virtualenv and install dependencies manually after review.

If you want, I can: (a) list the top-level contents of specific scripts (first_run_setup.py, permission_bootstrap.py, desktop_ops.py) so you can see what commands they run, or (b) search the code for obvious network calls or subprocess usage and summarize findings.
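The advice to search the bundled scripts for network calls and shell-outs can be roughed out with Python's standard `ast` module. This is a minimal sketch, not a tool shipped with the skill: the function names and the `SUSPECT_MODULES` list are assumptions made for illustration, and the check only inspects import statements, so it will miss dynamic imports, `os.system` calls, and anything in non-Python files.

```python
import ast
from pathlib import Path

# Hypothetical watch list: modules whose import is worth a closer manual look.
# It is an assumption for this sketch and is not exhaustive.
SUSPECT_MODULES = {"subprocess", "socket", "urllib", "http", "requests", "ftplib", "smtplib"}


def find_suspect_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line_number, module_name) pairs for watched imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare on the top-level package, e.g. "urllib" for "urllib.request".
            if name.split(".")[0] in SUSPECT_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and report files with watched imports."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits = find_suspect_imports(path.read_text(encoding="utf-8", errors="replace"))
        except SyntaxError:
            # Flag files that cannot be parsed so they get a manual review.
            hits = [(0, "<unparseable>")]
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree("scripts")` from the unpacked skill directory would list each file alongside the line numbers to read first. A hit is only a pointer for manual review; a GUI automation tool legitimately needs `subprocess` to drive cliclick or xdotool.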


latest: vk976cmxk7j3z5dmdp6kek2dgb583k9xs


Runtime requirements

🖥️ Clawdis
OS: macOS · Windows · Linux
Bins: python3
Any bin: cliclick, xdotool
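
A quick preflight check for these requirements can be sketched with `shutil.which` from the standard library. The reading that python3 is mandatory while at least one of cliclick or xdotool must be present is an assumption drawn from the "Bins" and "Any bin" labels; `check_requirements` and its result keys are hypothetical names, not part of the skill.

```python
import shutil
from typing import Callable, Optional

# Assumed interpretation of the listing: every name in REQUIRED_BINS must be
# on PATH, and at least one name in ANY_OF_BINS must be on PATH.
REQUIRED_BINS = ["python3"]
ANY_OF_BINS = ["cliclick", "xdotool"]


def check_requirements(which: Callable[[str], Optional[str]] = shutil.which) -> dict:
    """Report which listed binaries are missing or available on this machine."""
    missing_required = [b for b in REQUIRED_BINS if which(b) is None]
    available_any = [b for b in ANY_OF_BINS if which(b) is not None]
    return {
        "missing_required": missing_required,
        "available_input_tools": available_any,
        "ok": not missing_required and bool(available_any),
    }
```

Running `check_requirements()` before the first-run setup shows what the installer would otherwise try to add. The `which` parameter exists only so the lookup can be replaced in tests.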
