Xdotool Control
v1.0.0
Mouse and keyboard automation using xdotool. Use when clicking Chrome extension icons, typing into GUI apps, switching browser tabs, automating desktop UI, o...
Security Scan
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description (xdotool-based desktop automation) match the included scripts and SKILL.md. All required commands (xdotool, scrot, optional ImageMagick) are appropriate for the claimed tasks. There are no unrelated cloud credentials, external APIs, or binaries requested that would be inconsistent with a GUI automation skill.
Instruction Scope
Instructions stay within desktop automation: finding windows, focusing, moving mouse, typing, taking screenshots, and using template-matching. A few items warrant attention: (1) the SKILL.md explicitly suggests 'Read screenshots with Claude's Read tool' — using the model to read screenshots may upload sensitive GUI content to the external model service; (2) there's a small snippet to send a 'Yes' into a tmux session (automating acceptance for a 'claude-session'), which can be used to approve prompts or automated flows — this is powerful and could be abused if misused. Otherwise the runtime steps are explicit and limited to local interactions.
Install Mechanism
This is an instruction-only skill with bundled scripts — no install spec that fetches remote code. All code is provided in the bundle; runtime depends on common distro packages (xdotool, scrot, imagemagick). No downloads from arbitrary URLs or archive extraction were found.
Credentials
The skill requests no environment variables, credentials, or config paths. The scripts only read local window state and write screenshots to /tmp. No secret-typed environment variables are required or referenced.
Persistence & Privilege
The 'always' flag is false, and the skill does not request elevated or system-wide persistence. It does include an optional tmux automation pattern that targets a session named 'claude-session', but this operates at the user level and does not alter other skills or global agent config.
Assessment
This skill appears to be what it claims, a local Linux desktop automation helper using xdotool, but it gives the agent the ability to move the mouse, send keystrokes, and take screenshots. Before installing or enabling it, consider:
1) Only install it if you trust the skill owner and need local GUI automation.
2) Review the included scripts (they are bundled and readable) and do not run them as root.
3) Be cautious about using the skill together with any model image-reading tool: screenshots saved to /tmp may contain passwords, auth cookies, or other sensitive UI state, and may be transmitted to the model service when you use the 'Read' tool.
4) Note the tmux approval snippet: it can programmatically send confirmations into sessions (e.g., 'Yes' to a claude-session); make sure that is acceptable in your environment.
5) SKILL.md references a hard-coded path (~/.openclaw/workspace/skills/xdotool-control/...), so confirm where your platform places the scripts before relying on the sample invocations.
For additional assurance, run the scripts in a sandboxed user account or VM first, and avoid enabling autonomous invocation if you don't want the agent to trigger GUI actions without explicit user requests.
xdotool-control
Automate mouse, keyboard, and window operations on the Linux desktop. Primary use: clicking Chrome extension icons and interacting with GUI apps when no browser CDP (Chrome DevTools Protocol) connection is available.
Quick Start
# Find a window
xdotool search --name "Google Chrome"
# Click at screen coordinates
xdotool mousemove 1800 56 click 1
# Type text into focused window
xdotool type "hello world"
# Screenshot current state
scrot /tmp/snap.png
Core Patterns
1. Find + Focus + Click
# Find Chrome window, focus it, click at position
WIN=$(xdotool search --name "Google Chrome" | head -1)
xdotool windowactivate --sync "$WIN"
sleep 0.3
xdotool mousemove X Y click 1
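If no window matches, WIN is empty and the pattern above still clicks blindly at X Y. The hypothetical helper below (the name focus_and_click is illustrative, not part of the skill) wraps the same three steps and stops when the search finds nothing.

```bash
# Hypothetical helper: focus the first window whose title matches PATTERN,
# then click at screen-absolute X Y. Returns 1 if no window matches.
focus_and_click() {
  local pattern=$1 x=$2 y=$3 win
  win=$(xdotool search --name "$pattern" | head -1)
  if [ -z "$win" ]; then
    echo "no window matching '$pattern'" >&2
    return 1
  fi
  xdotool windowactivate --sync "$win"
  sleep 0.3   # let focus settle before clicking (see Tips)
  xdotool mousemove "$x" "$y" click 1
}

# Example: focus_and_click "Google Chrome" 1830 56
```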
2. Screenshot → Verify → Click Loop
Use this when you know roughly where the element is and want a screenshot to verify it before clicking:
# Arguments: window name pattern, a label for your snap files (what you are
# looking for), then the x y coordinates to click. Comments can't follow a
# line-continuation backslash, so they go here.
bash ~/.openclaw/workspace/skills/xdotool-control/scripts/snap_verify_click.sh \
"Google Chrome" \
"extension_icon" \
1830 56
Or use the full loop script for unknown positions:
# Arguments: window name pattern, template image to match (ImageMagick
# compare), max attempts
bash ~/.openclaw/workspace/skills/xdotool-control/scripts/find_and_click.sh \
"Google Chrome" \
/tmp/target_icon.png \
10
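The contents of find_and_click.sh aren't shown here, so the following is only a sketch of how a template-matching step could work with ImageMagick's compare -subimage-search. It assumes the IM6-style report "RMSE (normalized) @ X,Y" on stderr, where X,Y is the top-left offset of the best match inside the screenshot. The helper names match_offset, match_center, and locate_template are hypothetical.

```bash
# Hypothetical helpers sketching template matching; not the bundled script.
# Assumes compare reports e.g. "3285.44 (0.0501326) @ 43,18".

# Extract the best-match offset ("43 18"); prints nothing if there is no "@ X,Y".
match_offset() {
  printf '%s\n' "$1" | sed -n 's/.*@ *\([0-9][0-9]*\),\([0-9][0-9]*\).*/\1 \2/p'
}

# Screen-absolute center of the matched template.
# Args: OFFSET_X OFFSET_Y TEMPLATE_W TEMPLATE_H REGION_X REGION_Y
match_center() {
  echo "$(( $5 + $1 + $3 / 2 )) $(( $6 + $2 + $4 / 2 ))"
}

# HAYSTACK is a screenshot taken at screen offset REGION_X,REGION_Y
# (0,0 for a full-desktop scrot). Prints "X Y" to click, or returns 1.
locate_template() {
  local haystack=$1 needle=$2 rx=${3:-0} ry=${4:-0} out off size
  # The metric goes to stderr; compare's exit status is ignored on purpose,
  # only the parsed offset decides whether a position was reported.
  out=$(compare -metric RMSE -subimage-search "$haystack" "$needle" null: 2>&1) || true
  off=$(match_offset "$out")
  [ -n "$off" ] || return 1
  size=$(identify -format '%w %h' "$needle") || return 1
  match_center "${off% *}" "${off#* }" "${size% *}" "${size#* }" "$rx" "$ry"
}
```

Usage, for a toolbar strip captured with scrot -a 1400,0,480,60: read -r CX CY <<< "$(locate_template /tmp/toolbar.png /tmp/target_icon.png 1400 0)". Because compare always reports some best position, a real loop would probably also check the normalized value in parentheses (0 means identical) against a threshold before clicking. Searching a small region is also much faster than a full desktop.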
3. Click Chrome Extension Icon
bash ~/.openclaw/workspace/skills/xdotool-control/scripts/click_extension.sh "OpenClaw"
# or
bash ~/.openclaw/workspace/skills/xdotool-control/scripts/click_extension.sh "Dawn"
This focuses Chrome and clicks the extensions puzzle-piece area, then scans for the named extension.
4. Tab Switching
# Switch to next tab
WIN=$(xdotool search --name "Google Chrome" | head -1)
xdotool windowactivate --sync "$WIN"
xdotool key ctrl+Tab
# Switch to specific tab (1-indexed)
xdotool key ctrl+2 # Tab 2
xdotool key ctrl+3 # Tab 3
# Open new tab
xdotool key ctrl+t
# Type a URL into address bar
xdotool key ctrl+l
sleep 0.2
xdotool type "https://example.com"
xdotool key Return
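The tab shortcuts have a limit: Chrome maps Ctrl+1 through Ctrl+8 to tabs 1-8, and Ctrl+9 always jumps to the last tab, so tab 9 and beyond can't be selected by number. The sketch below encodes that and bundles the address-bar steps. The names chrome_tab_key and open_url are hypothetical, and open_url assumes the window title contains "Google Chrome".

```bash
# Hypothetical helper: map a 1-indexed tab number to Chrome's shortcut.
# Ctrl+1..Ctrl+8 select tabs 1-8; Ctrl+9 selects the last tab.
chrome_tab_key() {
  case $1 in
    [1-8]) echo "ctrl+$1" ;;
    last)  echo "ctrl+9" ;;
    *)     echo "tab '$1' not reachable by shortcut" >&2; return 1 ;;
  esac
}

# Hypothetical helper: focus Chrome and load URL in the current tab.
open_url() {
  local win
  win=$(xdotool search --name "Google Chrome" | head -1)
  [ -n "$win" ] || return 1
  xdotool windowactivate --sync "$win"
  xdotool key ctrl+l          # focus the address bar
  sleep 0.2
  xdotool type --clearmodifiers "$1"
  xdotool key Return
}

# Example: xdotool key "$(chrome_tab_key 2)"; open_url "https://example.com"
```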
5. Type Into Window
WIN=$(xdotool search --name "Terminal" | head -1)
xdotool windowactivate --sync "$WIN"
sleep 0.2
xdotool type --clearmodifiers "command to type here"
xdotool key Return
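To send several lines in a row, the same focus-then-type steps can be repeated with a Return after each line. Here is a minimal sketch; type_lines is a hypothetical name, and in a terminal each typed line runs as a command, so check the text before sending it.

```bash
# Hypothetical helper: focus the first window matching PATTERN, then type
# each remaining argument followed by Return. Returns 1 if no window matches.
type_lines() {
  local pattern=$1 win line
  shift
  win=$(xdotool search --name "$pattern" | head -1)
  [ -n "$win" ] || return 1
  xdotool windowactivate --sync "$win"
  sleep 0.2
  for line in "$@"; do
    xdotool type --clearmodifiers "$line"
    xdotool key Return
  done
}

# Example: type_lines "Terminal" "cd /tmp" "ls -la"
```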
6. Approve tmux Prompt (for Clawdy daemon)
SESSION=$(tmux list-sessions -F '#{session_name}' 2>/dev/null | grep claude-session | head -1)
# Only send if a matching session was found
[ -n "$SESSION" ] && tmux send-keys -t "$SESSION" "Yes" Enter
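The one-liner sends "Yes" right away, whether or not a prompt is on screen. One more cautious variant, sketched below, polls the pane with tmux capture-pane -p and answers only once the expected prompt text is visible. The function name approve_when_prompted and the example prompt text are hypothetical placeholders.

```bash
# Hypothetical helper: send "Yes" only after PROMPT_REGEX appears in the
# session's visible pane. Args: SESSION PROMPT_REGEX [TIMEOUT_SECONDS]
approve_when_prompted() {
  local session=$1 pattern=$2 timeout=${3:-30} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if tmux capture-pane -p -t "$session" | grep -Eq "$pattern"; then
      tmux send-keys -t "$session" "Yes" Enter
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "prompt not seen in $session after ${timeout}s" >&2
  return 1
}

# Example (placeholder prompt text):
# approve_when_prompted claude-session 'Do you want to proceed' 60
```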
Window Management
# List all windows with names
xdotool search --name "" | while read -r wid; do
name=$(xdotool getwindowname "$wid" 2>/dev/null)
[ -n "$name" ] && echo "$wid $name"
done | head -20
# WIN_ID below is a window id, e.g. one printed by the list above
# Get window geometry (position + size)
xdotool getwindowgeometry "$WIN_ID"
# Move window to front
xdotool windowraise "$WIN_ID"
# Resize window
xdotool windowsize "$WIN_ID" 1280 800
# Move window
xdotool windowmove "$WIN_ID" 0 0
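Because mousemove takes screen-absolute coordinates, a point measured inside a window has to be offset by the window's position. A sketch using the KEY=VALUE output of xdotool getwindowgeometry --shell follows; window_to_screen is a hypothetical name. Depending on the window manager, the reported X/Y may or may not include the title-bar decoration, so a screenshot check is still worthwhile.

```bash
# Hypothetical helper: convert coordinates relative to a window's top-left
# corner into screen-absolute coordinates for `xdotool mousemove`.
window_to_screen() {
  local win=$1 rx=$2 ry=$3 geom wx wy
  geom=$(xdotool getwindowgeometry --shell "$win") || return 1
  wx=$(printf '%s\n' "$geom" | sed -n 's/^X=//p')
  wy=$(printf '%s\n' "$geom" | sed -n 's/^Y=//p')
  [ -n "$wx" ] && [ -n "$wy" ] || return 1
  echo "$((wx + rx)) $((wy + ry))"
}

# Example: read -r SX SY <<< "$(window_to_screen "$WIN_ID" 200 40)"
#          xdotool mousemove "$SX" "$SY" click 1
```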
Screenshot Utilities
# Full desktop screenshot
scrot /tmp/desktop.png
# Specific window
scrot -u /tmp/active_window.png # Currently active window
# Capture a region (x,y,width,height)
scrot -a 1400,0,480,60 /tmp/toolbar.png
# With delay
scrot -d 2 /tmp/delayed.png
Read screenshots with Claude's Read tool — it renders images inline.
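Fixed names like /tmp/snap.png get overwritten between steps, and a world-readable /tmp can expose sensitive UI state (a concern raised in the security assessment). A sketch that writes uniquely named captures into a private directory is shown below. mktemp -d creates the directory with mode 0700; SNAP_DIR, snap, and snap_cleanup are hypothetical names.

```bash
# Hypothetical setup: a private capture directory (mktemp -d uses mode 0700).
: "${SNAP_DIR:=$(mktemp -d)}"

# Hypothetical helper: snap LABEL [scrot options...]
# Captures to a unique file inside SNAP_DIR and prints its path.
snap() {
  local label=$1 out
  shift
  out="$SNAP_DIR/${label}_$(date +%s%N).png"
  scrot "$@" "$out" && printf '%s\n' "$out"
}

# Remove the captures and the directory when finished.
snap_cleanup() { rm -f -- "$SNAP_DIR"/*.png && rmdir -- "$SNAP_DIR"; }

# Example: SHOT=$(snap toolbar -a 1400,0,480,60)
```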
Chrome-Specific Patterns
# Chrome toolbar extension icons are typically at:
# y ≈ 56 (vertical center of toolbar)
# x varies by number of pinned extensions, roughly:
# Last icon: screen_width - 30
# Second-to-last: screen_width - 60
# Puzzle piece: screen_width - 90 (unpinned extensions menu)
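The rough positions above follow a 30 px step from the right edge of the screen. A small sketch encoding that heuristic follows; toolbar_icon_x and TOOLBAR_SLOT_PX are hypothetical names. It assumes a maximized Chrome window, and the result should still be confirmed with a toolbar screenshot.

```bash
# Hypothetical helper encoding the rough heuristic above: slots counted from
# the right edge, 30 px apart. SLOT 1 = last icon, 2 = second-to-last,
# 3 = puzzle piece. Assumes Chrome is maximized.
TOOLBAR_SLOT_PX=30

toolbar_icon_x() {
  local screen_w=$1 slot=$2
  [ "$slot" -ge 1 ] 2>/dev/null || return 1
  echo $(( screen_w - slot * TOOLBAR_SLOT_PX ))
}

# Example: xdotool mousemove "$(toolbar_icon_x 1920 3)" 56 click 1
```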
Clicking a Pinned Extension Icon
# Always auto-detect screen width — never hardcode
read SCREEN_W SCREEN_H <<< "$(xdotool getdisplaygeometry)"
TOOLBAR_Y=56  # toolbar center; assumes Chrome is maximized at the top of the screen
# Take a toolbar snapshot first to verify positions
scrot -a "$((SCREEN_W-300)),0,300,70" /tmp/toolbar_snap.png
# Read /tmp/toolbar_snap.png to see icon positions visually
# Then click
xdotool mousemove $((SCREEN_W - 60)) $TOOLBAR_Y click 1
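When the 300 px toolbar strip is ambiguous, a tighter check is to capture a small square centered on the exact point you are about to click. The sketch below builds the x,y,w,h string for scrot -a; region_around is a hypothetical name. It clamps only the left and top edges, not the right or bottom.

```bash
# Hypothetical helper: "x,y,w,h" for scrot -a, a square of side 2*HALF
# centered on (X, Y). Left/top are clamped to 0; right/bottom are not.
region_around() {
  local x=$1 y=$2 half=${3:-20} left top
  left=$(( x - half )); [ "$left" -lt 0 ] && left=0
  top=$(( y - half ));  [ "$top" -lt 0 ] && top=0
  echo "$left,$top,$(( half * 2 )),$(( half * 2 ))"
}

# Example: scrot -a "$(region_around $((SCREEN_W - 60)) "$TOOLBAR_Y")" /tmp/target_check.png
```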
Dependencies
# Required
sudo apt-get install xdotool scrot
# Optional — enables template matching in find_and_click.sh
sudo apt-get install imagemagick
# Check all deps at once:
for dep in xdotool scrot convert compare; do
command -v "$dep" &>/dev/null && echo "✓ $dep" || echo "✗ $dep (missing)"
done
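The loop above only prints a status. For scripts, a variant that returns non-zero when anything is missing lets you stop before the first xdotool call fails. A minimal sketch follows; require_deps is a hypothetical name.

```bash
# Hypothetical helper: print each missing command to stderr and return
# non-zero if any are missing.
require_deps() {
  local dep missing=0
  for dep in "$@"; do
    if ! command -v "$dep" >/dev/null 2>&1; then
      echo "missing: $dep" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example: require_deps xdotool scrot || exit 1
#          require_deps compare || echo "template matching disabled" >&2
```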
Tips & Gotchas
- Always run windowactivate --sync before clicking: without --sync, the click may fire before focus lands.
- Add sleep 0.3 after a focus change before interacting with Chrome.
- Coordinates are screen-absolute, not window-relative; factor in the window position from getwindowgeometry.
- xdotool type vs xdotool key: use type for text strings and key for special keys (ctrl+t, Return, Escape).
- --clearmodifiers on type prevents Shift/Ctrl state from leaking into typed text.
- scrot -u captures only the currently active window, so activate the right window first.
- ImageMagick compare can do pixel-level template matching for verify loops (see find_and_click.sh).
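Fixed sleeps work once a window exists, but a freshly launched app may take a variable time to appear. xdotool search --sync blocks until a match exists but has no timeout, so a polling loop with a deadline is one alternative. Here is a sketch; wait_for_window is a hypothetical name, and the 0.2 s interval is an arbitrary choice.

```bash
# Hypothetical helper: poll until a window title matches PATTERN and print
# its id. Args: PATTERN [TIMEOUT_SECONDS]. Returns 1 on timeout.
wait_for_window() {
  local pattern=$1 timeout=${2:-10} tries win
  tries=$(( timeout * 5 ))   # 5 polls per second, 0.2 s apart
  while [ "$tries" -gt 0 ]; do
    win=$(xdotool search --name "$pattern" 2>/dev/null | head -1)
    if [ -n "$win" ]; then
      echo "$win"
      return 0
    fi
    sleep 0.2
    tries=$(( tries - 1 ))
  done
  return 1
}

# Example: WIN=$(wait_for_window "Google Chrome" 15) && xdotool windowactivate --sync "$WIN"
```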
