Install
openclaw skills install @vibes-me/gui-control
Control the GUI desktop on this machine using xdotool, scrot, and Firefox. Use when the user asks to open a browser, visit a website, take a screenshot, click/type on screen, or interact with any GUI application. This machine has a display; it is NOT headless.
Control the Linux desktop with a GUI display using shell tools.
Display: DISPLAY=:1 (ALWAYS prefix all GUI commands with this)
Tools: xdotool (keyboard/mouse), scrot (screenshots), firefox
DISPLAY=:1 nohup firefox https://example.com > /dev/null 2>&1 &
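Because every GUI command needs the same prefix, a small wrapper can make the rule harder to forget. The following is only a sketch: the function name `gui` is a hypothetical helper, not part of this skill.

```shell
# Hypothetical helper: run any command with DISPLAY pinned to :1.
gui() {
  DISPLAY=:1 "$@"
}

# Example: the child process sees DISPLAY=:1 regardless of the caller's value.
gui sh -c 'echo "$DISPLAY"'
```

With this in place, `gui scrot /tmp/screenshot.png` would behave like `DISPLAY=:1 scrot /tmp/screenshot.png`.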
Wait for page load before interacting:
sleep 5
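A fixed sleep can be too short on a slow start and too long on a fast one. As a possible refinement, the sketch below polls a command until it succeeds or a timeout expires. The name `wait_for` is a hypothetical helper, and the usage line assumes the Firefox window class matches `firefox`. A visible window does not prove the page has finished loading, so a short sleep afterwards may still be needed.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached.
wait_for() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage (needs a running display, so it is left commented out here):
# wait_for 15 env DISPLAY=:1 xdotool search --onlyvisible --class firefox && sleep 2
```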
DISPLAY=:1 scrot /tmp/screenshot.png
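Before sending a screenshot anywhere, it may be worth confirming that the capture actually produced a file. A minimal sketch, assuming a hypothetical helper named `shot_ok`; the `rm -f` in the usage line is a precaution, since some scrot versions may not overwrite an existing file by default.

```shell
# Hypothetical helper: succeed only if the screenshot file exists and is non-empty.
shot_ok() {
  [ -s "$1" ]
}

# Usage (needs a running display, so it is left commented out here):
# rm -f /tmp/screenshot.png
# DISPLAY=:1 scrot /tmp/screenshot.png
# shot_ok /tmp/screenshot.png || echo "screenshot failed" >&2
```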
DISPLAY=:1 xdotool type --delay 50 "Hello world"
DISPLAY=:1 xdotool key Return
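Typing text and pressing Return usually go together, so the two commands above could be combined. The sketch below is one way to do that; `type_and_submit` and the `XDO` override are hypothetical names introduced here so the sequence can be dry-run without sending real key events.

```shell
# XDO is a hypothetical override hook: set XDO=echo to print the xdotool
# arguments instead of sending real key events.
XDO=${XDO:-xdotool}

# Type the given text into the focused field, then press Return.
type_and_submit() {
  DISPLAY=:1 "$XDO" type --delay 50 "$1" &&
    DISPLAY=:1 "$XDO" key Return
}

# Dry run: prints the two xdotool argument lists without touching the GUI.
XDO=echo type_and_submit "Hello world"
```

Calling `type_and_submit "Hello world"` without the override would run the real xdotool commands against display :1.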
DISPLAY=:1 xdotool getactivewindow getwindowname
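The window name can also serve as a cheap check that the expected application has focus before typing into it. A minimal sketch, assuming a hypothetical helper `title_contains` and assuming the Firefox window title contains "Mozilla Firefox":

```shell
# Hypothetical helper: succeed if the window title ($1) contains the text ($2).
title_contains() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs a running display, so it is left commented out here):
# title=$(DISPLAY=:1 xdotool getactivewindow getwindowname)
# title_contains "$title" "Mozilla Firefox" || echo "Firefox is not focused" >&2
```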
DISPLAY=:1 pkill firefox
DISPLAY=:1 nohup firefox <url> > /dev/null 2>&1 &
sleep 5
DISPLAY=:1 scrot /tmp/step.png

xdotool key Tab: move focus
xdotool key Return: submit/confirm
xdotool type --delay 50 "text": type into focused field

Use the message tool and its media parameter to send screenshots.
Prefer keyboard navigation over xdotool mousemove + click.
Press / to focus the search bar.
Use nohup ... & for launching Firefox so it doesn't block the shell.
message(content="...", media=["/tmp/screenshot.png"])
xdotool + keyboard shortcuts work great. Don't jump to Selenium/Marionette unless absolutely needed.
ps aux | grep firefox: check whether Firefox is running
/ focuses search bar, Tab navigates between elements, Return selects
Ctrl+F opens find bar, Ctrl+L focuses address bar, Tab cycles focus

Avoid xdotool mousemove with hardcoded coordinates: they break on different resolutions and you might click the wrong element (e.g., address bar instead of YouTube search)
xdotool mousemove 640 120 will land on different things on different screens
Always set the display (DISPLAY=:1)
Do not use DISPLAY=:0; the correct display is :1

DISPLAY=:1 scrot /tmp/screen.png
read_file("/tmp/screen.png"): this lets YOU see the screen
message(content="...", media=["/tmp/screen.png"])

When starting nanobot gateway, always start with DISPLAY=:1 so Telegram/Discord agents can use GUI
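Keyboard navigation remains the more robust choice, but if a mouse move is ever unavoidable, one way to soften the resolution problem described above is to express the target as a percentage of the screen rather than as fixed pixels. The sketch below assumes a hypothetical helper `scale_point`. It does not guarantee the same element sits at the same relative position on every screen.

```shell
# Hypothetical helper: convert percentages of the screen into pixel coordinates.
# Arguments: x_percent y_percent screen_width screen_height
scale_point() {
  echo "$(( $3 * $1 / 100 )) $(( $4 * $2 / 100 ))"
}

scale_point 50 10 1280 720   # prints "640 72"

# Usage (needs a running display, so it is left commented out here):
# set -- $(DISPLAY=:1 xdotool getdisplaygeometry)   # width and height in pixels
# DISPLAY=:1 xdotool mousemove $(scale_point 50 10 "$1" "$2") click 1
```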