Install
openclaw skills install gui-automation
This skill allows OpenClaw to control the desktop using the CUA computer server API, running on port 8000.
This skill requires installing and running a third-party server (cua-computer-sdk) that has full control over your desktop.
Before using this skill:
Run the server only when needed, in a terminal you can monitor:
# Install the Computer SDK (official CUA package)
pip install cua-computer-sdk
# Verify package (optional but recommended)
pip show cua-computer-sdk # Check publisher and version
# Run temporarily (Ctrl+C to stop)
cua-server start --port 8000 --bind 127.0.0.1
# In another terminal, verify it's running locally only
curl http://localhost:8000/status
netstat -an | grep 8000 # Should show 127.0.0.1:8000
This is the safest approach - the server only runs when you explicitly start it and stops when you close the terminal.
For transparency, you can review and run from source:
# Clone and review the code first
git clone https://github.com/trycua/cua-computer-server
cd cua-computer-server
# Review the code before running
ls -la
cat requirements.txt # Check dependencies
# Install and run
pip install -r requirements.txt
python -m cua_server --port 8000 --bind 127.0.0.1
Option 1: Manual Start (Recommended)
# Start in foreground - you can see what it's doing
cua-server start --port 8000
# Stop with Ctrl+C when done
Option 2: Background Process (Temporary)
# Run in background for current session only
cua-server start --port 8000 &
# Save the process ID so the server can be stopped later
SERVER_PID=$!
echo "Server PID: $SERVER_PID"
# Stop when done
kill "$SERVER_PID"
Note: This skill does NOT require persistent/system service installation. Running the server temporarily when needed is the recommended approach.
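The temporary-server approach can also be scripted so the server never outlives the task. The following is a minimal sketch, not part of the skill itself: `with_server`, `SERVER_CMD`, and `STARTUP_DELAY` are hypothetical names, and the default start command is the one shown above.

```shell
#!/bin/sh
# Sketch: keep the server running only while one command executes.
# SERVER_CMD and STARTUP_DELAY are hypothetical knobs, not cua-server options.
SERVER_CMD=${SERVER_CMD:-"cua-server start --port 8000"}
STARTUP_DELAY=${STARTUP_DELAY:-2}

with_server() {
  # Start the server in the background and record its PID.
  $SERVER_CMD &
  server_pid=$!
  # Give the server a moment to start listening.
  sleep "$STARTUP_DELAY"

  # Run the caller's command and remember its exit status.
  status=0
  "$@" || status=$?

  # Stop the server and reap it so nothing is left running.
  kill "$server_pid" 2>/dev/null || true
  wait "$server_pid" 2>/dev/null || true
  return "$status"
}

# Usage (needs cua-server installed):
#   with_server curl http://localhost:8000/status
```

The function returns the exit status of the wrapped command, so it can be used in scripts that check for failure.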
This skill:
Do not use --bind 0.0.0.0 unless absolutely necessary.
After starting the server, verify it works:
# Simple health check
curl http://localhost:8000/status
# Should return: {"status": "ok"}
# Take a screenshot (safe test)
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "screenshot"}' \
-o screenshot.json
# If successful, you'll get a JSON response with base64 image data
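When the server is started from a script, a single health check may run before the server is listening. One way to handle this is to poll the `/status` endpoint until it answers. This is a sketch under assumptions: `wait_for_server` is a hypothetical helper, and it treats any successful HTTP response from `/status` as "ready" without inspecting the body.

```shell
#!/bin/sh
# Sketch: poll the /status endpoint until it responds or the attempts run out.
# wait_for_server is a hypothetical helper, not part of the CUA tooling.
wait_for_server() {
  url=${1:-http://localhost:8000}
  tries=${2:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS --max-time 2 "$url/status" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_server http://localhost:8000 15 && echo "server is up"
```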
Port Already in Use:
# Check what's using port 8000
lsof -i :8000 # macOS/Linux
netstat -ano | findstr :8000 # Windows
# Solution: Use a different port
cua-server start --port 8001
Permission Denied (Linux):
# You may need to add your user to the input group for keyboard/mouse control
sudo usermod -a -G input $USER
# Log out and back in for changes to take effect
Display Not Found (Linux):
# Check your display variable
echo $DISPLAY
# Set it explicitly
DISPLAY=:0 cua-server start --port 8000
Server Not Responding:
# Check if the process is running
ps aux | grep cua-server # Linux/macOS
tasklist | findstr cua-server # Windows
# Try running in foreground to see errors
cua-server start --port 8000 --debug
Capture the current screen:
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "screenshot"}' \
| jq -r '.result.base64' \
| base64 -d > screenshot.png
Click at specific x,y coordinates:
# Click at center of 1280x720 screen
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "left_click", "params": {"x": 640, "y": 360}}'
# Right-click at the same position
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "right_click", "params": {"x": 640, "y": 360}}'
# Double-click at the same position
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "double_click", "params": {"x": 640, "y": 360}}'
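The coordinates 640,360 are only the center for a 1280x720 screen. A small sketch of computing the center for any resolution follows. `center_click_payload` is a hypothetical helper; it takes the width and height as arguments because the response format of `get_screen_size` is not documented here.

```shell
#!/bin/sh
# Sketch: build a left_click payload aimed at the center of the screen.
# center_click_payload is a hypothetical helper, not part of the CUA API.
center_click_payload() {
  width=$1
  height=$2
  # Integer division; odd sizes round down.
  x=$((width / 2))
  y=$((height / 2))
  printf '{"command": "left_click", "params": {"x": %d, "y": %d}}\n' "$x" "$y"
}

center_click_payload 1280 720
# prints {"command": "left_click", "params": {"x": 640, "y": 360}}
```

The printed JSON can be passed to curl with `-d` in place of the hard-coded payload.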
Type text at the current cursor position:
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "type_text", "params": {"text": "Hello, World!"}}'
Press a key combination:
# Ctrl+C
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "hotkey", "params": {"keys": ["ctrl", "c"]}}'
# Ctrl+Alt+T (open terminal)
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}'
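Writing the `keys` array by hand is easy to get wrong for longer combinations. As an illustration, the sketch below builds the same hotkey payload from a list of key names. `hotkey_payload` is a hypothetical helper, and it assumes key names contain no quotes or backslashes.

```shell
#!/bin/sh
# Sketch: build a hotkey payload from key names given as arguments.
# hotkey_payload is a hypothetical helper, not part of the CUA API.
hotkey_payload() {
  keys=""
  for key in "$@"; do
    # Separate entries with ", " after the first one.
    if [ -n "$keys" ]; then
      keys="$keys, "
    fi
    keys="$keys\"$key\""
  done
  printf '{"command": "hotkey", "params": {"keys": [%s]}}\n' "$keys"
}

hotkey_payload ctrl alt t
# prints {"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}
```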
Press a single key:
# Press Enter
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "press_key", "params": {"key": "enter"}}'
# Press Escape
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "press_key", "params": {"key": "escape"}}'
Move cursor to specific position:
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "move_cursor", "params": {"x": 100, "y": 200}}'
Scroll up or down:
# Scroll down 3 units
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "scroll_direction", "params": {"direction": "down", "amount": 3}}'
# Scroll up 5 units
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "scroll_direction", "params": {"direction": "up", "amount": 5}}'
Launch an application by name:
# Launch Firefox
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "launch", "params": {"app": "firefox"}}'
# Launch Terminal
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "launch", "params": {"app": "xfce4-terminal"}}'
Open a file or URL with default application:
# Open URL
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "open", "params": {"path": "https://example.com"}}'
# Open file
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "open", "params": {"path": "/home/cua/document.txt"}}'
Get current window ID:
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "get_current_window_id"}'
Maximize window:
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "maximize_window", "params": {"window_id": "0x1234567"}}'
Minimize window:
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "minimize_window", "params": {"window_id": "0x1234567"}}'
Open Firefox and navigate to a website:
# Take initial screenshot
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "screenshot"}' -o initial.json
# Launch Firefox
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "launch", "params": {"app": "firefox"}}'
sleep 3
# Focus address bar (Ctrl+L)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "l"]}}'
sleep 1
# Type URL
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "https://example.com"}}'
# Press Enter
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'
sleep 5
# Take final screenshot
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "screenshot"}' -o final.json
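A workflow like the one above is a list of payloads with a pause after each. One possible way to express that as data is sketched below. `run_steps` and `CUA_DRY_RUN` are hypothetical names: with `CUA_DRY_RUN=1` the payloads are printed instead of sent, which lets the sequence be reviewed before it touches the desktop.

```shell
#!/bin/sh
# Sketch: run "delay|payload" steps read from stdin, one per line.
# run_steps and CUA_DRY_RUN are hypothetical, not part of the skill.
CUA_SERVER_URL=${CUA_SERVER_URL:-http://localhost:8000}

run_steps() {
  while IFS='|' read -r delay payload; do
    if [ "${CUA_DRY_RUN:-0}" = "1" ]; then
      # Dry run: show the payload, send nothing, skip the pause.
      printf '%s\n' "$payload"
    else
      # Send the payload, then wait so the UI can catch up.
      curl -fsS -X POST "$CUA_SERVER_URL/cmd" \
        -H "Content-Type: application/json" \
        -d "$payload" </dev/null || return 1
      sleep "$delay"
    fi
  done
}

# Dry-run the Firefox workflow: prints the payloads in order.
CUA_DRY_RUN=1 run_steps <<'EOF'
3|{"command": "launch", "params": {"app": "firefox"}}
1|{"command": "hotkey", "params": {"keys": ["ctrl", "l"]}}
0|{"command": "type_text", "params": {"text": "https://example.com"}}
5|{"command": "press_key", "params": {"key": "enter"}}
0|{"command": "screenshot"}
EOF
```

Stopping at the first failed request avoids typing into the wrong window when an earlier step did not succeed.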
Open text editor and type content:
# Open terminal
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "alt", "t"]}}'
sleep 2
# Type command to open text editor
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "mousepad"}}'
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'
sleep 2
# Type some text
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "Hello from OpenClaw!\nThis is automated desktop control."}}'
# Save file (Ctrl+S)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "hotkey", "params": {"keys": ["ctrl", "s"]}}'
sleep 1
# Type filename
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "openclaw-demo.txt"}}'
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "enter"}}'
Fill out a web form:
# Assuming browser is open with form visible
# Click on first input field (adjust coordinates)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "left_click", "params": {"x": 400, "y": 300}}'
# Type name
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "John Doe"}}'
# Tab to next field
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "tab"}}'
# Type email
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "john@example.com"}}'
# Tab to next field
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "press_key", "params": {"key": "tab"}}'
# Type message
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "type_text", "params": {"text": "This form was filled automatically by OpenClaw!"}}'
# Submit form (click submit button)
curl -X POST http://localhost:8000/cmd -H "Content-Type: application/json" -d '{"command": "left_click", "params": {"x": 400, "y": 500}}'
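The type-then-tab pattern above repeats for every field. The sketch below generates that payload sequence from a list of values. `form_payloads` is a hypothetical helper, and it assumes the values contain no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/bin/sh
# Sketch: emit type_text payloads for each value, with a tab key press between fields.
# form_payloads is a hypothetical helper; values must not contain " or \ characters.
form_payloads() {
  first=1
  for value in "$@"; do
    # Move to the next field before every value except the first.
    if [ "$first" -eq 0 ]; then
      printf '%s\n' '{"command": "press_key", "params": {"key": "tab"}}'
    fi
    first=0
    printf '{"command": "type_text", "params": {"text": "%s"}}\n' "$value"
  done
}

form_payloads "John Doe" "john@example.com"
# prints three lines: type_text for the name, press_key tab, type_text for the email
```

Each output line can be sent to the `/cmd` endpoint with curl, after a click has focused the first field.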
# Check server status
curl http://localhost:8000/status
# List available commands
curl http://localhost:8000/commands | jq
# Get the screen size
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "get_screen_size"}'
# Get the cursor position
curl -X POST http://localhost:8000/cmd \
-H "Content-Type: application/json" \
-d '{"command": "get_cursor_position"}'
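Every request in this document repeats the same curl options. A thin wrapper can reduce that repetition. This is a sketch, not something the skill ships: `cua_payload` and `cua_cmd` are hypothetical function names, and the base URL is read from `CUA_SERVER_URL` with http://localhost:8000 as the fallback.

```shell
#!/bin/sh
# Sketch: wrap the repeated curl call in two small functions.
# cua_payload and cua_cmd are hypothetical names, not part of the CUA tooling.
CUA_SERVER_URL=${CUA_SERVER_URL:-http://localhost:8000}

# cua_payload COMMAND [PARAMS_JSON]: print the request body for /cmd.
cua_payload() {
  if [ -n "${2:-}" ]; then
    printf '{"command": "%s", "params": %s}\n' "$1" "$2"
  else
    printf '{"command": "%s"}\n' "$1"
  fi
}

# cua_cmd COMMAND [PARAMS_JSON]: POST the body to the /cmd endpoint.
cua_cmd() {
  curl -fsS -X POST "$CUA_SERVER_URL/cmd" \
    -H "Content-Type: application/json" \
    -d "$(cua_payload "$@")"
}

# Usage (needs a running server):
#   cua_cmd get_screen_size
#   cua_cmd move_cursor '{"x": 100, "y": 200}'
```

Keeping the payload builder separate from the network call makes it possible to inspect the JSON without sending anything.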
CUA_SERVER_URL: Base URL for CUA server (default: http://localhost:8000)
Use sleep between commands to allow the UI to update.
Once this skill is loaded, you can use it in OpenClaw conversations:
User: "Take a screenshot and open Firefox"
OpenClaw: *executes the screenshot and launch firefox commands*
User: "Type 'Hello World' in the current window"
OpenClaw: *executes the type_text command*
User: "Click at the center of the screen"
OpenClaw: *executes the left_click command at 640,360 (the center of a 1280x720 screen)*
curl http://localhost:8000/status