Desktop Control

Advanced desktop automation with mouse, keyboard, and screen control

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 232 · 33.9k · 296 current installs · 314 all-time installs

by @matagul

Security Scan

VirusTotal: Suspicious

OpenClaw: Benign (high confidence)

✓ Purpose & Capability

Name/description, SKILL.md, and the included Python code all describe and implement desktop automation (pyautogui-based mouse/keyboard control, screenshots, window management, clipboard). There are no unrelated environment variables, binaries, or install steps requested that would be inappropriate for this purpose.

✓ Instruction Scope

Runtime instructions and code operate within the expected scope: moving/clicking the mouse, typing keys, taking screenshots, finding images on screen, and reading/writing the clipboard. These actions are sensitive (can capture screen contents and clipboard) but are directly relevant to the stated functionality; I saw no instructions to read unrelated system files, environment variables, or to send data to external endpoints.

✓ Install Mechanism

No automatic install spec is included. SKILL.md instructs the user to pip install reasonable dependencies (pyautogui, pillow, opencv-python, pygetwindow, pyperclip) — conventional for this functionality. There are no downloads from untrusted URLs or extract/install steps in the skill metadata.

✓ Credentials

The skill does not request any environment variables, keys, or credentials. The operations (desktop control, screenshots, clipboard) do not require cloud credentials and none are declared, which is proportionate to its purpose.

ℹ Persistence & Privilege

always is false and there are no install hooks that persist automatically. However, the package includes an autonomous AIDesktopAgent class and the platform default allows model invocation (disable-model-invocation=false). That means an agent could invoke this skill autonomously to control the desktop — a normal platform capability but one that increases risk because the skill can take screenshots and control input.

Assessment

This skill appears to be what it claims: powerful desktop automation using pyautogui. Before installing or running it, consider the following:

- Understand the power: it can move your mouse, type, press hotkeys, capture screenshots, and read/modify the clipboard, all of which can expose sensitive data or cause actions on your machine. This is expected behavior for a desktop-automation skill, not a hidden backdoor.
- Prefer running in a safe environment: test in a disposable VM, non-production account, or on a system without sensitive documents open. Close important apps before running demos.
- Use safety options: enable failsafe (move mouse to corner to abort) and set require_approval=True if you want manual confirmation for each action. Review demos before running them.
- Beware of autonomous invocation: if you allow the agent to invoke skills autonomously, it could run sequences without your interactive confirmation. If you do not trust the skill/user code, disable autonomous invocation or only allow manual/user-invoked runs.
- Review the code: if you plan to run this long-term, inspect the full ai_agent.py and any truncated parts for network calls or code that might upload screenshots/clipboard data. The provided fragments show no network exfiltration, but the files were truncated in places; verify the remaining code sections yourself.
- Install dependencies from official sources and avoid running unknown binaries. If you need to grant broader privileges (e.g., run as admin), reconsider usage.

If you want, I can scan the remaining truncated portions (full ai_agent.py and __init__.py) for any network calls, hidden endpoints, or suspicious behaviors to raise confidence further.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

latest vk977ba9ex3zwfbe3pejrdvtd6180kxft

License terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Desktop Control Skill

A desktop automation skill for OpenClaw. It provides mouse control at exact pixel coordinates, keyboard input at configurable speed, screen capture, window management, and clipboard operations.

🎯 Features

Mouse Control

✅ Absolute positioning - Move to exact coordinates
✅ Relative movement - Move from current position
✅ Smooth movement - Natural, human-like mouse paths
✅ Click types - Left, right, middle, double, triple clicks
✅ Drag & drop - Drag from point A to point B
✅ Scroll - Vertical and horizontal scrolling
✅ Position tracking - Get current mouse coordinates

Keyboard Control

✅ Text typing - Fast, accurate text input
✅ Hotkeys - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.)
✅ Special keys - Enter, Tab, Escape, Arrow keys, F-keys
✅ Key combinations - Multi-key press combinations
✅ Hold & release - Manual key state control
✅ Typing speed - Configurable WPM (instant to human-like)

Screen Operations

✅ Screenshot - Capture entire screen or regions
✅ Image recognition - Find elements on screen (via OpenCV)
✅ Color detection - Get pixel colors at coordinates
✅ Multi-monitor - Support for multiple displays

Window Management

✅ Window list - Get all open windows
✅ Activate window - Bring window to front
✅ Window info - Get position, size, title
✅ Minimize/Maximize - Control window states

Safety Features

✅ Failsafe - Move mouse to corner to abort
✅ Pause control - Emergency stop mechanism
✅ Approval mode - Require confirmation for actions
✅ Bounds checking - Prevent out-of-screen operations
✅ Logging - Track all automation actions

🚀 Quick Start

Installation

First, install required dependencies:

pip install pyautogui pillow opencv-python pygetwindow pyperclip

Basic Usage

from skills.desktop_control import DesktopController

# Initialize controller
dc = DesktopController(failsafe=True)

# Mouse operations
dc.move_mouse(500, 300)  # Move to coordinates
dc.click()  # Left click at current position
dc.click(100, 200, button="right")  # Right click at position

# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c")  # Copy
dc.press("enter")

# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()

📋 Complete API Reference

Mouse Functions

`move_mouse(x, y, duration=0, smooth=True)`

Move mouse to absolute screen coordinates.

Parameters:

x (int): X coordinate (pixels from left)
y (int): Y coordinate (pixels from top)
duration (float): Movement time in seconds (0 = instant, 0.5 = smooth)
smooth (bool): Use bezier curve for natural movement

Example:

# Instant movement
dc.move_mouse(1000, 500)

# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)
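Because `move_mouse` takes absolute coordinates, a caller may want to keep a target on screen before moving. The sketch below shows one way the bounds checking listed under Safety Features could work. It is an assumption, not the skill's actual code: `clamp_to_screen` is a hypothetical helper, the width and height would normally come from `get_screen_size()`, and it only considers a single primary display.

```python
from typing import Tuple


def clamp_to_screen(x: int, y: int, width: int, height: int) -> Tuple[int, int]:
    """Clamp (x, y) into the valid pixel range of a width x height screen."""
    # Valid pixel indices run from 0 to width-1 and from 0 to height-1.
    clamped_x = max(0, min(x, width - 1))
    clamped_y = max(0, min(y, height - 1))
    return clamped_x, clamped_y


# Hypothetical 1920x1080 screen; a real call would use dc.get_screen_size().
print(clamp_to_screen(2500, -40, 1920, 1080))  # (1919, 0)
```

On multi-monitor setups, where secondary displays can have negative coordinates, clamping to the primary display like this would be too strict.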

`move_relative(x_offset, y_offset, duration=0)`

Move mouse relative to current position.

Parameters:

x_offset (int): Pixels to move horizontally (positive = right)
y_offset (int): Pixels to move vertically (positive = down)
duration (float): Movement time in seconds

Example:

# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)

`click(x=None, y=None, button='left', clicks=1, interval=0.1)`

Perform mouse click.

Parameters:

x, y (int, optional): Coordinates to click (None = current position)
button (str): 'left', 'right', 'middle'
clicks (int): Number of clicks (1 = single, 2 = double)
interval (float): Delay between multiple clicks

Example:

# Simple left click
dc.click()

# Double-click at specific position
dc.click(500, 300, clicks=2)

# Right-click
dc.click(button='right')

`drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')`

Drag and drop operation.

Parameters:

start_x, start_y (int): Starting coordinates
end_x, end_y (int): Ending coordinates
duration (float): Drag duration
button (str): Mouse button to use

Example:

# Drag file from desktop to folder
dc.drag(100, 100, 500, 500, duration=1.0)
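A drag that lasts `duration` seconds implies a series of intermediate pointer positions between the start and end points. The sketch below computes such positions with plain linear interpolation. This is a simplification chosen for illustration: the skill describes bezier curves for smooth movement, and `linear_path` is a hypothetical helper, not part of the skill.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def linear_path(start: Point, end: Point, steps: int) -> List[Point]:
    """Return `steps` evenly spaced points after `start`, ending exactly at `end`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the path covered; reaches 1.0 on the last step
        points.append((round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)))
    return points


print(linear_path((100, 100), (500, 500), 4))
# [(200, 200), (300, 300), (400, 400), (500, 500)]
```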

`scroll(clicks, direction='vertical', x=None, y=None)`

Scroll mouse wheel.

Parameters:

clicks (int): Scroll amount (positive = up/left, negative = down/right)
direction (str): 'vertical' or 'horizontal'
x, y (int, optional): Position to scroll at

Example:

# Scroll down 5 clicks
dc.scroll(-5)

# Scroll up 10 clicks
dc.scroll(10)

# Horizontal scroll
dc.scroll(5, direction='horizontal')

`get_mouse_position()`

Get current mouse coordinates.

Returns: (x, y) tuple

Example:

x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")

Keyboard Functions

`type_text(text, interval=0, wpm=None)`

Type text with configurable speed.

Parameters:

text (str): Text to type
interval (float): Delay between keystrokes (0 = instant)
wpm (int, optional): Words per minute (overrides interval)

Example:

# Instant typing
dc.type_text("Hello World")

# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)

# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)
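To relate `wpm` to `interval`, a words-per-minute rate can be converted into a per-keystroke delay. The sketch below assumes the common typing-test convention of five characters per word; how the skill itself converts `wpm` is not documented here, and `wpm_to_interval` is a hypothetical helper.

```python
def wpm_to_interval(wpm: float, chars_per_word: int = 5) -> float:
    """Convert a words-per-minute rate into seconds per keystroke."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    chars_per_minute = wpm * chars_per_word
    return 60.0 / chars_per_minute


# Under the five-characters-per-word assumption, 60 WPM is 300 keystrokes
# per minute, i.e. one keystroke every 0.2 seconds.
print(wpm_to_interval(60))
```

Under this assumption, `wpm=60` corresponds to roughly `interval=0.2`.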

`press(key, presses=1, interval=0.1)`

Press and release a key.

Parameters:

key (str): Key name (see Key Names section)
presses (int): Number of times to press
interval (float): Delay between presses

Example:

# Press Enter
dc.press('enter')

# Press Space 3 times
dc.press('space', presses=3)

# Press Down arrow
dc.press('down')

`hotkey(*keys, interval=0.05)`

Execute keyboard shortcut.

Parameters:

*keys (str): Keys to press together
interval (float): Delay between key presses

Example:

# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')

# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')

# Open Run dialog (Win+R)
dc.hotkey('win', 'r')

# Save (Ctrl+S)
dc.hotkey('ctrl', 's')

# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')

`key_down(key)` / `key_up(key)`

Manually control key state.

Example:

# Hold Shift
dc.key_down('shift')
dc.type_text("hello")  # Types "HELLO"
dc.key_up('shift')

# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
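Conceptually, a hotkey is a series of `key_down` calls followed by `key_up` calls in reverse order, which is how PyAutoGUI's own `hotkey` function behaves. The sketch below only builds that event sequence so the ordering is visible. `hotkey_events` is a hypothetical helper, and whether this skill's `hotkey` wrapper follows exactly the same order is an assumption.

```python
from typing import List, Tuple


def hotkey_events(*keys: str) -> List[Tuple[str, str]]:
    """Return (action, key) pairs: press keys in order, release in reverse."""
    presses = [("down", key) for key in keys]
    releases = [("up", key) for key in reversed(keys)]
    return presses + releases


for action, key in hotkey_events("ctrl", "shift", "esc"):
    # With a controller instance this might map to dc.key_down / dc.key_up.
    print(action, key)
```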

Screen Functions

`screenshot(region=None, filename=None)`

Capture screen or region.

Parameters:

region (tuple, optional): (left, top, width, height) for partial capture
filename (str, optional): Path to save image

Returns: PIL Image object

Example:

# Full screen
img = dc.screenshot()

# Save to file
dc.screenshot(filename="screenshot.png")

# Capture specific region
img = dc.screenshot(region=(100, 100, 500, 300))
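The `region` tuple is (left, top, width, height), whereas Pillow's `Image.crop` expects a (left, upper, right, lower) box. If you crop a full screenshot yourself, the conversion looks like the sketch below; `region_to_box` is a hypothetical helper, not part of the skill.

```python
from typing import Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height)
Box = Tuple[int, int, int, int]     # (left, upper, right, lower)


def region_to_box(region: Region) -> Box:
    """Convert a (left, top, width, height) region into a Pillow crop box."""
    left, top, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (left, top, left + width, top + height)


print(region_to_box((100, 100, 500, 300)))  # (100, 100, 600, 400)
```

With a PIL image in hand, `img.crop(region_to_box(region))` would then return the same area that `region` describes.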

`get_pixel_color(x, y)`

Get color of pixel at coordinates.

Returns: RGB tuple (r, g, b)

Example:

r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
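Pixel colors often vary slightly because of anti-aliasing or compression, so exact equality checks on the returned tuple can be brittle. The sketch below compares two RGB tuples within a per-channel tolerance. `color_matches` is a hypothetical helper for use with the value returned by `get_pixel_color`.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def color_matches(actual: RGB, expected: RGB, tolerance: int = 0) -> bool:
    """Return True if every channel differs by at most `tolerance`."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


# A slightly off red still counts as red with a tolerance of 10.
print(color_matches((250, 10, 10), (255, 0, 0), tolerance=10))  # True
print(color_matches((250, 10, 10), (255, 0, 0), tolerance=5))   # False
```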

`find_on_screen(image_path, confidence=0.8)`

Find image on screen (requires OpenCV).

Parameters:

image_path (str): Path to template image
confidence (float): Match threshold (0-1)

Returns: (x, y, width, height) or None

Example:

# Find button on screen
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click center of found image
    dc.click(x + w//2, y + h//2)
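Since `find_on_screen` returns None when the template is not visible, scripts often need to poll until an element appears. The sketch below wraps any zero-argument finder in a timeout loop and computes the center of the result. `wait_for` and `center` are hypothetical helpers; the finder is passed in as a callable so the loop itself does not depend on the skill.

```python
import time
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def wait_for(finder: Callable[[], Optional[Box]],
             timeout: float = 5.0,
             poll_interval: float = 0.25) -> Optional[Box]:
    """Call `finder` repeatedly until it returns a box or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        location = finder()
        if location is not None:
            return location
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


def center(box: Box) -> Tuple[int, int]:
    """Return the center point of an (x, y, width, height) box."""
    x, y, w, h = box
    return x + w // 2, y + h // 2
```

A possible use, assuming a controller instance `dc`, is `wait_for(lambda: dc.find_on_screen("button.png"))`, followed by `dc.click(*center(location))` when the result is not None.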

`get_screen_size()`

Get screen resolution.

Returns: (width, height) tuple

Example:

width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")

Window Functions

`get_all_windows()`

List all open windows.

Returns: List of window titles

Example:

windows = dc.get_all_windows()
for title in windows:
    print(f"Window: {title}")

`activate_window(title_substring)`

Bring window to front by title.

Parameters:

title_substring (str): Part of window title to match

Example:

# Activate Chrome
dc.activate_window("Chrome")

# Activate VS Code
dc.activate_window("Visual Studio Code")
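Matching by title substring can hit more than one window, so it can help to check the candidates first. The sketch below filters a list of titles, such as the one returned by `get_all_windows()`, with a case-insensitive substring test. `match_titles` is a hypothetical helper, and whether `activate_window` itself ignores case is an assumption.

```python
from typing import List


def match_titles(titles: List[str], substring: str) -> List[str]:
    """Return the titles that contain `substring`, ignoring case."""
    needle = substring.casefold()
    return [title for title in titles if needle in title.casefold()]


# Example titles are made up for illustration.
titles = ["Inbox - Google Chrome", "notes.txt - Notepad", "Docs - Google Chrome"]
print(match_titles(titles, "chrome"))
# ['Inbox - Google Chrome', 'Docs - Google Chrome']
```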

`get_active_window()`

Get currently focused window.

Returns: Window title (str)

Example:

active = dc.get_active_window()
print(f"Active window: {active}")

Clipboard Functions

`copy_to_clipboard(text)`

Copy text to clipboard.

Example:

dc.copy_to_clipboard("Hello from OpenClaw!")

`get_from_clipboard()`

Get text from clipboard.

Returns: str

Example:

text = dc.get_from_clipboard()
print(f"Clipboard: {text}")

⌨️ Key Names Reference

Alphabet Keys

'a' through 'z'

Number Keys

'0' through '9'

Function Keys

'f1' through 'f24'

Special Keys

'enter' / 'return'
'esc' / 'escape'
'space' / 'spacebar'
'tab'
'backspace'
'delete' / 'del'
'insert'
'home'
'end'
'pageup' / 'pgup'
'pagedown' / 'pgdn'

Arrow Keys

'up' / 'down' / 'left' / 'right'

Modifier Keys

'ctrl' / 'control'
'shift'
'alt'
'win' / 'winleft' / 'winright'
'cmd' / 'command' (Mac)

Lock Keys

'capslock'
'numlock'
'scrolllock'

Punctuation

'.' / ',' / '?' / '!' / ';' / ':'
'[' / ']' / '{' / '}'
'(' / ')'
'+' / '-' / '*' / '/' / '='

🛡️ Safety Features

Failsafe Mode

Move mouse to any corner of the screen to abort all automation.

# Enable failsafe (enabled by default)
dc = DesktopController(failsafe=True)
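To make the corner rule concrete, the sketch below tests whether a pointer position is one of the four failsafe corners listed under Important Notes. It is an illustration under that assumption, not the skill's or PyAutoGUI's actual check; `is_failsafe_corner` is a hypothetical helper.

```python
def is_failsafe_corner(x: int, y: int, width: int, height: int) -> bool:
    """Return True if (x, y) is one of the four corner pixels of the screen."""
    corners = {
        (0, 0),
        (width - 1, 0),
        (0, height - 1),
        (width - 1, height - 1),
    }
    return (x, y) in corners


# Hypothetical 1920x1080 screen.
print(is_failsafe_corner(0, 0, 1920, 1080))        # True
print(is_failsafe_corner(1919, 1079, 1920, 1080))  # True
print(is_failsafe_corner(960, 540, 1920, 1080))    # False
```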

Pause Control

# Pause all automation for 2 seconds
dc.pause(2.0)

# Check if automation is safe to proceed
if dc.is_safe():
    dc.click(500, 500)

Approval Mode

Require user confirmation before actions:

dc = DesktopController(require_approval=True)

# This will ask for confirmation
dc.click(500, 500)  # Prompt: "Allow click at (500, 500)? [y/n]"
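The approval flow can be pictured as a small gate in front of each action. The sketch below is one possible shape for such a gate, not the skill's implementation: `run_with_approval` is a hypothetical helper, and the prompt function is injectable so the gate can be exercised without a terminal.

```python
from typing import Callable


def run_with_approval(description: str,
                      action: Callable[[], None],
                      ask: Callable[[str], str] = input) -> bool:
    """Ask for confirmation, run `action` only on 'y' or 'yes', report whether it ran."""
    answer = ask(f"Allow {description}? [y/n] ").strip().lower()
    if answer in ("y", "yes"):
        action()
        return True
    return False


# Demonstration with a canned answer instead of real keyboard input.
performed = []
ran = run_with_approval("click at (500, 500)",
                        lambda: performed.append("click"),
                        ask=lambda prompt: "y")
print(ran, performed)  # True ['click']
```

Anything other than an explicit yes is treated as a refusal, which keeps the default on the safe side.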

🎨 Advanced Examples

Example 1: Automated Form Filling

dc = DesktopController()

# Click name field
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)

# Tab to next field
dc.press('tab')
dc.type_text("john@example.com", wpm=80)

# Tab to password
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)

# Submit form
dc.press('enter')

Example 2: Screenshot Region and Save

import datetime

# Capture specific area
region = (100, 100, 800, 600)  # left, top, width, height
img = dc.screenshot(region=region)

# Save with timestamp
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")

Example 3: Multi-File Selection

# Hold Ctrl and click multiple files
dc.key_down('ctrl')
dc.click(100, 200)  # First file
dc.click(100, 250)  # Second file
dc.click(100, 300)  # Third file
dc.key_up('ctrl')

# Copy selected files
dc.hotkey('ctrl', 'c')

Example 4: Window Automation

import time

# Activate Calculator
dc.activate_window("Calculator")
time.sleep(0.5)  # Give the window time to take focus

# Type calculation
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)  # Wait for the result to render

# Take screenshot of result
dc.screenshot(filename="calculation_result.png")

Example 5: Drag & Drop File

# Drag file from source to destination
dc.drag(
    start_x=200, start_y=300,  # File location
    end_x=800, end_y=500,       # Folder location
    duration=1.0                 # Smooth 1-second drag
)

⚡ Performance Tips

Use instant movements for speed: duration=0
Batch operations instead of individual calls
Cache screen positions instead of recalculating
Disable failsafe only if it measurably slows your script (use with caution: this removes the corner-abort safety net)
Use hotkeys instead of menu navigation
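The tip about caching screen positions can be sketched as a small lookup that calls an expensive locator only once per template. `PositionCache` is a hypothetical helper; the locator is passed in as a callable, which in practice might be `dc.find_on_screen`.

```python
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


class PositionCache:
    """Remember where templates were found so repeated lookups skip the search."""

    def __init__(self, locate: Callable[[str], Optional[Box]]) -> None:
        self._locate = locate
        self._cache: Dict[str, Box] = {}

    def get(self, image_path: str) -> Optional[Box]:
        if image_path not in self._cache:
            found = self._locate(image_path)
            if found is None:
                return None  # misses are not cached; the element may appear later
            self._cache[image_path] = found
        return self._cache[image_path]

    def invalidate(self, image_path: Optional[str] = None) -> None:
        """Forget one cached position, or all of them when no path is given."""
        if image_path is None:
            self._cache.clear()
        else:
            self._cache.pop(image_path, None)
```

Cached positions go stale when a window moves or the layout changes, so call `invalidate()` after anything that rearranges the screen.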

⚠️ Important Notes

Screen coordinates start at (0, 0) in top-left corner
Multi-monitor setups may have negative coordinates for secondary displays
Windows DPI scaling may affect coordinate accuracy
Failsafe corners are: (0,0), (width-1, 0), (0, height-1), (width-1, height-1)
Some applications may block simulated input (games, secure apps)
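The note about negative coordinates on secondary displays can be illustrated with a lookup over monitor rectangles in virtual-desktop coordinates. The monitor layout below is made up for the example, and `monitor_containing` is a hypothetical helper; the skill's own multi-monitor API is not documented in this file.

```python
from typing import List, Optional, Tuple

Monitor = Tuple[int, int, int, int]  # (left, top, width, height)


def monitor_containing(x: int, y: int, monitors: List[Monitor]) -> Optional[int]:
    """Return the index of the monitor that contains (x, y), or None."""
    for index, (left, top, width, height) in enumerate(monitors):
        if left <= x < left + width and top <= y < top + height:
            return index
    return None


# Assumed layout: a 1920x1080 primary display with a 1280x1024 display to its left.
monitors = [(0, 0, 1920, 1080), (-1280, 0, 1280, 1024)]
print(monitor_containing(-100, 500, monitors))  # 1
print(monitor_containing(500, 500, monitors))   # 0
```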

🔧 Troubleshooting

Mouse not moving to correct position

Check DPI scaling settings
Verify screen resolution matches expectations
Use get_screen_size() to confirm dimensions

Keyboard input not working

Ensure target application has focus
Some apps require admin privileges
Try increasing interval for reliability

Failsafe triggering accidentally

Increase screen border tolerance
Move mouse away from corners during normal use
Disable if needed: DesktopController(failsafe=False)

Permission errors

Run Python with administrator privileges for some operations
Some secure applications block automation

📦 Dependencies

PyAutoGUI - Core automation engine
Pillow - Image processing
OpenCV (optional) - Image recognition
PyGetWindow - Window management
Pyperclip - Clipboard access

Install all:

pip install pyautogui pillow opencv-python pygetwindow pyperclip

Built for OpenClaw - The ultimate desktop automation companion 🦞
