{"skill":{"slug":"windows-control","displayName":"Windows Control","summary":"Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.","description":"---\nname: windows-control\ndescription: Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.\n---\n\n# Windows Control Skill\n\nFull desktop automation for Windows. Control mouse, keyboard, and screen like a human user.\n\n## Quick Start\n\nAll scripts are in `skills/windows-control/scripts/`\n\n### Screenshot\n```bash\npy screenshot.py > output.b64\n```\nReturns base64 PNG of entire screen.\n\n### Click\n```bash\npy click.py 500 300              # Left click at (500, 300)\npy click.py 500 300 right        # Right click\npy click.py 500 300 left 2       # Double click\n```\n\n### Type Text\n```bash\npy type_text.py \"Hello World\"\n```\nTypes text at current cursor position (10ms between keys).\n\n### Press Keys\n```bash\npy key_press.py \"enter\"\npy key_press.py \"ctrl+s\"\npy key_press.py \"alt+tab\"\npy key_press.py \"ctrl+shift+esc\"\n```\n\n### Move Mouse\n```bash\npy mouse_move.py 500 300\n```\nMoves mouse to coordinates (smooth 0.2s animation).\n\n### Scroll\n```bash\npy scroll.py up 5      # Scroll up 5 notches\npy scroll.py down 10   # Scroll down 10 notches\n```\n\n### Window Management (NEW!)\n```bash\npy focus_window.py \"Chrome\"           # Bring window to front\npy minimize_window.py \"Notepad\"       # Minimize window\npy maximize_window.py \"VS Code\"       # Maximize window\npy close_window.py \"Calculator\"       # Close window\npy get_active_window.py               # Get title of active window\n```\n\n### Advanced Actions (NEW!)\n```bash\n# Click by text (No coordinates needed!)\npy click_text.py \"Save\"               # Click \"Save\" button anywhere\npy click_text.py \"Submit\" \"Chrome\"    # Click \"Submit\" in Chrome only\n\n# Drag and Drop\npy drag.py 100 100 500 300          
  # Drag from (100,100) to (500,300)\n\n# Robust Automation (Wait/Find)\npy wait_for_text.py \"Ready\" \"App\" 30  # Wait up to 30s for text\npy wait_for_window.py \"Notepad\" 10    # Wait for window to appear\npy find_text.py \"Login\" \"Chrome\"      # Get coordinates of text\npy list_windows.py                    # List all open windows\n```\n\n### Read Window Text\n```bash\npy read_window.py \"Notepad\"           # Read all text from Notepad\npy read_window.py \"Visual Studio\"     # Read text from VS Code\npy read_window.py \"Chrome\"            # Read text from browser\n```\nUses Windows UI Automation to extract actual text (not OCR). Much faster and more accurate than screenshots!\n\n### Read UI Elements (NEW!)\n```bash\npy read_ui_elements.py \"Chrome\"               # All interactive elements\npy read_ui_elements.py \"Chrome\" --buttons-only  # Just buttons\npy read_ui_elements.py \"Chrome\" --links-only    # Just links\npy read_ui_elements.py \"Chrome\" --json          # JSON output\n```\nReturns buttons, links, tabs, checkboxes, dropdowns with coordinates for clicking.\n\n### Read Webpage Content (NEW!)\n```bash\npy read_webpage.py                     # Read active browser\npy read_webpage.py \"Chrome\"            # Target Chrome specifically\npy read_webpage.py \"Chrome\" --buttons  # Include buttons\npy read_webpage.py \"Chrome\" --links    # Include links with coords\npy read_webpage.py \"Chrome\" --full     # All elements (inputs, images)\npy read_webpage.py \"Chrome\" --json     # JSON output\n```\nEnhanced browser content extraction with headings, text, buttons, and links.\n\n### Handle Dialogs (NEW!)\n```bash\n# List all open dialogs\npy handle_dialog.py list\n\n# Read current dialog content\npy handle_dialog.py read\npy handle_dialog.py read --json\n\n# Click button in dialog\npy handle_dialog.py click \"OK\"\npy handle_dialog.py click \"Save\"\npy handle_dialog.py click \"Yes\"\n\n# Type into dialog text field\npy handle_dialog.py type 
\"myfile.txt\"\npy handle_dialog.py type \"C:\\path\\to\\file\" --field 0\n\n# Dismiss dialog (auto-finds OK/Close/Cancel)\npy handle_dialog.py dismiss\n\n# Wait for dialog to appear\npy handle_dialog.py wait --timeout 10\npy handle_dialog.py wait \"Save As\" --timeout 5\n```\nHandles Save/Open dialogs, message boxes, alerts, confirmations, etc.\n\n### Click Element by Name (NEW!)\n```bash\npy click_element.py \"Save\"                    # Click \"Save\" anywhere\npy click_element.py \"OK\" --window \"Notepad\"   # In specific window\npy click_element.py \"Submit\" --type Button    # Only buttons\npy click_element.py \"File\" --type MenuItem    # Menu items\npy click_element.py --list                    # List clickable elements\npy click_element.py --list --window \"Chrome\"  # List in specific window\n```\nClick buttons, links, menu items by name without needing coordinates.\n\n### Read Screen Region (OCR - Optional)\n```bash\npy read_region.py 100 100 500 300     # Read text from coordinates\n```\nNote: Requires Tesseract OCR installation. Use read_window.py instead for better results.\n\n## Workflow Pattern\n\n1. **Read window** - Extract text from specific window (fast, accurate)\n2. **Read UI elements** - Get buttons, links with coordinates\n3. **Screenshot** (if needed) - See visual layout\n4. **Act** - Click element by name or coordinates\n5. **Handle dialogs** - Interact with popups/save dialogs\n6. 
**Read window** - Verify changes\n\n## Screen Coordinates\n\n- Origin (0, 0) is top-left corner\n- Your screen: 2560x1440 (check with screenshot)\n- Use coordinates from screenshot analysis\n\n## Examples\n\n### Open Notepad and type\n```bash\n# Press Windows key\npy key_press.py \"win\"\n\n# Type \"notepad\"\npy type_text.py \"notepad\"\n\n# Press Enter\npy key_press.py \"enter\"\n\n# Wait for the Notepad window to appear, then type\npy wait_for_window.py \"Notepad\" 10\npy type_text.py \"Hello from AI!\"\n\n# Save\npy key_press.py \"ctrl+s\"\n```\n\n### Click in VS Code\n```bash\n# Read current VS Code content\npy read_window.py \"Visual Studio Code\"\n\n# Click at specific location (e.g., file explorer)\npy click.py 50 100\n\n# Type filename\npy type_text.py \"test.js\"\n\n# Press Enter\npy key_press.py \"enter\"\n\n# Verify new file opened\npy read_window.py \"Visual Studio Code\"\n```\n\n### Monitor Notepad changes\n```bash\n# Read current content\npy read_window.py \"Notepad\"\n\n# User types something...\n\n# Read updated content (no screenshot needed!)\npy read_window.py \"Notepad\"\n```\n\n## Text Reading Methods\n\n**Method 1: Windows UI Automation (BEST)**\n- Use `read_window.py` for any window\n- Use `read_ui_elements.py` for buttons/links with coordinates\n- Use `read_webpage.py` for browser content with structure\n- Gets actual text data (not image-based)\n\n**Method 2: Click by Name (NEW)**\n- Use `click_element.py` to click buttons/links by name\n- No coordinates needed - finds elements automatically\n- Works across all windows or target a specific window\n\n**Method 3: Dialog Handling (NEW)**\n- Use `handle_dialog.py` for popups, save dialogs, alerts\n- Read dialog content, click buttons, type text\n- Auto-dismiss with common buttons (OK, Cancel, etc.)\n\n**Method 4: Screenshot + Vision (Fallback)**\n- Take full screenshot\n- AI reads text visually\n- Slower but works for any content\n\n**Method 5: OCR (Optional)**\n- Use `read_region.py` with Tesseract\n- Requires additional installation\n- Good for 
images/PDFs with text\n\n## Safety Features\n\n- `pyautogui.FAILSAFE = True` (move mouse to top-left to abort)\n- Small delays between actions\n- Smooth mouse movements (not instant jumps)\n\n## Requirements\n\n- Python 3.11+\n- pyautogui (installed ✅)\n- pillow (installed ✅)\n\n## Tips\n\n- Always screenshot first to see current state\n- Coordinates are absolute (not relative to windows)\n- Wait briefly after clicks for UI to update\n- Use `ctrl+z` friendly actions when possible\n\n---\n\n**Status:** ✅ READY FOR USE (v2.0 - Dialog & UI Elements)\n**Created:** 2026-02-01\n**Updated:** 2026-02-02\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":8924,"installsAllTime":48,"installsCurrent":48,"stars":29,"versions":1},"createdAt":1769988173399,"updatedAt":1779076535458},"latestVersion":{"version":"1.0.0","createdAt":1769988173399,"changelog":"**Major update: Adds full desktop automation with robust window, UI, and dialog control.**\n\n- NEW: Control mouse, keyboard, screenshots, and interact with any Windows application via scripts.\n- NEW: Comprehensive window management (focus, minimize, maximize, close, get active window).\n- NEW: Advanced UI automation: click buttons/links by name, read UI elements, robust dialog handling.\n- NEW: Read actual window and browser text using Windows UI Automation (not OCR).\n- NEW: Extract and interact with webpage content, including buttons, links, and structure.\n- Enhanced automation reliability with wait/find routines and smooth mouse movement.\n- Safety features: failsafe mouse-abort, small delays, and user-friendly workflow documentation.","license":null},"metadata":null,"owner":{"handle":"spliff7777","userId":"s1753mker1xw3jj8ytc6kbrc85884yn9","displayName":"Spliff7777","image":"https://avatars.githubusercontent.com/u/211625539?v=4"},"moderation":{"isSuspicious":true,"isMalwareBlocked":false,"verdict":"suspicious","reasonCodes":["suspicious.llm_suspicious"],"summary":"Detected: 
suspicious.llm_suspicious","engineVersion":"v2.4.24","updatedAt":1779076535458}}