OpenClaw 11-in-1 Visual Automation Suite (Windows Only) Complete visual automation toolkit with 11 integrated modules. ### 💰 Price One-time purchase: **$2.99** (Lifetime access to all modules + future updates) ### 🚀 How to Purchase 1. Pay via PayPal Invoice: 🔗 [Click to pay $2.99](https://www.paypal.com/invoice/p/#V2RC9S8LVKJ434R9) 2. After payment, send your email to: **1215066513@qq.com** 3. I will send the full download link within 12 hours. ### 🖥️ Compatibility - Windows 10 / 11 only - Not compatible with macOS / Linux ## 1. Product Basic Description ### 1.1 Core Functions Provides professional universal computer vision automation capabilities covering the full-process visual automation scenarios such as environment initialization, full-screen automatic screenshot, OCR text recognition, template matching target localization, mouse click simulation, keyboard input simulation, and complete environment initialization & cleanup mechanisms. It supports custom task combination and cyclic execution. ### 1.2 Version & Directory Description - Core Capability: Flexible invocation based on minimum executable units, supporting parameter customization, result variable inheritance, and custom skill saving. All functions can be used directly with the `call` command right after extracting the package. - Directory Structure: - `claw.json` - Skill package configuration file - `skills/all_skills.claw` - All skill unit definitions - `templates/` - Directory for template images (place your template images here for matching) - Temporary file directory `temp/` (for storing screenshots like temp/screen.png) is automatically created after executing `init_env`; temporary screenshot files can be cleaned up via `clean_temp`. - Version Info: Current version: 1.0.0; Compatible with OpenClaw >= 1.0.0 ### 1.3 Paid Attribute This automation skill system (vision-auto-tool-pro) is a paid professional toolkit. The document does not explicitly authorize commercial use of the toolkit. The paid permission only covers basic usage (non-commercial by default), and commercial use requires separate confirmation of authorization with the provider (e.g., purchasing a commercial license, signing a commercial agreement). ## 2. Complete Skill Invocation Manual ### Important Notes Ensure sufficient time is reserved for the computer to respond to each click or operation. For example, add a 2-second wait after `mouse_click` to avoid operation failure due to slow system response. ### 2.1 List of All Minimum Executable Units | Unit Name | Fixed Call Name | Function Description | Individual Call Method | |-------------------------|--------------------------|--------------------------------------------------------------------------------------|-------------------------------------------------| | Initialize Environment | `init_env` | Create directory structure, clear temporary files, check template directory | `call init_env` | | Full Screen Screenshot | `screenshot_full` | Capture entire screen and save as temp/screen.png | `call screenshot_full` | | Check Screenshot Validity | `check_screenshot_valid` | Check for black screen/freeze, wake up the interface if invalid | `call check_screenshot_valid` | | Wake Interface | `wake_window` | Solve the problems of background non-rendering and black screenshot | `call wake_window` | | OCR Recognition | `ocr_recognize` | Recognize all text on the screen and their corresponding coordinates | `call ocr_recognize` | | Template Matching | `template_match` | Use template image to match and locate icons/buttons | `call template_match category template_name` | | Unified Localization | `locate_target` | Prioritize OCR positioning; use template matching if not found, return coordinates | `call locate_target target_text OR category+template_name` | | Mouse Click | `mouse_click` | Move to the specified coordinates and perform click operation | `call mouse_click X Y [click_type, default=single_click]` | | Keyboard Input | `keyboard_input` | Input text after locating the input box | `call keyboard_input target_coords/description input_content` | | Clean Temporary Files | `clean_temp` | Delete temporary screenshots and free up storage space | `call clean_temp` | | Loop Restart | `loop_restart` | Wait 2 seconds then go back to the screenshot step and restart the process | `call loop_restart` | ### 2.2 Method for Invoking Individual Units #### Invocation Format ``` call [unit_call_name] [parameter...] ``` #### Invocation Examples - Initialize environment: `call init_env` - Template match browser icon on desktop: `call template_match desktop web` - Perform double-click at coordinates (100,200): `call mouse_click 100 200 double` ### 2.3 Combine into Custom New Tasks By writing one call instruction per line in execution order, you can combine them into a custom new task, which supports variable inheritance, looping, and permanent saving. #### Format Example (Open Browser) ``` # Task Name: Open Browser call init_env call screenshot_full call check_screenshot_valid call locate_target browser desktop Browser call mouse_click {{resultX}} {{resultY}} double call clean_temp ``` #### Combination Steps 1. **Write task name and description first** (for easier identification later) 2. **In execution order**, write one `call unit_name parameters` instruction per line 3. Coordinates can use variables `{{resultX}}`/`{{resultY}}` to inherit the output result of the previous unit 4. If cyclic execution is required, add `call loop_restart` at the end 5. **Save custom skill**: Use `save_skill skill_name instruction_list` to save the task permanently, then call it directly with `call skill_name` ### 2.4 Complete Main Flow Invocation Example ``` # General Main Flow: vision_auto_main call init_env call screenshot_full call check_screenshot_valid call ocr_recognize # If template matching is needed, add this line: call template_match category name call locate_target target_text call mouse_click {{X}} {{Y}} # If text input is needed, replace the above line with: call keyboard_input {{X}} {{Y}} input_content call clean_temp # Add this line if you need to loop: call loop_restart ``` ### Important Notes Ensure sufficient time is reserved for the computer to respond to each click or operation. > For example, add a 2-second wait after `mouse_click` to avoid operation failure due to slow system response.

Security checks across malware telemetry and agentic risk

Overview

This skill appears to enable powerful screen-reading and device-control automation, but its artifacts do not sufficiently bound or disclose the privacy risks of capturing and storing screen contents.

Review this skill carefully before installing. Only use it in sessions where you are comfortable with full-screen screenshots and OCR text being captured, avoid displaying passwords or private documents while it runs, and delete any saved screenshot/OCR files after use. Prefer a version that requires explicit confirmation before capture or input control and that stores sensitive data only temporarily.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
  • Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
  • Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
Findings (3)

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The skill explicitly supports full-screen capture, OCR extraction of on-screen text, mouse/keyboard control, and optional looped monitoring, but it provides no user-consent, privacy, or safety guardrails. In this context, those capabilities can easily expose sensitive on-screen data or enable persistent unattended control of the host, making misuse materially more dangerous than ordinary automation documentation.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The function captures the full screen and writes the image to disk in a predictable temp location without any consent, minimization, or retention controls. Screenshots can contain credentials, personal messages, tokens, financial data, or other sensitive on-screen content, and persisting them increases exposure to local users, other processes, backups, and later forensic recovery.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The OCR routine extracts all recognized on-screen text and writes it to a plain-text file, which can persist highly sensitive data such as passwords, chats, emails, API keys, or documents visible on screen. This broad capture of textual content creates a secondary data store that is easier to search, copy, and exfiltrate than the original image.

VirusTotal

66/66 vendors flagged this skill as clean.

View on VirusTotal