Install
openclaw skills install mano-cua

Desktop GUI automation via VLA models. Use when the user describes a task in natural language that requires visual screen interaction and no API or CLI exists for the target app. Supports a fully on-device local mode and a cloud mode.
Requires the mano-cua binary to be installed.

macOS / Linux (Homebrew):
brew install Mininglamp-AI/tap/mano-cua
Windows:
Download the latest mano-cua-windows.zip from GitHub Releases, extract it, and add the folder to your PATH.
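After installation on any platform, a quick sanity check confirms the binary is reachable. The `verify_install` wrapper below is our own illustration; `mano-cua check` is the tool's documented environment check:

```shell
# Post-install sanity check. The wrapper function name is ours;
# `mano-cua check` is the documented environment check command.
verify_install() {
  if command -v mano-cua >/dev/null 2>&1; then
    mano-cua check          # binary found: run the built-in check
  else
    echo "mano-cua not found on PATH" >&2
    return 1
  fi
}
```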
# Run a task
mano-cua run "your task description"
# Run with options (minimize the UI panel and set max steps)
mano-cua run "task" --minimize --max-steps 10
# Open a URL in the browser before starting the task
mano-cua run "task" --url "https://example.com"
# Open an app before starting the task (use the macOS app name, e.g. 'Notes', 'Safari', 'Google Chrome')
mano-cua run "task" --app "Notes"
# Run in local mode (on-device inference, macOS Apple Silicon only)
mano-cua run "task" --local
# Stop the current running task
mano-cua stop
Run mano-cua --help or mano-cua <command> --help for full flags and options.
Note: Only one task can run at a time per device. To start a new task, first stop the current one with mano-cua stop.
--app vs --url: Use one or the other, not both.
--app launches a desktop application by its macOS name (as shown in Spotlight search). --url opens a URL in the default browser. Both bring the target to the foreground before the agent starts.
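Since the two flags are mutually exclusive, a thin wrapper can guard against passing both by mistake. This `run_task` helper is a sketch of ours, not part of the CLI:

```shell
# Refuse to forward both --app and --url, since the docs say to
# use one or the other. The function name `run_task` is hypothetical.
run_task() {
  case " $* " in
    *" --app "*" --url "*|*" --url "*" --app "*)
      echo "error: use --app or --url, not both" >&2
      return 2 ;;
  esac
  mano-cua run "$@"
}
```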
Tip for local mode: Write task descriptions with explicit step-by-step instructions for best results. For example, instead of "search for iphone on Xiaohongshu", write "click the search box at the top, type iphone, click the search button, then click the first result". Explicit steps significantly improve local model accuracy.
Runs Mano-P entirely on-device via MLX; no data leaves the machine. Requires macOS with Apple Silicon (M1 or later). Adding a --url or --app argument in local mode is highly recommended to improve efficiency and accuracy. Without --local, the tool uses cloud inference.
Setup:
mano-cua check
mano-cua install-sdk
mano-cua install-model
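The three setup commands above can be chained so that a failure in any step stops the sequence; the `setup_local_mode` function name is our own:

```shell
# Chain the documented local-mode setup steps; && aborts on the first
# failure (e.g. if `check` reports an unsupported machine).
setup_local_mode() {
  mano-cua check &&          # verify macOS + Apple Silicon prerequisites
  mano-cua install-sdk &&    # install the local inference SDK
  mano-cua install-model     # download the on-device Mano-P model
}
```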
Run:
mano-cua run "click the search box, type openai, click search, click the first result to open OpenAI homepage" --local --url "https://www.google.com"
mano-cua run "click the search box, type iphone, click the search button, open the first post" --local --url "https://www.xiaohongshu.com" --minimize --max-steps 15
mano-cua run "create a new note and type hello world" --local --app "Notes"
# Local mode (recommended for privacy — all inference on-device, no data leaves the machine)
mano-cua run "click the search box, type openai, click search, click the first result" --local --url "https://www.google.com" --minimize
mano-cua run "create a new note and type hello world" --local --app "Notes"
# Cloud mode
mano-cua run "Open Notes and create a new note titled Meeting Summary"
mano-cua run "Search for AI news in the browser and show the first result" --minimize --max-steps 20
# Cloud mode with --app or --url
mano-cua run "Create a calendar event for Friday 20:00 named Team Meeting" --app "Microsoft Outlook"
mano-cua run "Compare available plans for the AeroAPI" --url "https://www.flightaware.com/"
# Stop the current task (use before starting a new one)
mano-cua stop
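Because only one task may run per device, a small helper can stop any current task before launching the next. `restart_task` is our own sketch; it assumes `mano-cua stop` is harmless when nothing is running:

```shell
# Stop whatever is running, then start the new task with any extra flags.
# Assumption: `mano-cua stop` exits cleanly when no task is active.
restart_task() {
  mano-cua stop            # clear the single per-device task slot
  mano-cua run "$@"        # launch the new task
}
```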
At each step, the current screen state is analyzed by a hybrid vision model to decide the next action. The agent performs bounded GUI actions (click, type, scroll, drag) only within the user-specified task scope, visible foreground target, and configured step/session limits. For sensitive or irreversible actions, the agent pauses and prompts the user for explicit confirmation before proceeding.
Hybrid vision model:
The system automatically selects the appropriate model based on task complexity.
In local mode (--local), a local Mano-P model runs on-device via MLX. No network calls for inference.
Structural safeguards and capability limits:
A small UI panel is displayed on the top-right corner of the screen to track and manage the current session status.
- Every run is bounded by --max-steps, preventing runaway execution.
- A running task can be stopped at any time with mano-cua stop.
- Local mode (--local) runs inference entirely on-device with zero network calls; no data ever leaves the machine.
- When --app or --url is specified, the agent's interaction is focused on that specific application or webpage.
- Task logic lives in a single file (task_model.py) for easy auditing.
- The only locally stored identifier is an anonymous device ID (~/.myapp_device_id); no secrets are transmitted or stored remotely.
- macOS is the preferred and most tested platform. Adaptations for Windows and Linux are not yet fully completed; minor issues are expected.