Install
openclaw skills install agentkvmControl physical devices (phones, PCs, Macs) through NanoKVM-USB hardware. Use this skill whenever the user asks you to interact with a physical screen, take screenshots of a connected device, click/type/scroll on a remote machine, automate a GUI workflow on real hardware, or implement a "computer use" loop that observes a screen and takes actions. Also trigger when you see AgentKVM in the project, references to NanoKVM-USB, or when the user says things like "click the button on my phone", "what's on the screen", "open Settings on the device", "type my password on the PC", or "automate this on the connected machine".
openclaw skills install agentkvmBefore using AgentKVM, ensure the following are installed and available:
npm install -g agentkvmbrew install ffmpeg on macOS, apt install ffmpeg on Linux)Run agentkvm status to verify everything is set up correctly. If the CLI is not found, install it first. If the device is not detected, check agentkvm list for available serial ports.
AgentKVM lets you see and operate physical devices (iPhones, Android phones, PCs, Macs, Linux machines) connected via NanoKVM-USB hardware. You take screenshots to observe the screen, then send mouse clicks, keyboard input, and scrolls to interact — just like a human sitting in front of the device.
Every interaction with a physical device follows the same pattern:
Screenshot → Analyze → Act → Verify
This loop is your fundamental building block. Chain multiple iterations to accomplish complex tasks.
agentkvm --json status
If this fails, the device isn't connected. Check the serial port with agentkvm list.
agentkvm --json screenshot
Returns { "path": "/path/to/screenshot.png", ... }. Read the image to see what's on screen.
# Click at pixel coordinates (relative to the cropped image)
agentkvm mouse click 223 485
# Type text
agentkvm type "hello world"
# Press key combos
agentkvm key enter
agentkvm key ctrl+c
agentkvm key cmd+space
# Scroll (positive = up, negative = down)
agentkvm mouse scroll 300 500 --delta -3
# Drag from point A to point B
agentkvm mouse drag 100 200 400 600
If AgentKVM is running on another machine, all commands work identically with --remote:
agentkvm --remote http://192.168.1.100:7070 --token my-secret screenshot --json
agentkvm --remote http://192.168.1.100:7070 --token my-secret mouse click 223 485
Or use the HTTP API directly — see references/api.md.
This is critical to get right. When you analyze a screenshot and identify a UI element at pixel (x, y), those coordinates are relative to the screenshot image itself — top-left is (0, 0). Pass these coordinates directly to agentkvm mouse click x y.
AgentKVM handles the translation to the actual hardware coordinates internally, based on the device type and crop settings. You don't need to do any math.
The device type determines how coordinates are translated:
"device" mode (iPhone, Android) — The cropped region IS the device's full screen. HID absolute coordinates 0–4096 map to the device's own display. Use this when the HDMI output shows the device screen within a larger capture frame.
"frame" mode (PC, Mac, Linux) — The cropped region is just a visual focus area; HID coordinates still map to the full monitor. Use this when you're controlling a computer where the capture resolution matches the target display.
The mode is selected automatically from the config. You rarely need to think about it.
When asked to perform a GUI task (e.g., "open Safari and search for X"):
Always start with a screenshot. Never assume what's on screen.
agentkvm --json screenshot
Read the returned image file. Describe what you see — this grounds your actions in reality.
Break the task into individual interactions. For "open Safari and search for X":
After each significant action, take a screenshot to verify it worked. Screens can be slow to update, so add brief waits between actions when needed (use sleep in your script).
Common pattern in a bash script:
# Click Safari icon at the observed position
agentkvm mouse click 223 950
sleep 1
# Verify it opened
agentkvm --json screenshot
# (read and analyze the screenshot)
# Click address bar
agentkvm mouse click 300 50
sleep 0.3
# Type search query
agentkvm type "weather today"
agentkvm key enter
sleep 2
# Verify search results loaded
agentkvm --json screenshot
If an action didn't produce the expected result:
All settings live in ~/.config/agentkvm/config.json. A typical setup:
{
"serialPort": "/dev/tty.usbserial-2140",
"resolution": { "width": 1920, "height": 1080 },
"videoDevice": "USB3 Video",
"deviceType": "iphone",
"crop": { "x": 738, "y": 55, "width": 447, "height": 970 }
}
Key fields:
serialPort — path to the NanoKVM-USB serial deviceresolution — HDMI capture resolutionvideoDevice — video capture device name or indexdeviceType — determines coordinate mode (iphone/android = device, pc/mac/linux = frame)crop — sub-region of the capture frame to use as the working areaWhen config is set, you can run bare commands without flags: agentkvm screenshot, agentkvm mouse click 100 200, etc.
Prefer clicking on text labels over icons — text is easier to locate precisely in screenshots.
Use --json for programmatic access — all commands support it and return structured data you can parse.
Double-click when single-click doesn't respond — some UI elements need --double.
Scroll in small increments — --delta 1 or --delta -1 is one scroll step. Use multiple steps with verification screenshots in between.
Type slowly for unreliable connections — increase --delay (default 50ms) if characters get dropped.
Use key combos for navigation — cmd+space (Spotlight), alt+tab (window switch), ctrl+c (cancel) are often faster than finding and clicking UI elements.
For the full CLI reference, key combo syntax, and HTTP API details, see references/api.md.