Install
openclaw skills install eye2byte
Give your agent eyes: capture screenshots, voice, and annotations from any screen, monitor, or device via MCP.
Eye2byte is an open-source MCP server (GitHub, PyPI) that lets you see the user's screen. Use these MCP tools only when the user explicitly asks you to look at something, debug a visual issue, or capture their screen.
Captures are saved to ~/.eye2byte/output/ on the user's machine. Nothing is sent to external servers (except the vision model API the user configured).
The --token flag sets a bearer token stored only in the user's openclaw.json. Treat it like any API secret. The token is never logged or transmitted beyond the Authorization header.
capture_and_summarize
Screenshot the user's screen and get a structured analysis.
Parameters:
mode: "full" (default), "window", or "region"
monitor: 0 = active monitor (default), 1/2/3 = specific monitor, -1 = all monitors at once
delay: seconds to wait before capturing (useful for menus/tooltips)
window_name: capture a specific app window by name (e.g., "chrome", "code")
Use this when the user says things like "look at my screen", "what do you see", "debug this", or "what's wrong here".
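The parameters listed for capture_and_summarize can be assembled into an MCP `tools/call` request. The sketch below is illustrative only: `build_capture_request` is a hypothetical helper, not part of Eye2byte, and the validation rules are inferred from the parameter descriptions rather than taken from the server's code. MCP clients normally build this JSON-RPC 2.0 envelope for you.

```python
import json

# Modes taken from the parameter list; anything else is rejected.
VALID_MODES = {"full", "window", "region"}


def build_capture_request(mode="full", monitor=0, delay=None,
                          window_name=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` payload (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if monitor < -1:
        raise ValueError("monitor must be -1 (all), 0 (active), or 1, 2, 3, ...")
    if delay is not None and delay < 0:
        raise ValueError("delay must be a non-negative number of seconds")

    arguments = {"mode": mode, "monitor": monitor}
    # Optional arguments are sent only when the caller supplies them,
    # because their server-side defaults are not documented here.
    if delay is not None:
        arguments["delay"] = delay
    if window_name is not None:
        arguments["window_name"] = window_name

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "capture_and_summarize", "arguments": arguments},
    }


if __name__ == "__main__":
    # Capture the Chrome window after a 2-second delay.
    request = build_capture_request(mode="window", window_name="chrome", delay=2)
    print(json.dumps(request, indent=2))
```

Leaving out `delay` and `window_name` unless they are set keeps the request free of guessed default values.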
capture_with_voice
Screenshot + voice recording + transcription. Returns both visual analysis and what the user said.
Use when the user wants to describe something verbally while showing their screen.
record_clip_and_summarize
Record a short screen clip, extract keyframes, and analyze the sequence.
Use when the user wants to show you something that changes over time (animations, workflows, step sequences).
summarize_screenshot
Analyze an existing image file. Pass a file path to get a structured analysis.
transcribe_audio
Local Whisper transcription of any audio file.
get_recent_context
Retrieve recent Context Pack summaries from previous captures.
Use this to recall what you've seen recently without re-capturing.
Every capture returns a structured Context Pack:
Goal — what the user appears to be doing
Environment — OS, editor, repo, branch, language
Screen State — visible panels, files, terminal output
Signals — verbatim errors, stack traces, warnings
Likely Situation — what's probably happening
Suggested Next Info — what you should ask or do next
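One way to picture the six Context Pack sections is as a small record type. This is a sketch only: the class, its field names, and the `has_signals` helper are hypothetical and mirror the section titles above. The format Eye2byte actually returns is not specified here and may be plain text rather than structured data.

```python
from dataclasses import dataclass, field


@dataclass
class ContextPack:
    """Hypothetical container mirroring the six Context Pack sections."""
    goal: str = ""                    # what the user appears to be doing
    environment: str = ""             # OS, editor, repo, branch, language
    screen_state: str = ""            # visible panels, files, terminal output
    signals: list[str] = field(default_factory=list)  # verbatim errors and warnings
    likely_situation: str = ""        # what is probably happening
    suggested_next_info: str = ""     # what to ask or do next

    def has_signals(self) -> bool:
        """Return True when at least one error, trace, or warning was captured."""
        return len(self.signals) > 0


if __name__ == "__main__":
    pack = ContextPack(
        goal="Fix a failing unit test",
        signals=["TypeError: 'NoneType' object is not subscriptable"],
    )
    if pack.has_signals():
        print("First signal:", pack.signals[0])
```

Keeping Signals as a list of verbatim strings reflects the description above: errors are meant to be quoted exactly, not paraphrased.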
monitor=-1 captures all monitors stitched together, which is useful for seeing the full workspace
monitor=1, 2, 3 target specific displays
The default (monitor=0) captures whichever monitor has the active window

Eye2byte must be running on the machine whose screen you want to capture:
Local (same machine): Already configured if this skill loaded.
Remote (different machine): The user runs eye2byte-mcp --sse --token <secret> on their local machine, and configures the MCP connection URL in openclaw.json.
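For the remote case, the token travels as a standard HTTP bearer credential in the Authorization header. The sketch below shows what that header looks like and one way to keep it out of logs. Both `auth_headers` and `redacted` are hypothetical helpers for illustration; an MCP client configured through openclaw.json would normally attach the header itself.

```python
def auth_headers(token: str) -> dict:
    """Return the Authorization header for a bearer token (hypothetical helper)."""
    if not token or any(ch.isspace() for ch in token):
        raise ValueError("token must be non-empty and contain no whitespace")
    return {"Authorization": f"Bearer {token}"}


def redacted(headers: dict) -> dict:
    """Copy headers with the Authorization value masked, safe for logging."""
    return {
        name: "<redacted>" if name.lower() == "authorization" else value
        for name, value in headers.items()
    }


if __name__ == "__main__":
    headers = auth_headers("example-secret")  # placeholder value, not a real token
    print(redacted(headers))
```

Masking the header before printing follows the guidance above to treat the token like any API secret.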
uv tool install eye2byte