SuperBased gives OpenClaw agents both eyes (screen capture, AI vision, OCR) and hands (full GUI automation with humanization v2 + CAPTCHA-solving guidance) on the user's desktop. The actual capabilities are exposed through 72 MCP tools served by the SuperBased MCP server (superbased mcp); this skill bundle teaches the agent when to reach for which tool.
Two-step install (run once)
# Prerequisite: the SuperBased CLI must be on PATH
npm install -g superbased
# 1. Install this skills bundle from ClawHub
openclaw skills install superbased
# 2. Register the SuperBased MCP server
openclaw mcp set superbased '{"command":"superbased","args":["mcp"]}'
Optional: install the SuperBased desktop app from superbased.app for a GUI to browse captures, configure providers, and manage the gallery. When the desktop app is running, superbased mcp auto-bridges to it via a PID file at ~/.superbased/, so OpenClaw and the desktop share state.
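Before relying on the auto-bridge, the agent can check whether the desktop app's state directory exists. A minimal sketch, assuming only the ~/.superbased/ directory mentioned above (the exact PID file name inside it is not specified here):

```shell
#!/bin/sh
# Sketch: detect whether SuperBased desktop state is present, so we know
# whether `superbased mcp` is likely to bridge to the app or run standalone.
if [ -d "$HOME/.superbased" ]; then
  echo "bridge state directory present; desktop app may be running"
else
  echo "no bridge state; superbased mcp runs standalone"
fi
```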
When to use SuperBased
Trigger SuperBased when the user's request involves any of:
- Seeing what's on screen — "look at this", "what's on my screen", "describe what I'm seeing", "read this dialog"
- Verifying a UI change — "did the button update?", "is the error gone?"
- Reading content that's hidden behind scroll — "what are all the settings?", "walk me through the sidebar"
- Visual regression testing — "record a baseline of the login flow", "did anything change visually?"
- Watching for issues during long-running processes — "monitor my deploy for errors", "let me know if anything fails"
- Extracting text from images / screen — "OCR this", "extract the text from this region"
- Voice input — "transcribe what I'm about to say", "type via dictation"
- Compressing large text into images — "send this 5K-token block as one image"
- Annotating / redacting screenshots — "highlight the broken thing", "redact the API key before sharing"
- Driving the desktop UI — "click that button", "type into the email field", "fill out this form", "press Cmd+S"
- Multi-step workflow automation — "open File menu, pick Open, type the path, press Enter, screenshot the result"
- Solving in-flow CAPTCHA challenges — "this drag puzzle is blocking me", "select all squares with traffic lights"
- Fighting bot detection — when an automation flow on a hardened site needs cursor-trajectory humanization
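The multi-step workflow trigger above maps naturally onto superbased_sequence. A hedged sketch of what such a tool call's arguments might look like for the "open File menu, pick Open, type the path, press Enter, screenshot" flow (the step schema shown is an assumption for illustration, not the server's confirmed API; the path is a placeholder):

```json
{
  "name": "superbased_sequence",
  "arguments": {
    "confirm": true,
    "steps": [
      { "action": "hotkey", "keys": ["cmd", "o"] },
      { "action": "type", "text": "/path/to/file.txt" },
      { "action": "hotkey", "keys": ["enter"] },
      { "action": "screenshot" }
    ]
  }
}
```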
Sub-skills (use these as the agent's working knowledge)
The 11 SKILL.md files in this bundle each cover one trigger category. Read the relevant one first when the user request matches its description:
The 72 MCP tools at a glance
Capture & View (5): superbased_screenshot, _capture_image, _capture, _gallery_image, _window_list
AI & OCR (8): superbased_ai, _ai_usage, _ocr, _transcribe, _compress_text, _project, _workspace_sync, _stt_status
Gallery (2): superbased_gallery, _gallery_update
Privacy & Annotations (2): superbased_redact, _annotate
Dictation & Voice (2): superbased_dictate, _dictation_history
Recording & Visual QA (7): superbased_recording, _sessions, _describe_frames, _narrate, _diff, _baseline, _export
Settings, Auth & System (6): superbased_settings, _presets, _auth, _license, _health, _clipboard
GUI Automation (40): superbased_ui_dump, _scroll_capture, _scroll_to, _sequence, _click, _type, _hotkey, _scroll, _drag, _drag_file, _hover, _context_menu_select, _form_fill, _dialog_handle, _open_url, _find_in_page, _tab_management, _tray_click, _virtual_desktop, _window_state, _resize_window, _focus_window, _window_bounds, _find_title_bar_drag_region, _display_list, _launch_app, _find_image, _capture_template, _pixel_color, _ax_invoke, _accessibility_tree, _locate, _wait, _wait_for, _mouse_position, _dry_run, _replay, _doctor_gui_automation, _undo_last, _tools
Safety rails (for the GUI automation surface)
Before any state-modifying GUI action (click, type, drag, sequence, form_fill, etc.):
- The master toggle (Settings > GUI Automation > Enabled) must be on. Run superbased_doctor_gui_automation to verify.
- Per-action toggles (click, type, hotkey, scroll, drag, hover) must each be enabled.
- Every state-modifying call must pass confirm: true; the server refuses calls without it.
- The protected-apps blocklist and the NDJSON audit log are enforced server-side; users can audit every action you took.
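Putting the rails together, a minimal state-modifying call might look like this (a sketch: confirm is documented above, while the other argument names are illustrative assumptions):

```json
{
  "name": "superbased_type",
  "arguments": { "text": "hello world", "confirm": true }
}
```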
When to bump humanization
The default, humanize: 'light', is enough for most consumer sites. Bump to 'human' for sites with active bot detection (Cloudflare-fronted, reCAPTCHA-gated). Bump to 'paranoid' for hardened targets (banking, ticketing, social media bot crackdowns). See skills/humanization/SKILL.md for the full picker.
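For example, a click against a bot-hardened site might bump the level like this (a sketch: confirm and humanize come from the docs above; the coordinate arguments are assumptions):

```json
{
  "name": "superbased_click",
  "arguments": { "x": 640, "y": 360, "confirm": true, "humanize": "human" }
}
```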
Links