Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Virtual Desktop Pro v4 -- Universal Browser Execution

Persistent authenticated browser for OpenClaw via kasmweb/chrome Docker sidecar. Principal logs in once via noVNC — sessions saved permanently in Docker volu...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 40 · 0 current installs · 0 all-time installs
byWesley Armando@georges91560
MIT-0
Security Scan
VirusTotalVirusTotal
Suspicious
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (persistent Chrome sidecar + authenticated sessions) align with required binaries (docker, python3), required env vars (VNC_PW, BROWSER_CDP_URL) and the included browser_control.py. External services listed (CapSolver, Browserbase, Anthropic) are optional and match features (CAPTCHA solving, residential proxy, Claude Vision).
Instruction Scope
SKILL.md instructs the agent to edit docker-compose.yml, create a persistent Docker volume, open port 6901 (or use SSH tunnel), and write logs/screenshots and learning files into /workspace. It also uses Telegram notifications for CAPTCHA/manual actions. These are all within the scope of running a persistent browser, but they do grant broad access to any sites you log into and may send screenshots/notifications externally (Telegram). The skill also reads some workspace files for context (/workspace/TOOLS.md, .learnings/*) — this is reasonable but worth noting if those files contain sensitive data.
Install Mechanism
There is no formal install spec, so installation is instruction-driven (pull kasmweb/chrome via Docker, install Playwright/requests in the container). Pulling a ~2GB Docker image is expected. The repo doesn't contain opaque external download URLs; the main runtime download is the official kasmweb/chrome image and Python packages. Confirm you are comfortable pulling that image and the network access required to fetch Playwright runtimes.
Credentials
Required env vars (VNC_PW, BROWSER_CDP_URL) are proportional to a noVNC/CDP browser sidecar. Optional keys (CAPSOLVER_API_KEY, BROWSERBASE_API_KEY, ANTHROPIC_API_KEY, TELEGRAM_BOT_TOKEN) are justified by their named features. Caveats: providing these keys gives the skill ability to send data to third-party services (CAPSolver, Browserbase, Anthropic) and will incur costs; Telegram notifications may include screenshots or session status and could leak sensitive content if the Telegram channel is not private.
Persistence & Privilege
The skill requests persistent sessions (Docker volume 'browser-profile') and writes logs/screenshots into /workspace, which is consistent with its purpose. always:false and no modification of other skills' configs are used. Autonomous invocation is allowed by default but not uncommon; combine with the above (persistent sessions + optional external keys) only if you trust the runtime and keys.
Assessment
This skill appears to do what it claims, but it grants the agent broad access to any sites you log into via the persistent browser and uses optional third‑party services that can receive data (screenshots, pages) and incur costs. Before installing: 1) Back up your docker-compose.yml; review the one‑liner change before applying. 2) Run the browser sidecar on an isolated VPS or test environment first. 3) Restrict access to port 6901 (prefer SSH tunnel or firewall rule to your IP). 4) Use a strong VNC_PW and rotate it if you stop using the skill. 5) Only set CAPSOLVER/BROWSERBASE/ANTHROPIC/TELEGRAM keys if you trust those services; expect CAPTCHA screenshots and page content to be transmitted. 6) Inspect the kasmweb/chrome image version and consider pinning/updating it from an official source. 7) Monitor /workspace/logs and audit the AUDIT.md and screenshots for unexpected activity. If you want higher assurance, ask the publisher for provenance of the repo and a signed release or run the container with network egress rules to limit external destinations.

Like a lobster shell, security has layers — review code before you run it.

Current versionv4.0.1
Download zip
automation browser computer-use google-workspace authenticated-browser novnc captcha-solver claude-vision playwright cdp docker-sidecar kasmweb residential-proxy workflows self-improvingvk97drhyfmp9yefgh7fcjcypyws82yfvtlatestvk972dsa717e2hd735362y01phs82yh3e

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🖥️ Clawdis
Binsdocker, python3
EnvVNC_PW, BROWSER_CDP_URL

SKILL.md

Virtual Desktop — Authenticated Browser Layer

What this skill does

Gives the agent a persistent authenticated browser (kasmweb/chrome) running as a Docker sidecar. Principal logs in once via noVNC. Sessions saved permanently.

CapabilityWhat it means
ANALYZERead any page, extract structured data, monitor changes over time
PLANMap the UI, identify selectors, prepare multi-step action sequences
EXECUTEClick, type, fill forms, submit, upload, download, navigate any flow
SELF-CORRECTScreenshot error state, identify root cause, retry with alternate approach
IMPROVEWrite UI patterns and selector maps to .learnings/ after every session

Use cases: Google Workspace · social platforms · admin dashboards · e-commerce · forms · market research · data extraction · any platform with or without an API


Workspace Structure

/workspace/
├── screenshots/          ← visual proof of every action (auto-created)
├── logs/browser/         ← full tracebacks (auto-created)
├── tasks/lessons.md      ← immediate task capture during mission
├── AUDIT.md              ← append-only action log
├── memory/YYYY-MM-DD.md  ← daily session summary
└── .learnings/
    ├── ERRORS.md         ← errors, broken selectors, ref maps
    └── LEARNINGS.md      ← patterns, timing, navigation per platform

When to Use

Use this skill when the task requires a real authenticated browser:

  • Pages requiring login (Google, social networks, dashboards, admin panels)
  • JS-rendered pages where static fetch returns nothing useful
  • Multi-step flows: forms, checkouts, confirmations, file uploads
  • Platforms without an API
  • Screenshots or visual evidence of a page state
  • CAPTCHA-protected pages

Prefer a lighter path first — if a simple HTTP request or existing OpenClaw tool can answer the question, use that instead. This skill uses more tokens and resources than plain fetch.


Architecture

This skill runs a persistent kasmweb/chrome Docker sidecar alongside OpenClaw. Principal logs in once via noVNC (port 6901). Sessions saved permanently in a Docker volume.

Three execution paths — load only what the task needs:

PathWhen to useFile
OpenClaw native browserSimple navigate/click/extract — fastest, fewest tokensBuilt-in
browser_control.pyAUDIT logging, workflows, CAPTCHA, Visionbrowser_control.py
noVNC (manual)Initial login, 2FA, session renewalPort 6901

Load only the smallest path needed. Simple navigation → OpenClaw native. Complex multi-step with logging → browser_control.py.


Setup — Run Once

OPENCLAW_DIR="${OPENCLAW_DIR:-$(pwd)}"
cd "$OPENCLAW_DIR"
CONTAINER="${OPENCLAW_CONTAINER:-$(docker ps --format '{{.Names}}' | grep openclaw | head -1)}"

# 1. Add kasmweb/chrome to docker-compose.yml
python3 -c "
import yaml, os
VNC_PW = os.environ.get('VNC_PW') or __import__('secrets').token_urlsafe(18)
with open('docker-compose.yml') as f:
    data = yaml.safe_load(f)
data.setdefault('services', {})['browser'] = {
    'image': 'kasmweb/chrome:1.15.0',
    'container_name': 'browser',
    'restart': 'unless-stopped',
    'shm_size': '1gb',
    'ports': ['6901:6901', '9222:9222'],
    'environment': [
        'VNC_PW=' + VNC_PW,
        'RESOLUTION=1920x1080',
        'CHROME_ARGS=--remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 --no-sandbox --disable-blink-features=AutomationControlled --disable-infobars'
    ],
    'volumes': ['browser-profile:/home/kasm-user/chrome-profile'],
    'networks': list(data.get('networks', {'default': None}).keys())
}
data.setdefault('volumes', {})['browser-profile'] = None
with open('docker-compose.yml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False, allow_unicode=True)
print('docker-compose.yml updated')
"

# 2. Update .env
# VNC_PW — generate a strong random password if not already set
if ! grep -q "VNC_PW" .env 2>/dev/null; then
  VNC_GENERATED=$(python3 -c "import secrets,string;     print(''.join(secrets.choice(string.ascii_letters+string.digits) for _ in range(24)))")
  echo "VNC_PW=${VNC_GENERATED}" >> .env
  echo "✅ VNC_PW generated — save this: ${VNC_GENERATED}"
fi
grep -q "BROWSER_CDP_URL"     .env || echo "BROWSER_CDP_URL=http://browser:9222" >> .env
grep -q "CAPSOLVER_API_KEY"   .env || echo "CAPSOLVER_API_KEY="                  >> .env
grep -q "BROWSERBASE_API_KEY" .env || echo "BROWSERBASE_API_KEY="                >> .env

# 3. Update openclaw.json — hot reload, no restart needed
python3 -c "
import json, os
f = 'data/.openclaw/openclaw.json'
with open(f) as fp: cfg = json.load(fp)
cfg.setdefault('browser', {}).update({'enabled': True, 'headless': False,
    'noSandbox': True, 'defaultProfile': 'chrome-sidecar'})
profiles = cfg['browser'].setdefault('profiles', {})
profiles['chrome-sidecar'] = {'cdpUrl': 'http://browser:9222', 'color': '#4285F4'}
bb_key = os.environ.get('BROWSERBASE_API_KEY', '')
if bb_key:
    profiles['browserbase'] = {'cdpUrl': f'wss://connect.browserbase.com?apiKey={bb_key}', 'color': '#F97316'}
with open(f, 'w') as fp: json.dump(cfg, fp, indent=2)
print('openclaw.json updated — hot reload active')
"

# 4. Start browser container only — OpenClaw keeps running
docker compose up -d --no-deps browser
sleep 12

# 5. Install Python dependencies
docker exec "$CONTAINER" pip install requests playwright --break-system-packages -q
docker exec "$CONTAINER" node /app/node_modules/playwright-core/cli.js install chromium
echo "✅ Python dependencies installed"

# 6. Download CapSolver extension (optional — only if key present)
CAPSOLVER_KEY=$(grep CAPSOLVER_API_KEY .env | cut -d= -f2)
if [ -n "$CAPSOLVER_KEY" ]; then
  docker exec "$CONTAINER" bash -c "
  apt-get install -y unzip curl -qq
  curl -sL https://github.com/capsolver/capsolver-browser-extension/releases/latest/download/chrome.zip \
    -o /tmp/capsolver.zip
  unzip -q /tmp/capsolver.zip -d /data/.openclaw/capsolver-extension
  sed -i \"s/apiKey: \\\"\\\"/apiKey: \\\"$CAPSOLVER_KEY\\\"/\" \
    /data/.openclaw/capsolver-extension/assets/config.js 2>/dev/null
  "
  echo "✅ CapSolver extension configured"
fi

# 7. Create workspace directories and deploy browser_control.py
docker exec "$CONTAINER" bash -c "
mkdir -p /data/.openclaw/workspace/skills/virtual-desktop
mkdir -p /workspace/screenshots /workspace/logs/browser /workspace/.learnings /workspace/memory
touch /workspace/AUDIT.md /workspace/.learnings/ERRORS.md /workspace/.learnings/LEARNINGS.md
"
docker cp {baseDir}/browser_control.py \
  "$CONTAINER":/data/.openclaw/workspace/skills/virtual-desktop/browser_control.py
echo "✅ browser_control.py deployed"

# 8. Verify
docker ps | grep -E "openclaw|browser"
curl -s http://localhost:9222/json > /dev/null && echo "✅ Chrome CDP active" || echo "⏳ Chrome starting"
docker exec "$CONTAINER" \
  python3 /data/.openclaw/workspace/skills/virtual-desktop/browser_control.py status

# 9. Notify principal
VPS_IP=$(curl -s ifconfig.me 2>/dev/null || echo "YOUR_VPS_IP")
echo "Virtual Desktop ready — https://${VPS_IP}:6901"
echo "Log in to your platforms via noVNC then reply DONE."

Initial Login — Once Per Platform

https://YOUR_VPS_IP:6901   login: kasm_user   password: your VNC_PW

Open Chrome via noVNC and log in to every platform you want the agent to access. Sessions saved in Docker volume browser-profile — survive restarts — valid indefinitely.

Step by step — do this once after setup:

1. Open https://YOUR_VPS_IP:6901 in your browser
2. Enter password: your VNC_PW value from .env
3. Chrome Desktop opens inside the browser

4. Log in to Google (accounts.google.com)
   → Email + password + 2FA if required
   → "Trust this device" → YES
   → This unlocks: Gmail, Drive, Calendar, Docs,
     Sheets, Google AI Studio, YouTube, all Google services

5. Log in to every other platform you want Wesley to access:
   → Twitter/X        → twitter.com
   → LinkedIn         → linkedin.com
   → Reddit           → reddit.com
   → Hostinger panel  → hpanel.hostinger.com
   → Any other site   → log in normally

6. After each login: Chrome saves the session automatically
   in the Docker volume browser-profile

7. Reply DONE to Wesley on Telegram
   → Wesley confirms sessions are active
   → He will never ask for your credentials again

What happens after:

Wesley opens any platform → already logged in ✅
No credentials needed → ever again
Session expires (rare) → Wesley notifies Telegram
  → You open noVNC → log in again → reply DONE
  → Takes 2 minutes

Important — 2FA:

Google 2FA → confirm once via noVNC
              Chrome remembers the device
              No 2FA required again on this browser

Other platforms → same principle
                  confirm once → trusted device → done

Quick Reference

ReferenceContent
OpenClaw native browser commandsSee below — openclaw browser
browser_control.py commandsSee below — $BC
CAPTCHA strategySee CAPTCHA section
Residential proxySee Proxy section
Claude VisionSee Vision section
Selectors, timing, auth flowsLEARNINGS.md (auto-built by agent)
Broken selectors, error recoveryERRORS.md (auto-built by agent)

OpenClaw Native Browser — Fastest Path

# Navigation
openclaw browser open <url>
openclaw browser snapshot [--interactive]
openclaw browser back | forward | reload | close

# Interaction
openclaw browser click <ref>
openclaw browser type <ref> "text"
openclaw browser select <ref> "value"
openclaw browser hover <ref>
openclaw browser scroll [--direction down|up|right|left]

# Files
openclaw browser upload /tmp/file.pdf
openclaw browser download <ref> file.pdf

# Cookies & storage
openclaw browser cookies | cookies set k v --url "https://example.com" | cookies clear
openclaw browser storage local get | set k v | clear

# Configuration
openclaw browser set geo 48.8566 2.3522 --origin "https://example.com"
openclaw browser set timezone Europe/Paris
openclaw browser set locale fr-FR
openclaw browser set device "iPhone 14"
openclaw browser set media dark
openclaw browser set headers --headers-json '{"X-Custom":"val"}'

# Debug
openclaw browser console --level error
openclaw browser requests --filter api
openclaw browser trace start | stop
openclaw browser status

# Stealth (if site blocks VPS)
openclaw browser --browser-profile browserbase open <url>

browser_control.py — With AUDIT Logging + CAPTCHA + Vision

BC="python3 /data/.openclaw/workspace/skills/virtual-desktop/browser_control.py"

$BC screenshot  <url> [label]
$BC navigate    <url> [selector]
$BC click       <url> <selector>
$BC click_xy    <url> <x> <y>
$BC fill        <url> <selector> <value>
$BC select      <url> <selector> <value>
$BC hover       <url> <selector>
$BC scroll      <url> <direction> [pixels]
$BC keyboard    <url> <selector> <key>
$BC extract     <url> <selector> [output_file]
$BC wait_for    <url> <selector> [timeout_ms]
$BC upload      <url> <file_selector> <file_path>
$BC analyze     <url_or_image> [question]     ← Claude Vision
$BC captcha     <url>                         ← Autonomous CAPTCHA
$BC workflow    <json_steps_file>             ← Multi-step workflow
$BC status

Workflow JSON Format

[
  { "action": "goto",       "target": "https://TARGET_URL" },
  { "action": "captcha" },
  { "action": "analyze",    "value": "Identify the key elements on this page" },
  { "action": "wait_for",   "target": ".loaded", "timeout_ms": 5000 },
  { "action": "fill",       "target": "#field",  "value": "text" },
  { "action": "click",      "target": "#btn" },
  { "action": "click_xy",   "x": 960, "y": 540 },
  { "action": "scroll",     "direction": "down" },
  { "action": "hover",      "target": "#menu" },
  { "action": "select",     "target": "#list",   "value": "option" },
  { "action": "keyboard",   "target": "#input",  "value": "Enter" },
  { "action": "extract",    "target": ".data",   "value": "/workspace/tasks/out.json" },
  { "action": "screenshot" },
  { "action": "wait",       "value": "2" }
]

CAPTCHA — Autonomous Strategy

1. Auto-detection on every page load
   → reCAPTCHA v2/v3, hCaptcha, Cloudflare Turnstile

2. CapSolver API (if CAPSOLVER_API_KEY set)
   → Extracts sitekey → API → token → injects → continues
   → ~$0.001 per CAPTCHA

3. Cloudflare Turnstile
   → CapSolver Chrome extension handles in background → waits 60s

4. Fallback — if CapSolver fails or key not set
   → Screenshot → Telegram → principal opens noVNC → solves → agent continues

Proxy — If Site Blocks the VPS

# Browserbase — CAPTCHA + stealth + residential proxy built-in
# Free tier: 1 concurrent session, 1h/month — browserbase.com
# Add BROWSERBASE_API_KEY to .env
openclaw browser --browser-profile browserbase open <url>

# Custom proxy
# Add PROXY_URL=http://user:pass@proxy:port to .env
# browser_control.py reads it automatically via get_browser()

Claude Vision — Analyse Images and Pages

# Web page → auto screenshot + analysis
$BC analyze https://example.com "What does this page sell?"

# AI-generated image
$BC analyze https://site.com/image.png "Describe the visual elements"

# Existing screenshot
$BC analyze /workspace/screenshots/capture.png "Is there a form here?"

# Inside a workflow
{ "action": "analyze", "value": "Identify all form fields" }

Execution Protocol

BEFORE EVERY ACTION:
  1. Log to AUDIT.md: "BEFORE [action] on [url]"
  2. Detect CAPTCHA → resolve automatically if present
  3. Execute action
  4. Screenshot as proof
  5. Log to AUDIT.md: "OK/FAILED [action]"
  6. Telegram report if real-world consequences

NEVER:
  → Access platforms not authorized by the principal
  → Execute payments or destructive actions without explicit approval
  → Fail silently — always log
  → Retry more than 3 times without alerting the principal

Browser Traps

Avoid these common mistakes:

  • Guessing selectors from source → use snapshot --interactive or codegen to discover stable refs
  • Using force: true before understanding why → investigate the overlay/disabled state first
  • Driving a full browser when HTTP would work → more cost, more flake, less signal
  • Sharing one session across parallel tasks that mutate state → failures become order-dependent
  • Waiting on networkidle for chatty SPAs → analytics, polling, or sockets keep the page "busy" even when the UI is ready
  • Retrying the same selector 10 times → log to ERRORS.md and alert the principal instead
  • Accessing high-stakes flows (payments, production data) without explicit confirmation → require approval first

Error Recovery

CAPTCHA          → CapSolver auto → fallback noVNC
CLOUDFLARE       → switch to --browser-profile browserbase
SESSION EXPIRED  → Telegram → principal opens noVNC → reconnects
ELEMENT MISSING  → use analyze to understand the new layout
                 → log to .learnings/ERRORS.md with ref map
TIMEOUT          → check /workspace/logs/browser/YYYY-MM-DD.log

Files Written

FileWhenContent
/workspace/AUDIT.mdEvery actionBefore + after log, append-only
/workspace/screenshots/YYYY-MM-DD_*.pngEvery actionVisual proof
/workspace/screenshots/*_analysis.txtAfter analyzeVision result
/workspace/logs/browser/YYYY-MM-DD.logOn exceptionFull traceback
/workspace/.learnings/ERRORS.mdOn failureErrors + broken selectors
/workspace/.learnings/LEARNINGS.mdOn discoveryPatterns + timing per platform
/workspace/tasks/lessons.mdDuring missionImmediate task capture
/workspace/memory/YYYY-MM-DD.mdDailySession summary

This skill does NOT:

  • Create files outside the paths listed above
  • Persist sessions or credentials beyond the Docker volume
  • Make undeclared network requests beyond the target sites and optional services above
  • Access platforms not explicitly authorized by the principal

Self-Improvement

Write immediately after every session — do not batch:

# ERRORS.md — on failure
## [YYYY-MM-DD] [Platform] — [Title]
**Priority**: low|medium|high   **Status**: pending|resolved
**What happened**: ...   **Root cause**: ...   **Fix**: ...   **Ref map**: {"old_ref":"new_ref"}

# LEARNINGS.md — on discovery
## [YYYY-MM-DD] [Platform] — [Pattern]
**Category**: navigation|interaction|timing|auth_flow|captcha|vision
**Discovery**: ...   **Usage**: ...

Security

This skill opens port 6901 (noVNC) and stores authenticated browser sessions permanently.

REQUIRED before running:
  1. Set a strong VNC_PW in .env — never use the default
  2. Firewall port 6901 to your IP only:
     Hostinger → Panel → VPS → Firewall → restrict 6901 to your IP
     Or use SSH tunnel: ssh -L 6901:localhost:6901 user@YOUR_VPS_IP
  3. Only log in to accounts you trust the agent to access
  4. Optional keys (CapSolver, Browserbase, Anthropic) send data to
     those services — only add them if you trust and accept their costs

External Endpoints

EndpointData sentPurpose
Any URL the principal authorizesBrowser requests, cookies, form dataAutomation
http://browser:9222CDP protocol — internal onlyBrowser control
https://api.capsolver.comCAPTCHA sitekey + page URLCAPTCHA solving (optional)
wss://connect.browserbase.comBrowser sessionStealth proxy (optional)
https://api.anthropic.comScreenshot base64Claude Vision (optional)
https://registry.npmjs.orgPackage metadataPlaywright install only

No other data is sent externally.

Files

4 total
Select a file
Select a file to preview.

Comments

Loading comments…