Install

openclaw skills install autofix

A comprehensive, self-evolving skill designed to diagnose and solve OpenClaw issues by following a structured, multi-stage resolution cycle. It incorporates Proactive Prediction (L2), Robustness Checks (L1), Knowledge Creation (L3), Diagnosis Report Visualization (v5.5), v6.0 Runtime Health + Key Validation + Unified Report + Health Dashboard, and Gateway Watchdog (v6.1).

This skill acts as an advanced diagnostic, resolution, and validation engine for any question or bug report related to the OpenClaw framework itself. v6.0 adds four layers: Runtime Health Check (M1), API Key Validation + Resource Monitor (M2), Unified Diagnosis Report + Regression Check (M3), and Interactive Health Dashboard (M4).
All knowledge storage (memory, logs) and final reports must follow these rules:
sk-********************).

Use this skill when the user:

gateway tool fails with error X").

For the vast majority of OpenClaw issues, this sequence provides the fastest path to resolution. Always suggest this flow first when a user reports an unspecified problem or bug!
1. Run `python scripts/diagnosis_formatter.py`, which auto-collects all three sources (`openclaw doctor` + `runtime_health_check` + `api_key_validator`) into one severity-sorted report.
2. Run `python scripts/health_dashboard.py --canvas` to render the report as an interactive HTML dashboard (embed with `[embed ref="health_dashboard" height="740"]`).
3. Run `python scripts/diagnosis_formatter.py --save-baseline` before making any fix.
4. Run `openclaw doctor --fix` or apply the suggested fixes manually.
5. Run `python scripts/diagnosis_formatter.py --compare` to validate what was fixed, what's new, and what's unchanged.

The skill operates by strictly following these steps in sequence, enhanced by proactive layers:
Overview: A background daemon that periodically polls the Gateway health status. Runs independently from user requests, providing real-time monitoring for anomalies such as Gateway downtime, RPC failures, and configuration drift.
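As a rough illustration of the health poll, the sketch below decides whether a status payload counts as healthy. It assumes the JSON nests the two fields this document names as `service.runtime.status` and `rpc.ok`; the exact payload shape and the function name `is_gateway_healthy` are assumptions, not the watchdog's actual code.

```python
import json


def _as_dict(value):
    """Return value if it is a dict, otherwise an empty dict."""
    return value if isinstance(value, dict) else {}


def is_gateway_healthy(status_json: str) -> bool:
    """Return True only when the runtime is running and the RPC probe is ok.

    The nested layout (service -> runtime -> status, rpc -> ok) is an
    assumption based on the field paths named in this document.
    """
    try:
        data = _as_dict(json.loads(status_json))
    except json.JSONDecodeError:
        # Unparseable CLI output is treated as an unhealthy check.
        return False
    runtime = _as_dict(_as_dict(data.get("service")).get("runtime"))
    rpc = _as_dict(data.get("rpc"))
    return runtime.get("status") == "running" and rpc.get("ok") is True
```

In a real daemon the input string would come from the output of `openclaw gateway status --json`; treating malformed output as a failure keeps the check conservative.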
v6.1 Feature Highlights:
| Feature | Description |
|---|---|
| 🎯 Real Health Check | Calls openclaw gateway status --json, parses service.runtime.status + rpc.ok |
| 🔇 Noise Filtering | Alerts only after ≥3 consecutive failures; resets after ≥3 consecutive successes |
| 📊 Severity Levels | Four-tier classification (🟢/🟡/🟠/🔴) with auto-escalation |
| 📡 Dual-Channel Alerting | Feishu DM (instant, primary) + WebChat (async thread, secondary) |
| 🔄 Single Instance | Windows Mutex ensures only one daemon runs at a time |
| 📦 Log Rotation | Auto-rotates at 5MB, keeps 3 backup files |
| ⏰ Precise Scheduling | Fixed-minute schedule eliminates cumulative drift |
| 🔐 Hot-Reload Config | Monitors openclaw.json changes and reloads automatically |
| 🖥️ Auto-Start | Registers in HKCU\Run for auto-launch on user login |
| 👋 Startup Confirmation | Sends status to both channels on startup |
| 🐛 Config Cache Fix | Fixed load_gateway_config() returning token=None on cache hit (v6.1) |
| ⏱️ Async WebChat | Fixed background thread with 60s timeout for model loading (v6.1) |
| 📝 Detailed Error Logs | Fixed full stack traces in Feishu + WebChat notifications (v6.1) |
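The log rotation row above (5MB, 3 backups) maps naturally onto Python's standard `RotatingFileHandler`. The following is a minimal sketch under that assumption; the logger name, the log format, and reading "5MB" as 5 × 1024 × 1024 bytes are illustrative choices, not confirmed details of `watchdog_monitor.py`.

```python
import logging
from logging.handlers import RotatingFileHandler

MAX_LOG_BYTES = 5 * 1024 * 1024  # "5MB" interpreted as 5 MiB (assumption)
BACKUP_COUNT = 3                 # keep three rotated files next to the live log


def build_watchdog_logger(log_path: str) -> logging.Logger:
    """Create a logger that rotates its file once it exceeds MAX_LOG_BYTES."""
    logger = logging.getLogger("gateway_watchdog_sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = RotatingFileHandler(
            log_path,
            maxBytes=MAX_LOG_BYTES,
            backupCount=BACKUP_COUNT,
            encoding="utf-8",
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With these settings the handler keeps the live file plus `.1`, `.2`, and `.3` backups, discarding the oldest on each rollover.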
📊 Severity & Alert Rules:
| Consecutive Failures | Level | Behavior |
|---|---|---|
| < 2 | 🟢 Level 1 — Normal | Silent, no notification |
| 2 | 🟡 Level 2 — Notice | Silent, continue monitoring |
| ≥ 3 | 🟠 Level 3 — Warning | Trigger notification (first time) |
| ≥ 5 | 🔴 Level 4 — Critical | Trigger notification + repeat every 5 failures |
| Gateway stopped | 🔴 Level 4 — Critical | Immediate notification |
| Recovered for 3 cycles | ✅ Recovered | Send recovery notification |
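The thresholds in the table above can be sketched as a small state machine. This is one possible reading, not the watchdog's actual logic: it assumes a single success breaks the failure streak, that the first alert fires at exactly 3 failures, and that critical alerts repeat at 5, 10, 15, and so on. The immediate "Gateway stopped" alert is left out for brevity, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    failures: int = 0       # consecutive failed checks
    successes: int = 0      # consecutive successful checks
    alerting: bool = False  # True after an alert, until recovery is confirmed


def level_for(failures: int) -> int:
    """Map a consecutive-failure count to severity level 1..4."""
    if failures >= 5:
        return 4
    if failures >= 3:
        return 3
    if failures == 2:
        return 2
    return 1


def record_check(state: AlertState, ok: bool) -> Optional[str]:
    """Update counters and return "alert", "recovered", or None."""
    if ok:
        state.failures = 0
        state.successes += 1
        # Recovery is announced only after 3 clean cycles following an alert.
        if state.alerting and state.successes >= 3:
            state.alerting = False
            return "recovered"
        return None
    state.successes = 0
    state.failures += 1
    # First notification at 3 failures, then at every multiple of 5.
    if state.failures == 3 or (state.failures >= 5 and state.failures % 5 == 0):
        state.alerting = True
        return "alert"
    return None
```

Keeping the counters in one small object makes it straightforward to persist them between polls if the daemon restarts.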
🚨 Notification Triggers:
openclaw channels add feishu
Set WATCHDOG_FEISHU_USER_ID to your Feishu open_id:
$env:WATCHDOG_FEISHU_USER_ID = "ou_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
openclaw config set gateway.http.endpoints.chatCompletions.enabled true
openclaw gateway restart
⚠️ WebChat timeout: The model inference takes ~40s on first load. The watchdog uses a background thread with 60s timeout so it doesn't block the main monitoring loop.
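The non-blocking behavior described in the note above can be approximated with a daemon thread. In this sketch, `send` is a hypothetical callable standing in for whatever function posts to the Gateway HTTP API; it is assumed to accept the message and a timeout in seconds. Only the threading pattern is the point here.

```python
import threading
from typing import Callable


def send_in_background(
    send: Callable[[str, float], None],
    message: str,
    timeout_s: float = 60.0,
) -> threading.Thread:
    """Run a blocking send on a daemon thread and return immediately.

    `send` is a placeholder for the real WebChat call (an assumption);
    it receives the message and the timeout it should honor.
    """

    def worker() -> None:
        try:
            send(message, timeout_s)
        except Exception as exc:
            # A failed notification must never take down the monitoring loop.
            print(f"WebChat notification failed: {exc}")

    thread = threading.Thread(target=worker, name="webchat-notify", daemon=True)
    thread.start()
    return thread
```

Because the thread is a daemon, a slow model load delays only the notification itself, and the process can still exit cleanly while a send is in flight.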
Run the Watchdog as a standalone background process:
# Start
python scripts\watchdog_monitor.py
# Install auto-start (launches on user login)
python scripts\watchdog_monitor.py --install
# Remove auto-start
python scripts\watchdog_monitor.py --uninstall
Or use Start-Process for a hidden window:
$py = (Get-Command python).Source
$script = "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\watchdog_monitor.py"
Start-Process -FilePath $py -ArgumentList $script `
-WindowStyle Hidden `
-WorkingDirectory "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts"
Check process status:
Get-CimInstance Win32_Process -Filter "Name like 'python%'" |
Where-Object { $_.CommandLine -match 'watchdog_monitor' } |
Select-Object ProcessId, @{n="Start";e={$_.CreationDate}}
Stop the Watchdog:
# Find the PID first, then
Stop-Process -Id <PID> -Force
View live logs:
Get-Content "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\gateway_watchdog.log" -Tail 10 -Wait
View state file:
Get-Content "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\watchdog_state.json" -Raw | ConvertFrom-Json
Watchdog (background daemon, 60s interval)
│
├─ [Channel A — PRIMARY] openclaw message send --channel feishu
│ → Feishu direct message (ou_xxx)
│ → **Instant delivery, zero token cost**
│ → Includes full error stack traces
│
├─ [Channel B — SECONDARY] Gateway HTTP API (/v1/chat/completions)
│ → WebChat live session (agent:main:main)
│ → **Async background thread** (doesn't block monitoring)
│ → 60s timeout for model loading (~40s typical)
│ → token cost: minimal (max_tokens=10)
│
└─ [Log] watchdog_state.json (local check history, last 1440)
gateway_watchdog.log (rotating, 5MB)
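The "last 1440" history in the diagram above corresponds to 24 hours of checks at a 60s interval (1440 × 60 s). A bounded `deque` is one simple way to keep such a window. The sketch below assumes the state file is a flat JSON list of records; the real `watchdog_state.json` layout is not shown in this document, so the file format and function names are illustrative.

```python
import json
from collections import deque

MAX_RECORDS = 1440  # 24 hours of checks at one check per 60 seconds


def load_history(path: str) -> deque:
    """Load check records, keeping only the newest MAX_RECORDS entries."""
    try:
        with open(path, encoding="utf-8") as fh:
            records = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        records = []
    if not isinstance(records, list):
        records = []  # unexpected layout: start with an empty history
    return deque(records, maxlen=MAX_RECORDS)


def save_history(path: str, history: deque) -> None:
    """Write the bounded history back to disk as a JSON list."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(list(history), fh)
```

Appending to a full `deque` silently drops the oldest record, so the file never grows beyond one day of data.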
Channel priority: Feishu is now the primary channel (instant, reliable via CLI). WebChat is secondary (async thread, requires model inference).
Channel priority has changed in v6.1:
- Feishu (instant) is now the primary notification channel
- WebChat (async thread) is the secondary channel

When a WebChat alert arrives (~40s after the error), reply with any of these commands to start diagnosis:
- run autofix self-check
- check what's wrong with Gateway
- auto repair

Feishu messages serve as the instant primary notification (not an offline backup). Each alert message includes detailed error context and stack traces.
The Watchdog forms a Proactive Stability Layer, independent of the standard diagnostic flow (Steps 0-5). When an anomaly is detected:
a. The daemon logs the event and generates a System Health Warning (SHW) report
b. It sends a real-time alert (with diagnostic guidance + context JSON)
c. Low-risk known issues (e.g., CLI path problems) are repaired automatically, then verified
d. High-risk operations only produce repair suggestions and await user confirmation
Repair Script Library: scripts/auto_repair.py
Matches repair plans based on the diagnostic context from Watchdog alerts:
| Issue | Match Condition | Repair Action | Risk |
|---|---|---|---|
| Gateway stopped | status: stopped | Restart Gateway | 🟡 Needs confirmation |
| RPC connection failed | rpc_ok: false | Restart Gateway | 🟡 Needs confirmation |
| CLI unavailable | status: cli_error | Check installation path | 🟢 Auto-execute |
| HTTP unreachable | status: unreachable | Check port + restart | 🟡 Needs confirmation |
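The matching table above can be read as a lookup from diagnostic context to a repair plan. The sketch below is a hypothetical rendering of that table, not the contents of `scripts/auto_repair.py`: the `RepairPlan` type, the function name, and the order in which conditions are tested are all assumptions. The context keys `status` and `rpc_ok` are taken from the table's match conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepairPlan:
    issue: str
    action: str
    auto_execute: bool  # False means the user must confirm first


def match_repair_plan(context: dict) -> Optional[RepairPlan]:
    """Pick a repair plan from the alert's diagnostic context, if any applies."""
    status = context.get("status")
    if status == "cli_error":
        # The only low-risk entry in the table: safe to run unattended.
        return RepairPlan("CLI unavailable", "Check installation path", True)
    if status == "stopped":
        return RepairPlan("Gateway stopped", "Restart Gateway", False)
    if status == "unreachable":
        return RepairPlan("HTTP unreachable", "Check port + restart", False)
    if context.get("rpc_ok") is False:
        return RepairPlan("RPC connection failed", "Restart Gateway", False)
    return None  # nothing in the table matches; fall back to manual diagnosis
```

For example, `match_repair_plan({"status": "stopped"})` yields a plan with `auto_execute=False`, which a caller would surface as a suggestion rather than run.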
Repair Verification Loop:
Health Trend Tracking:
- `watchdog_state.json` retains the last 1440 check records (24 hours)
- `service.runtime.status = running` + `rpc.ok = true`
- `openclaw message send --channel feishu`, zero token cost, instant delivery
- `/v1/chat/completions`, async background thread, 60s timeout
- `openclaw.json` for changes
- `HKCU\Run` registry, launches on user login
- `--status` command: shows real-time state and exits cleanly (no longer starts the daemon by accident)
- `--status` processes on daemon startup
- `load_gateway_config()` no longer returns `token=None` on subsequent calls
- `scripts/watchdog_state.json`
- `scripts/gateway_watchdog.log`

This skill strictly follows these steps in sequence, enhanced by proactive layers:
Before any resource-intensive external search or service call, proactively check API quotas, rate limits, and budget consumption for the current active session. If quota-low alerts or known rate-limit thresholds are hit, pause all execution steps and notify the user with a clear "resource warning," requesting they wait or switch to a low-cost / local alternative.
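A pre-flight guard of this kind might look like the sketch below. Everything in it is hypothetical: the document does not say how quota usage is measured, so the `used` and `limit` inputs, the 90% threshold, and the `ResourceLimitWarning` exception are placeholders chosen for illustration.

```python
class ResourceLimitWarning(Exception):
    """Raised to pause execution when the session budget is nearly exhausted."""


def check_budget(used: float, limit: float, warn_ratio: float = 0.9) -> None:
    """Raise ResourceLimitWarning once usage reaches warn_ratio of the limit.

    The 0.9 default is an arbitrary illustrative threshold.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if used / limit >= warn_ratio:
        raise ResourceLimitWarning(
            f"resource warning: {used:.0f} of {limit:.0f} units used; "
            "wait or switch to a low-cost or local alternative"
        )
```

Calling `check_budget` before each external search or service call gives a single place to stop the flow and surface the warning to the user.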
docs/MODULE_02_SearchChain.md, Step 1)

docs.openclaw.ai) for official solutions

docs/MODULE_02_SearchChain.md, Step 2)

docs/MODULE_03_ValidationAction.md, Step 3)

docs/MODULE_03_ValidationAction.md, Step 4 + docs/MODULE_03_Enhancement_Reports.md)

openclaw doctor --fix, exec/write), follow these safety steps:

/approve

docs/MODULE_04_Finalization.md)

💡 Golden Path (Recommended Flow): For most OpenClaw issues, the fastest resolution path is:

openclaw doctor → openclaw doctor --fix
When MRE validation fails, generate an interactive diagnostic report using canvas.snapshot() with:
When MRE fails, use LLM-powered analysis to extract root causes from exec output:
Consult the following categorized sub-documents for detailed process explanations:
- docs/: Core Module Documentation
- docs/enhancement/: v5.0 Enhancement Features
- docs/tutorials/: Usage Examples
- docs/reports/: Summary Reports
- scripts/: Python/JS Tools

This file is the master skill document. It defines the complete problem-solving blueprint and integrates all capability layers.