OpenClaw Problem Solver (Auto-Fix the Crayfish)

Automation

A comprehensive, self-evolving skill designed to diagnose and solve OpenClaw issues by following a structured, multi-stage resolution cycle. It incorporates Proactive Prediction (L2), Robustness Checks (L1), Knowledge Creation (L3), Diagnosis Report Visualization (v5.5), and Gateway Watchdog (v4.1).

Install

openclaw skills install autofix

🧠 OpenClaw Problem Solver (v5.5) — Master Document

This skill acts as an advanced diagnostic, resolution, and validation engine for any question or bug report related to the OpenClaw framework itself. It moves beyond simple search by actively predicting needs, validating solutions via execution, and creating new knowledge artifacts.

🔒 Privacy-First Constraint

All knowledge storage (memory, logs) and final reports must follow these rules:

API Keys: Never store API keys in plain text. Use placeholders or hashes (e.g., sk-********************).
Private Details: Redact sensitive project names, internal credentials, and user identity info unless explicitly authorized.
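
The API key rule above can be sketched as a small redaction helper that runs before anything is written to memory, logs, or reports. This is a minimal sketch, not the skill's actual implementation: the function name and the `sk-` key pattern are assumptions, and other providers use different key formats.

```python
import re

# Assumed pattern: keys that start with "sk-" followed by 8 or more token
# characters. Extend this for other key formats you need to cover.
API_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_\-]{8,}")

# Fixed-width placeholder, matching the example in the rule above.
PLACEHOLDER = "sk-" + "*" * 20


def redact_api_keys(text: str) -> str:
    """Replace anything that looks like an API key with the placeholder."""
    return API_KEY_PATTERN.sub(PLACEHOLDER, text)
```

Because the placeholder itself does not match the pattern, running the helper twice on the same text is harmless.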

🎯 When to Use This Skill

Use this skill when the user:

Asks "Why is [feature] not working in OpenClaw?"
Reports a specific bug (e.g., "The gateway tool fails with error X").
Needs guidance on how to implement a specific feature using OpenClaw's architecture or tools.
Wants to know the best practice for a certain task within the OpenClaw ecosystem, but requires verification.

🌟 Recommended Quick Fix Flow (The Golden Path)

For the vast majority of OpenClaw issues, this sequence provides the fastest path to resolution. Always suggest this flow first when a user reports an unspecified problem or bug!

1. Diagnosis: Instruct the user to run openclaw doctor in their terminal. This command performs a comprehensive health check and reports system status, potential configuration drift, and known issues.
2. Resolution Attempt: If Step 1 reveals problems, instruct the user to follow up immediately with openclaw doctor --fix.

🚀 The Evolved Workflow (6-Step Cycle + Proactive Layers)

The skill follows the six steps described under Standard Workflow in strict sequence. Proactive layers such as the Gateway Watchdog run alongside them:

🤖 Gateway Watchdog (v4.1) — Proactive Stability Layer

Overview: A background daemon that periodically polls the Gateway health status. Runs independently from user requests, providing real-time monitoring for anomalies such as Gateway downtime, RPC failures, and configuration drift.
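
One way to poll periodically without accumulating drift is to sleep until the next wall-clock boundary instead of sleeping for a fixed duration after each check. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, and the 60-second interval is taken from the notification architecture diagram in this document.

```python
import time
from typing import Callable


def seconds_until_next_tick(now: float, interval: int = 60) -> float:
    """Seconds to sleep so the next check lands on a multiple of `interval`."""
    return interval - (now % interval)


def run_forever(check: Callable[[], None], interval: int = 60) -> None:
    """Run `check` once per interval, aligned to wall-clock boundaries."""
    while True:
        # Recomputing the delay from the current time on every cycle means a
        # slow check shortens the next sleep instead of pushing later checks back.
        time.sleep(seconds_until_next_tick(time.time(), interval))
        check()
```

With a plain `time.sleep(60)` after each check, every cycle would take 60 seconds plus the check's own runtime, so the schedule would slip a little further each time.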

v4.1 Feature Highlights:

| Feature | Description |
| --- | --- |
| 🎯 Real Health Check | Calls `openclaw gateway status --json`, parses `service.runtime.status` + `rpc.ok` |
| 🔇 Noise Filtering | Alerts only after ≥3 consecutive failures; resets after ≥3 consecutive successes |
| 📊 Severity Levels | Four-tier classification (🟢/🟡/🟠/🔴) with auto-escalation |
| 📡 Dual-Channel Alerting | WebChat (HTTP API) + Feishu (CLI) redundant delivery |
| 🔄 Single Instance | Windows Mutex ensures only one daemon runs at a time |
| 📦 Log Rotation | Auto-rotates at 5MB, keeps 3 backup files |
| ⏰ Precise Scheduling | Fixed-minute schedule eliminates cumulative drift |
| 🔐 Hot-Reload Config | Monitors `openclaw.json` changes and reloads automatically |
| 🖥️ Auto-Start | Registers in `HKCU\Run` for auto-launch on user login |
| 👋 Startup Confirmation | Sends a status message to both channels on startup |
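
The Real Health Check row can be illustrated with a short sketch. It assumes the dotted names `service.runtime.status` and `rpc.ok` correspond to nested JSON objects in the command output; the helper names and the 15-second timeout are hypothetical, not taken from `watchdog_monitor.py`.

```python
import json
import subprocess
from typing import Any


def _dig(data: Any, *keys: str) -> Any:
    """Walk nested dicts, returning None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def is_gateway_healthy(status: dict) -> bool:
    """Healthy means the service is running AND the RPC probe succeeded."""
    return (
        _dig(status, "service", "runtime", "status") == "running"
        and _dig(status, "rpc", "ok") is True
    )


def fetch_gateway_status() -> dict:
    """Run the status command and parse its JSON output (assumed shape)."""
    result = subprocess.run(
        ["openclaw", "gateway", "status", "--json"],
        capture_output=True, text=True, timeout=15,
    )
    return json.loads(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the health rule easy to test against captured output.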

📊 Severity & Alert Rules:

| Consecutive Failures | Level | Behavior |
| --- | --- | --- |
| 0–1 | 🟢 Level 1: Normal | Silent, no notification |
| 2 | 🟡 Level 2: Notice | Silent, continue monitoring |
| 3–4 | 🟠 Level 3: Warning | Trigger notification (first time) |
| ≥ 5 | 🔴 Level 4: Critical | Trigger notification + repeat every 5 failures |
| Gateway stopped | 🔴 Level 4: Critical | Immediate notification |
| Recovered for 3 cycles | ✅ Recovered | Send recovery notification |
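
The severity table can be read as a simple mapping from the failure counter to a level. The sketch below is one interpretation of it; the function name and signature are illustrative, and it assumes that a stopped Gateway overrides the counter.

```python
def severity_level(consecutive_failures: int, gateway_stopped: bool = False) -> int:
    """Map the failure counter to a severity level from 1 (normal) to 4 (critical)."""
    if gateway_stopped or consecutive_failures >= 5:
        return 4  # Critical
    if consecutive_failures >= 3:
        return 3  # Warning
    if consecutive_failures == 2:
        return 2  # Notice
    return 1      # Normal
```

Checking the highest level first keeps the ranges from overlapping: a count of 5 or more never falls through to the Warning branch.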

🚨 Notification Triggers:

Severity escalation (e.g., 1→3): sends alert
First time hitting alert threshold (≥3 consecutive failures): sends alert
At critical level, every 5 failures: sends reminder
System recovery (abnormal→normal for 3 consecutive checks): sends recovery notice
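
The trigger rules above amount to a small state machine. The following sketch is one possible reading, not the daemon's actual code: the class and constant names are hypothetical, it assumes a single healthy check resets the failure counter while the alert state clears only after three healthy checks in a row, and it leaves out the immediate "Gateway stopped" alert for brevity.

```python
from typing import Optional

ALERT_THRESHOLD = 3      # first alert
CRITICAL_THRESHOLD = 5   # escalation to critical
RECOVERY_THRESHOLD = 3   # healthy checks needed to declare recovery
REMINDER_EVERY = 5       # reminder cadence while critical


class AlertTracker:
    def __init__(self) -> None:
        self.failures = 0      # consecutive failed checks
        self.successes = 0     # consecutive healthy checks
        self.alerting = False  # True between an alert and the recovery notice

    def observe(self, healthy: bool) -> Optional[str]:
        """Return 'alert', 'reminder', 'recovered', or None for one check."""
        if healthy:
            self.failures = 0
            self.successes += 1
            if self.alerting and self.successes >= RECOVERY_THRESHOLD:
                self.alerting = False
                return "recovered"
            return None

        self.successes = 0
        self.failures += 1
        if self.failures == ALERT_THRESHOLD and not self.alerting:
            self.alerting = True
            return "alert"      # first time hitting the threshold
        if self.failures == CRITICAL_THRESHOLD:
            self.alerting = True
            return "alert"      # escalation to critical
        if self.failures > CRITICAL_THRESHOLD and self.failures % REMINDER_EVERY == 0:
            return "reminder"   # every 5 failures while critical
        return None
```

Feeding ten failed checks followed by three healthy ones produces an alert at the 3rd failure, another at the 5th, a reminder at the 10th, and a recovery notice on the third healthy check.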

Prerequisites: Enable Gateway HTTP API

The Watchdog uses the Gateway HTTP API to deliver real-time notifications. Enable it first:

openclaw config set gateway.http.endpoints.chatCompletions.enabled true
openclaw gateway restart

Deployment (v4.1)

Run the Watchdog as a standalone background process:

# Start
python scripts\watchdog_monitor.py

# Install auto-start (launches on user login)
python scripts\watchdog_monitor.py --install

# Remove auto-start
python scripts\watchdog_monitor.py --uninstall

Or use Start-Process for a hidden window:

$py = (Get-Command python).Source
$script = "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\watchdog_monitor.py"
Start-Process -FilePath $py -ArgumentList $script `
    -WindowStyle Hidden `
    -WorkingDirectory "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts"

Check process status:

# Get-CimInstance replaces Get-WmiObject, which is not available in PowerShell 7+
Get-CimInstance Win32_Process -Filter "Name like 'python%'" |
    Where-Object { $_.CommandLine -match 'watchdog_monitor' } |
    Select-Object ProcessId, @{n="Start";e={$_.CreationDate}}

Stop the Watchdog:

# Find the PID first, then
Stop-Process -Id <PID> -Force

View live logs:

Get-Content "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\gateway_watchdog.log" -Tail 10 -Wait

View state file:

Get-Content "$env:USERPROFILE\.openclaw\workspace\skills\autofix\scripts\watchdog_state.json" -Raw | ConvertFrom-Json

🔗 Notification Architecture

Watchdog (background daemon, 60s interval)
    │
    ├─ [Channel A] Gateway HTTP API (/v1/chat/completions)
    │            → WebChat live session (agent:main:main)
    │            → token cost: minimal (max_tokens=10)
    │
    ├─ [Channel B] openclaw message send --channel feishu
    │            → Feishu direct message (ou_xxx)
    │            → plain text, zero token cost
    │
    └─ [Log]     watchdog_state.json (state file)
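
The point of the two channels in the diagram is redundancy: a failure on one must not prevent delivery on the other. A minimal sketch of that fan-out follows; the `deliver` function and the channel callables are hypothetical stand-ins for the real HTTP and CLI senders.

```python
from typing import Callable, Dict


def deliver(message: str, channels: Dict[str, Callable[[str], None]]) -> Dict[str, bool]:
    """Try every channel and report which ones succeeded."""
    results: Dict[str, bool] = {}
    for name, send in channels.items():
        try:
            send(message)
            results[name] = True
        except Exception:
            # One broken channel must not block the remaining ones.
            results[name] = False
    return results
```

A caller can treat the alert as delivered when at least one value in the returned mapping is true, and log the rest for later inspection.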

Dual-Channel Alerting + Autofix Triggers

When a WebChat alert arrives, reply with any of these commands to start diagnosis:
- run autofix self-check
- check what's wrong with Gateway
- auto repair
Feishu messages serve as an offline backup
Each alert message includes a structured JSON diagnostic context at the end, so the agent can start fixing immediately
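
Since each alert ends with a JSON diagnostic context, the receiving side needs a way to separate that context from the human-readable text. The sketch below shows one approach, assuming the context is a single JSON object that runs to the end of the message; the exact alert format is not specified here, so treat the function as illustrative.

```python
import json
from typing import Optional


def extract_trailing_context(message: str) -> Optional[dict]:
    """Return the JSON object that ends the message, or None if there is none."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(message):
        if char != "{":
            continue
        try:
            obj, end = decoder.raw_decode(message, start)
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; keep scanning
        # Accept only an object that is followed by nothing but whitespace.
        if isinstance(obj, dict) and message[end:].strip() == "":
            return obj
    return None
```

Scanning from each opening brace lets the helper skip braces that appear in the prose part of the alert.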

Workflow Integration + Autofix Linkage

The Watchdog forms a Proactive Stability Layer, independent of the standard diagnostic flow (Steps 0-5). When an anomaly is detected:

a. The daemon logs the event and generates a System Health Warning (SHW) report.
b. It sends a real-time alert (with diagnostic guidance + context JSON).
c. It auto-repairs low-risk known issues (e.g., CLI path problems), then verifies the result.
d. For high-risk operations, it only provides repair suggestions and waits for user confirmation.

🛠️ Auto-Repair Module (v1.0)

Repair Script Library: scripts/auto_repair.py

Matches repair plans based on the diagnostic context from Watchdog alerts:

| Issue | Match Condition | Repair Action | Risk |
| --- | --- | --- | --- |
| Gateway stopped | `status: stopped` | Restart Gateway | 🟡 Needs confirmation |
| RPC connection failed | `rpc_ok: false` | Restart Gateway | 🟡 Needs confirmation |
| CLI unavailable | `status: cli_error` | Check installation path | 🟢 Auto-execute |
| HTTP unreachable | `status: unreachable` | Check port + restart | 🟡 Needs confirmation |
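
The table above can be expressed as a lookup from diagnostic context to repair plan. This is a sketch rather than the contents of `auto_repair.py`: the `RepairPlan` type, the function name, and the order in which conditions are checked are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepairPlan:
    issue: str
    action: str
    auto_execute: bool  # False means the plan needs user confirmation


def match_repair_plan(context: dict) -> Optional[RepairPlan]:
    """Pick the repair plan whose match condition fits the context."""
    status = context.get("status")
    if status == "stopped":
        return RepairPlan("Gateway stopped", "Restart Gateway", False)
    if status == "cli_error":
        return RepairPlan("CLI unavailable", "Check installation path", True)
    if status == "unreachable":
        return RepairPlan("HTTP unreachable", "Check port + restart", False)
    if context.get("rpc_ok") is False:
        return RepairPlan("RPC connection failed", "Restart Gateway", False)
    return None  # unknown issue: fall back to manual diagnosis
```

Only the plan flagged `auto_execute` should run without asking; every other match is a suggestion awaiting confirmation.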

Repair Verification Loop:

After auto-repair, wait 3 seconds then re-run health check
Verification passed → send "Auto-repair succeeded" confirmation
Verification failed → send "Still needs manual diagnosis" escalation
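
The verification loop can be sketched as a single function that takes the repair, the health check, and the notifier as callables. The names are hypothetical; the 3-second wait and the two messages come from the list above.

```python
import time
from typing import Callable


def repair_and_verify(
    repair: Callable[[], None],
    health_check: Callable[[], bool],
    notify: Callable[[str], None],
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a repair, wait, re-check health, and report the outcome."""
    repair()
    sleep(wait_seconds)  # give the Gateway time to settle before re-checking
    healthy = health_check()
    notify("Auto-repair succeeded" if healthy else "Still needs manual diagnosis")
    return healthy
```

Passing `sleep` as a parameter lets a test skip the real delay.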

Health Trend Tracking:

watchdog_state.json retains the last 1440 check records (24 hours)
Each record includes: timestamp, health status, severity level, source
Trend data can be visualized via Canvas health dashboard
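
Retaining only the most recent 1440 records is a natural fit for a bounded queue. The sketch below assumes one record per 60-second check and uses illustrative field names; the real layout of `watchdog_state.json` may differ.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Deque, Optional

MAX_RECORDS = 1440  # one check per minute for 24 hours


def new_history() -> Deque[dict]:
    """Create a history that silently drops the oldest record when full."""
    return deque(maxlen=MAX_RECORDS)


def record_check(
    history: Deque[dict],
    healthy: bool,
    level: int,
    source: str,
    when: Optional[datetime] = None,
) -> dict:
    """Append one check result with the four fields listed above."""
    record = {
        "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
        "healthy": healthy,
        "level": level,
        "source": source,
    }
    history.append(record)
    return record
```

Calling `list(history)` yields a plain list that can be written out with `json.dump`.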

Current Status (v4.1)

✅ Real health check — service.runtime.status = running + rpc.ok = true
✅ Noise filtering — ≥3 failures to trigger, ≥3 successes to reset
✅ Severity levels — Four-tier (🟢/🟡/🟠/🔴) with auto-escalation
✅ WebChat channel — Gateway HTTP API /v1/chat/completions, low token cost
✅ Feishu channel — openclaw message send --channel feishu, zero token cost
✅ Single instance — Windows Mutex prevents duplicates
✅ Log rotation — 5MB auto-rotate, 3 backups
✅ Precise scheduling — Fixed-minute schedule, no drift
✅ Hot-reload config — Watches openclaw.json for changes
✅ Auto-start — HKCU\Run registry, launches on user login
✅ Startup confirmation — Sends status to both channels on start
📁 State file: scripts/watchdog_state.json
📁 Log file: scripts/gateway_watchdog.log
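
The log rotation policy listed above (rotate at 5MB, keep 3 backups) maps directly onto Python's standard `RotatingFileHandler`. Whether the Watchdog uses this handler is an assumption; the logger name and format string below are illustrative.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(path: str) -> logging.Logger:
    """Create a logger that rotates at 5 MB and keeps 3 backup files."""
    logger = logging.getLogger("gateway_watchdog")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = RotatingFileHandler(
            path, maxBytes=5 * 1024 * 1024, backupCount=3, encoding="utf-8"
        )
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With this configuration the handler keeps the active file plus `.1`, `.2`, and `.3` backups, and deletes the oldest backup on each rotation.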

Standard Workflow (6-Step Cycle + Proactive Layers)

This skill strictly follows these steps in sequence, enhanced by proactive layers:

Step 0: Resource Pre-check & Cost Management (New) — Starting Point

Before any resource-intensive external search or service call, proactively check API quotas, rate limits, and budget consumption for the current active session. If quota-low alerts or known rate-limit thresholds are hit, pause all execution steps and notify the user with a clear "resource warning," requesting they wait or switch to a low-cost / local alternative.
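
A pre-check like the one described in Step 0 can be reduced to a gate that returns a go/no-go decision and a warning message. Everything in this sketch is hypothetical, including the `QuotaSnapshot` fields and the threshold values; how quota data is actually obtained depends on the provider.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class QuotaSnapshot:
    remaining_requests: int
    remaining_budget_usd: float
    rate_limited: bool = False


def resource_precheck(
    quota: QuotaSnapshot,
    min_requests: int = 5,          # assumed threshold
    min_budget_usd: float = 0.10,   # assumed threshold
) -> Tuple[bool, str]:
    """Return (may_proceed, warning). The warning is empty when all is fine."""
    if quota.rate_limited:
        return False, "Resource warning: rate limit reached. Wait or switch to a local alternative."
    if quota.remaining_requests < min_requests:
        return False, "Resource warning: request quota is low. Wait or switch to a low-cost alternative."
    if quota.remaining_budget_usd < min_budget_usd:
        return False, "Resource warning: budget is nearly exhausted. Switch to a low-cost or local alternative."
    return True, ""
```

When the first element is false, the workflow pauses and shows the warning to the user instead of continuing to Step 1.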

Step 1: Primary Search (See `docs/MODULE_02_SearchChain.md` — Step 1)

Search the official documentation (docs.openclaw.ai) for official solutions
Gather context information related to the problem
Extract key error messages and configuration status

Step 2: Backup Search (See `docs/MODULE_02_SearchChain.md` — Step 2)

If the official docs don't provide an answer, search GitHub Issues
Look for community-reported problems and solutions
Collect code verification requirements or pattern-matching information
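
Steps 1 and 2 together form a fallback chain: try the official docs first, and consult GitHub Issues only if that yields nothing. A minimal sketch of the pattern, with hypothetical search callables standing in for the real lookups:

```python
from typing import Callable, List, Optional, Sequence, Tuple

SearchFn = Callable[[str], List[str]]


def search_chain(
    query: str, sources: Sequence[Tuple[str, SearchFn]]
) -> Tuple[Optional[str], List[str]]:
    """Query sources in priority order and stop at the first with results."""
    for name, search in sources:
        results = search(query)
        if results:
            return name, results
    return None, []  # nothing found anywhere: fall back to contextual inquiry
```

Returning the source name alongside the results lets Step 3 weigh an official answer differently from a community report.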

Step 3: Analysis & Decision (See `docs/MODULE_03_ValidationAction.md` — Step 3)

Choose the best action path based on search results
Perform evidence chain analysis (L1) to evaluate solution reliability
Decide between direct answer, code verification, or contextual inquiry

Step 4: Validation & Action (v5.0 Enhanced) (See `docs/MODULE_03_ValidationAction.md`, Step 4, and `docs/enhancement/MODULE_03_Enhancement_Reports.md`)

Execute validation with a minimal reproducible example (MRE), or propose a contextual inquiry
Generate an interactive diagnosis report (if the MRE fails)
✅ Three-step confirmation before fixes: Before running any command with system-modifying or wide-ranging effects (e.g., openclaw doctor --fix, exec/write), follow these safety steps:
1. Problem location & explanation: Explain the diagnosis result and the core issue to be fixed.
2. Scope confirmation: Ask about and record the specific target or runtime environment (e.g., "This change will only affect the local development config. Do you agree?").
3. Rollback plan: Provide an executable one-click rollback command.
Only proceed after all three steps are complete and the user agrees via /approve.
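
The three-step confirmation can be enforced in code as a gate that refuses to run a fix until every step is filled in and the user has approved. The sketch below is illustrative only; the `FixProposal` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FixProposal:
    command: str
    explanation: str = ""       # step 1: what is wrong and what the fix does
    scope: str = ""             # step 2: confirmed target or environment
    rollback_command: str = ""  # step 3: how to undo the change
    approved: bool = False      # set once the user replies with /approve


def missing_steps(proposal: FixProposal) -> List[str]:
    """List the safety steps that are still incomplete."""
    missing = []
    if not proposal.explanation.strip():
        missing.append("explanation")
    if not proposal.scope.strip():
        missing.append("scope")
    if not proposal.rollback_command.strip():
        missing.append("rollback")
    if not proposal.approved:
        missing.append("approval")
    return missing


def may_execute(proposal: FixProposal) -> bool:
    """A fix may run only when nothing is missing."""
    return not missing_steps(proposal)
```

Reporting the missing steps by name, rather than a bare false, tells the agent which question it still has to ask.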

Step 5: Finalization & Memory Update (See `docs/MODULE_04_Finalization.md`)

Save facts, lessons learned, and update state
Trigger L2 Hot-Start Query and L3 Skill Creation Suggestions

💡 Golden Path (Recommended Flow): For most OpenClaw issues, the fastest resolution path is: openclaw doctor → openclaw doctor --fix

🖼️ Diagnosis Report Visualization (v5.0)

When MRE validation fails, generate an interactive diagnostic report using canvas.snapshot() with:

Visual risk flags (🔴/🟠/🟢)
Evidence chain diagram (Doc vs GH comparison)
Exec result status codes highlighted
Rollback command code block display

🧠 Error Log Intelligent Summary (ELIS — v5.0)

When MRE fails, use LLM-powered analysis to extract root causes from exec output:

Core Issue: One-sentence summary
Possible Causes: 2–3 bullet points
Recommended Fix: Specific command(s)
Risk Level + Confidence Score

📚 Modules & Deep Dives

Consult the following categorized sub-documents for detailed process explanations:

📁 `docs/` — Core Module Documentation

MODULE_01_PreCheck.md: Problem pre-check, context collection, and security scanning.
MODULE_02_SearchChain.md: Search strategy (Docs → GitHub) with evidence chain analysis (L1).
MODULE_03_ValidationAction.md: Decision-making based on search results, choosing between direct answer, code verification, or inquiry.
MODULE_04_Finalization.md: Memory storage, lessons learned, and state updates, including L2 Hot-Start Query and L3 Skill Creation Suggestions.

📁 `docs/enhancement/` — v5.0 Enhancement Features

MODULE_03_Enhancement_Reports.md: v5.0 new modules — Diagnosis Report Visualization (DRE) + Error Log Intelligent Summary (ELIS).

📁 `docs/tutorials/` — Usage Examples

EXAMPLE_usage.md: Detailed code examples and usage scenarios.
QUICK_START_v5.0.md: v5.0 quick-start guide covering environment setup, workflow, and best practices.

📁 `docs/reports/` — Summary Reports

AUTOFIX_V5.0_SUMMARY.md: v5.0 completion report and feature summary.
CHANGES_v5.0.md: Complete change log from v4.5 to v5.0.
VERIFICATION_FINAL.md: Final integrity verification report.

📁 `scripts/` — Python/JS Tools

elis_helper.py: ELIS error log analysis helper (LLM-powered error analyzer).
canvas_report_generator.py: Canvas diagnosis report generator (HTML template rendering + URL registration).
auto_repair.py: Auto-repair library — matches diagnostic context to repair plans.
watchdog_monitor.py: Gateway Watchdog daemon — background health monitoring + dual-channel alerting.
requirements.txt: Python dependencies (pywin32).

This file is the master skill document. It defines the complete problem-solving blueprint and integrates all capability layers.