swarm-self-heal

v0.1.1

Swarm reliability watchdog for OpenClaw — validates gateway/channel and every lane, performs bounded recovery, and emits auditable receipts.

by Todd Kuehnl (@tkuehnl)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description align with the scripts: the files implement a passive-first swarm watchdog using the OpenClaw CLI, perform bounded recovery (gateway restart + re-probe), emit receipts, and set up cron lanes. Required binaries (bash, jq, openclaw) are appropriate.
Instruction Scope
Runtime instructions and scripts stay on task: they read OpenClaw status, call openclaw health/channels/agent commands, and install scripts into the OpenClaw workspace. They read $HOME/.openclaw/openclaw.json and cron/jobs.json (expected for cron wiring). One scope concern: setup.sh creates cron jobs that post messages via the OpenClaw Telegram channel, so watchdog output will be sent externally through that channel if it is configured (see Credentials).
Install Mechanism
No package download or remote install is performed by the registry metadata; the repo contains local shell scripts and the user-run setup.sh copies them into ~/.openclaw/workspace-studio/scripts and uses the OpenClaw CLI to register cron jobs. That is a low-risk, user-initiated install pattern, but it does write scripts into the user's OpenClaw workspace.
Credentials
The skill does not request extra credentials, but setup.sh falls back to a hard-coded Telegram recipient '8563003761' when it cannot derive a target from $HOME/.openclaw/openclaw.json. This means if the user's OpenClaw config lacks a telegram target, the skill will create cron jobs that send notifications to that number—potentially leaking incident output to an external party. The scripts also read OpenClaw config files (which may contain channel tokens) — this is expected for a notifier but should be acknowledged before installing.
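The hard-coded fallback described above can be located before running setup. The following is a minimal sketch, not part of the skill: `find_hardcoded_recipient` is a hypothetical helper name, and the example path assumes the repository layout shown in the Commands section of SKILL.md.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines in a script that mention a given recipient ID.
# Usage: find_hardcoded_recipient <recipient> <script-path>
find_hardcoded_recipient() {
  local recipient="$1" script="$2"
  if [ ! -r "$script" ]; then
    echo "cannot read: $script" >&2
    return 2
  fi
  # -n prints line numbers; -F treats the recipient as a literal string.
  grep -nF -- "$recipient" "$script"
}

# Example (path assumed from the Commands section):
# find_hardcoded_recipient 8563003761 skills/swarm-self-heal/scripts/setup.sh
```

A non-zero exit status means the string was not found (or the file could not be read), so the helper can also be used as a pre-install guard.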
Persistence & Privilege
The skill adds/edits OpenClaw cron jobs (primary and backup watchdog) so it becomes an autonomously-scheduled component via OpenClaw (normal for a watchdog). always:false, so it won't be force-included in all agents, but installing will create scheduled autonomous runs and restart the user-level openclaw-gateway service when needed. Users should be aware this gives the skill recurring execution via OpenClaw cron.
What to consider before installing
This skill appears to implement the described watchdog behavior and uses only the OpenClaw CLI plus bash/jq, but review a few things before installing:

  1. Inspect setup.sh and change or remove the hard-coded Telegram fallback '8563003761' (or ensure your $HOME/.openclaw/openclaw.json contains the correct channel target) to avoid sending alerts to an unexpected recipient.
  2. Installing copies scripts into ~/.openclaw/workspace-studio/scripts and registers cron jobs that run autonomously and may restart the user-level openclaw-gateway service. Back up OpenClaw's cron/jobs.json and openclaw.json first.
  3. Run the scripts manually in a controlled environment (or review the output of check.sh) to confirm behavior before enabling cron.

If you want higher assurance, ask the author for an explanation of the fallback number or a configurable opt-in for external notifications.
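The backup recommended above could be done with a small helper like the one below. This is a sketch under assumptions: `backup_openclaw_files` is a hypothetical name, and the location of cron/jobs.json under ~/.openclaw is a guess based on the config path mentioned in the scan notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy each existing file to <file>.bak.<timestamp>.
# Usage: backup_openclaw_files <file>...
backup_openclaw_files() {
  local stamp file
  stamp="$(date +%Y%m%dT%H%M%S)"
  for file in "$@"; do
    if [ -f "$file" ]; then
      # -p preserves mode and timestamps on the copy.
      cp -p -- "$file" "$file.bak.$stamp" && echo "backed up: $file.bak.$stamp"
    else
      echo "skipped (not found): $file" >&2
    fi
  done
}

# Example. The openclaw.json path comes from the scan notes; the
# cron/jobs.json location is an assumption.
# backup_openclaw_files "$HOME/.openclaw/openclaw.json" "$HOME/.openclaw/cron/jobs.json"
```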

Like a lobster shell, security has layers — review code before you run it.

latest vk97201t3b096x4gh15q5e9228d81nbqz

Runtime requirements

🩺 Clawdis
Bins: bash, jq, openclaw

SKILL.md

When to use this skill

Use this skill when the user wants to:

  • Diagnose why a multi-agent swarm feels "stuck" or partially offline
  • Check gateway + channel + lane liveness in one run
  • Perform bounded auto-recovery (restart + retry only)
  • Capture auditable receipts for incident timelines
  • Keep a primary watchdog lane plus a backup lane in place

Commands

# Install/refresh watchdog scripts + cron wiring
bash skills/swarm-self-heal/scripts/setup.sh

# Run an immediate canary check
bash skills/swarm-self-heal/scripts/check.sh

# Run watchdog directly (uses deployed workspace path)
bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh

# Optional: increase lane ping timeout for slower providers
PING_TIMEOUT_SECONDS=180 bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh

What it checks

  • Gateway health via openclaw health
  • Channel readiness via openclaw channels status --json --probe
  • Passive lane recency via openclaw status --json (compatible with the latest OpenClaw release)
  • Active lane probe, run only when a lane is stale, for main, builder-1, builder-2, reviewer, designer
  • Bounded recovery with a single restart pass + targeted re-probe of infra failures
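The passive-first check and the bounded recovery pass listed above can be sketched as follows. This is an illustration, not the skill's actual code: `is_stale` and `bounded_recover` are hypothetical names, the staleness threshold is a caller-supplied value, and the restart command in the example is an assumption about how the user-level openclaw-gateway service is managed.

```shell
#!/usr/bin/env bash
# Sketch only: function names and thresholds are hypothetical.

# Return 0 (stale) when the last activity is older than max_age seconds.
# Usage: is_stale <last_activity_epoch> <now_epoch> <max_age_seconds>
is_stale() {
  local last="$1" now="$2" max_age="$3"
  [ $(( now - last )) -gt "$max_age" ]
}

# Run a check; on failure run the restart once, then re-run the check once.
# Prints the actions taken and returns the final check status.
# Usage: bounded_recover <check_cmd> <restart_cmd>
bounded_recover() {
  local check_cmd="$1" restart_cmd="$2"
  if "$check_cmd"; then
    echo "actions: none"
    return 0
  fi
  echo "actions: restart"
  "$restart_cmd" || echo "restart command failed" >&2
  # Single re-probe: there is no loop, so recovery stays bounded to one pass.
  if "$check_cmd"; then
    echo "actions: re-probe ok"
    return 0
  fi
  echo "actions: re-probe failed"
  return 1
}

# Example wiring. `openclaw health` appears in the list above; the restart
# command is an assumption.
# check_gateway()   { openclaw health >/dev/null 2>&1; }
# restart_gateway() { systemctl --user restart openclaw-gateway; }
# bounded_recover check_gateway restart_gateway
```

Passing the check and restart steps in as commands keeps the retry logic separate from the OpenClaw-specific calls, so the bound (one restart, one re-probe) is easy to verify with stand-in commands.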

Output contract

The watchdog output includes:

  • timestamp
  • targets
  • ok_agents
  • failed_agents
  • actions
  • VERDICT
  • RECEIPT
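One way to check a run against this contract is to confirm that every field name appears in the output. The sketch below assumes only that the names occur as whole words somewhere in the text; the exact output layout is not documented here, and `missing_contract_fields` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read watchdog output on stdin and print any contract
# field name that does not appear in it. Returns 0 when nothing is missing.
missing_contract_fields() {
  local output field missing=0
  output="$(cat)"
  for field in timestamp targets ok_agents failed_agents actions VERDICT RECEIPT; do
    # -w matches whole words only; -F treats the field name literally.
    if ! printf '%s\n' "$output" | grep -qwF -- "$field"; then
      echo "$field"
      missing=1
    fi
  done
  return "$missing"
}

# Example (watchdog path taken from the Commands section):
# bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh | missing_contract_fields
```

This is a presence check only. It does not validate field values or ordering.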

Safety model

  • Bounded recovery only (single restart pass per run)
  • No destructive state wipes
  • No blind reinstall behavior
  • Recovery actions are explicit in output

Notes

  • Cron wiring sets both primary and backup watchdog lanes to xhigh thinking.
  • Telegram target is auto-derived from config when available; otherwise setup.sh falls back to a hard-coded recipient (see Credentials in the security scan).
  • Healthy runs can be summarized as a single line to reduce operator noise.

Files

7 total