swarm-self-heal
v0.1.1
Swarm reliability watchdog for OpenClaw: validates gateway/channel health and every lane, performs bounded recovery, and emits auditable receipts.
MIT-0
Security Scan
OpenClaw verdict: Suspicious (medium confidence)
Purpose & Capability
Name/description align with the scripts: the files implement a passive-first swarm watchdog using the OpenClaw CLI, perform bounded recovery (gateway restart + re-probe), emit receipts, and set up cron lanes. Required binaries (bash, jq, openclaw) are appropriate.
Instruction Scope
Runtime instructions and scripts stay on task: they read OpenClaw status, call openclaw health/channels/agent commands, and install scripts into the OpenClaw workspace. They read $HOME/.openclaw/openclaw.json and cron/jobs.json (expected for cron wiring). One scope concern: setup.sh creates cron jobs that post messages via the OpenClaw Telegram channel, so telemetry and outputs will be sent externally through that channel if it is configured (see Credentials below).
Install Mechanism
No package download or remote install is performed by the registry metadata; the repo contains local shell scripts and the user-run setup.sh copies them into ~/.openclaw/workspace-studio/scripts and uses the OpenClaw CLI to register cron jobs. That is a low-risk, user-initiated install pattern, but it does write scripts into the user's OpenClaw workspace.
Credentials
The skill does not request extra credentials, but setup.sh falls back to a hard-coded Telegram recipient '8563003761' when it cannot derive a target from $HOME/.openclaw/openclaw.json. If the user's OpenClaw config lacks a Telegram target, the skill will therefore create cron jobs that send notifications to that recipient, potentially leaking incident output to an external party. The scripts also read OpenClaw config files (which may contain channel tokens); this is expected for a notifier but should be acknowledged before installing.
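The hard-coded fallback described above can be avoided by making external notifications strictly opt-in. The following is a minimal bash sketch, not the skill's actual setup.sh logic: it assumes a hypothetical `WATCHDOG_TELEGRAM_TARGET` environment variable and a hypothetical config key `.channels.telegram.target`, neither of which is confirmed to exist in setup.sh or in the OpenClaw config schema. When no target resolves, the function fails and the caller skips notification wiring instead of guessing a recipient.

```bash
#!/usr/bin/env bash
# Sketch: resolve a Telegram target with NO hard-coded fallback.
# WATCHDOG_TELEGRAM_TARGET and the jq key path below are hypothetical.

resolve_telegram_target() {
  local config="${1:-$HOME/.openclaw/openclaw.json}"

  # 1) An explicit operator choice always wins.
  if [[ -n "${WATCHDOG_TELEGRAM_TARGET:-}" ]]; then
    printf '%s\n' "$WATCHDOG_TELEGRAM_TARGET"
    return 0
  fi

  # 2) Otherwise try the config file (hypothetical key path).
  if [[ -r "$config" ]] && command -v jq >/dev/null 2>&1; then
    local target
    target="$(jq -r '.channels.telegram.target // empty' "$config" 2>/dev/null)"
    if [[ -n "$target" ]]; then
      printf '%s\n' "$target"
      return 0
    fi
  fi

  # 3) Nothing configured: fail instead of falling back to a fixed recipient.
  return 1
}

if target="$(resolve_telegram_target)"; then
  echo "notifications -> $target"
else
  echo "no Telegram target configured; skipping notification cron jobs" >&2
fi
```

The design choice is that an unresolved target disables notifications rather than redirecting them, which keeps incident output inside channels the operator explicitly chose.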
Persistence & Privilege
The skill adds/edits OpenClaw cron jobs (primary and backup watchdog) so it becomes an autonomously-scheduled component via OpenClaw (normal for a watchdog). always:false, so it won't be force-included in all agents, but installing will create scheduled autonomous runs and restart the user-level openclaw-gateway service when needed. Users should be aware this gives the skill recurring execution via OpenClaw cron.
What to consider before installing
This skill appears to implement the described watchdog behavior and uses only the OpenClaw CLI plus bash/jq, but review a few things before installing:
1) Inspect setup.sh and change or remove the hard-coded Telegram fallback '8563003761' (or ensure your $HOME/.openclaw/openclaw.json contains the correct channel target) to avoid sending alerts to an unexpected recipient.
2) Installing will copy scripts into ~/.openclaw/workspace-studio/scripts and register cron jobs that run autonomously and may restart the user-level openclaw-gateway service. Back up OpenClaw's cron/jobs.json and openclaw.json first.
3) Run the scripts manually in a controlled environment (or review the output of check.sh) to confirm behavior before enabling cron.
If you want higher assurance, ask the author to explain the fallback number or to make external notifications a configurable opt-in.
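Item 2 above recommends backing up OpenClaw's cron and config files before running setup.sh. A minimal bash sketch, assuming the files live at `~/.openclaw/openclaw.json` and `~/.openclaw/cron/jobs.json` (the scan names `cron/jobs.json` without its full path, so that directory is an assumption). The function name `backup_openclaw_state` and the `backups/` directory are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: snapshot OpenClaw config and cron files before running setup.sh.
# Assumed layout: <root>/openclaw.json and <root>/cron/jobs.json (unverified).

backup_openclaw_state() {
  local root="${1:-$HOME/.openclaw}"
  # Hypothetical backup location: one timestamped directory per run.
  local dest
  dest="$root/backups/pre-swarm-self-heal-$(date +%Y%m%dT%H%M%S)"

  local f copied=0
  for f in openclaw.json cron/jobs.json; do
    if [[ -f "$root/$f" ]]; then
      mkdir -p "$dest/$(dirname "$f")"
      cp -p "$root/$f" "$dest/$f"   # -p preserves mode and timestamps
      copied=$((copied + 1))
    else
      echo "skip: $root/$f not found" >&2
    fi
  done

  # Fail if nothing was copied so the caller can stop before setup.sh.
  if (( copied == 0 )); then
    echo "nothing to back up under $root" >&2
    return 1
  fi
  printf '%s\n' "$dest"
}
```

Usage: `backup_openclaw_state && bash skills/swarm-self-heal/scripts/setup.sh` only runs the installer once at least one file has been copied.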
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
Runtime requirements
🩺 Clawdis
Bins: bash, jq, openclaw
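A quick preflight for the binaries listed above can prevent a confusing partial install. This is a minimal sketch using only the bash builtin `command -v`; `require_bins` is a hypothetical helper, not part of the skill's scripts.

```bash
#!/usr/bin/env bash
# Sketch: fail fast if any required binary is missing from PATH.
# The intended list mirrors the runtime requirements: bash, jq, openclaw.

require_bins() {
  local missing=0 bin
  for bin in "$@"; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing required binary: $bin" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check everything before running setup.sh.
# require_bins bash jq openclaw || exit 1
```

Reporting every missing binary before returning, rather than stopping at the first, lets an operator fix the environment in one pass.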
SKILL.md
When to use this skill
Use this skill when the user wants to:
- Diagnose why a multi-agent swarm feels "stuck" or partially offline
- Check gateway + channel + lane liveness in one run
- Perform bounded auto-recovery (restart + retry only)
- Capture auditable receipts for incident timelines
- Keep a primary watchdog lane plus a backup lane in place
Commands
```bash
# Install/refresh watchdog scripts + cron wiring
bash skills/swarm-self-heal/scripts/setup.sh

# Run an immediate canary check
bash skills/swarm-self-heal/scripts/check.sh

# Run watchdog directly (uses deployed workspace path)
bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh

# Optional: increase lane ping timeout for slower providers
PING_TIMEOUT_SECONDS=180 bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh
```
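The `PING_TIMEOUT_SECONDS` override above is an environment variable read by the watchdog. How anvil_watchdog.sh actually validates it is not documented here, so this is a hypothetical sketch of a defensive pattern: accept only a positive integer and otherwise fall back to a default. The default of 60 seconds and the helper `resolve_ping_timeout` are assumptions, not the script's real values or names.

```bash
#!/usr/bin/env bash
# Sketch: read PING_TIMEOUT_SECONDS with validation and a fallback default.
# DEFAULT_PING_TIMEOUT=60 is an assumed value for illustration only.
DEFAULT_PING_TIMEOUT=60

resolve_ping_timeout() {
  local raw="${PING_TIMEOUT_SECONDS:-}"
  # Accept only a positive integer without leading zeros.
  if [[ "$raw" =~ ^[1-9][0-9]*$ ]]; then
    printf '%s\n' "$raw"
  else
    [[ -n "$raw" ]] && echo "ignoring invalid PING_TIMEOUT_SECONDS='$raw'" >&2
    printf '%s\n' "$DEFAULT_PING_TIMEOUT"
  fi
}

timeout_s="$(resolve_ping_timeout)"
echo "lane ping timeout: ${timeout_s}s"
```

The resolved value could then bound each lane probe, for example via the coreutils `timeout` command, so one slow provider cannot stall the whole run.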
What it checks
- Gateway health via `openclaw health`
- Channel readiness via `openclaw channels status --json --probe`
- Passive lane recency via `openclaw status --json` (latest OpenClaw-compatible)
- Active lane probe, only when stale, for `main`, `builder-1`, `builder-2`, `reviewer`, `designer`
- Bounded recovery with a single restart pass + targeted re-probe of infra failures
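The passive-first rule (probe a lane actively only when its recorded activity is stale) can be sketched without any OpenClaw calls. This assumes last-activity times have already been extracted from `openclaw status --json` as `<lane> <epoch_seconds>` lines; the real JSON shape and staleness threshold are not documented here, so the input format, `STALE_AFTER_SECONDS`, and `stale_lanes` are all hypothetical. Requires bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Sketch: pick which lanes need an active probe, given passive recency data.
# Input on stdin: lines of "<lane> <last_activity_epoch_seconds>".
# STALE_AFTER_SECONDS is a hypothetical threshold.
STALE_AFTER_SECONDS="${STALE_AFTER_SECONDS:-900}"
LANES=(main builder-1 builder-2 reviewer designer)

stale_lanes() {
  local now="$1"          # current epoch seconds, injectable for testing
  local -A last_seen=()
  local lane ts
  while read -r lane ts; do
    [[ -n "$lane" ]] && last_seen["$lane"]="$ts"
  done

  for lane in "${LANES[@]}"; do
    ts="${last_seen[$lane]:-}"
    # No recorded activity, or activity older than the threshold, counts as stale.
    if [[ -z "$ts" ]] || (( now - ts > STALE_AFTER_SECONDS )); then
      printf '%s\n' "$lane"
    fi
  done
}

# Example: only lanes printed here would receive an active ping.
# printf 'main %s\nreviewer 0\n' "$(date +%s)" | stale_lanes "$(date +%s)"
```

Passing `now` as an argument instead of calling `date` inside the function keeps the decision deterministic and easy to test.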
Output contract
The watchdog output includes:
- `timestamp`
- `targets`
- `ok_agents`
- `failed_agents`
- `actions`
- `VERDICT`
- `RECEIPT`
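A consumer such as an alerting hook could verify that a watchdog run emitted every field of this output contract. The exact line format is not specified here; this sketch assumes each field appears at the start of a line as `KEY: value` or `KEY=value`, which should be checked against real output. `missing_contract_fields` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: report any output-contract field missing from a watchdog run.
# Assumed format: one "KEY: value" or "KEY=value" per line (unverified).
CONTRACT_FIELDS=(timestamp targets ok_agents failed_agents actions VERDICT RECEIPT)

missing_contract_fields() {
  local output="$1" field
  for field in "${CONTRACT_FIELDS[@]}"; do
    # Field name at line start, optional spaces, then ':' or '='.
    if ! grep -Eq "^${field}[[:space:]]*[:=]" <<<"$output"; then
      printf '%s\n' "$field"
    fi
  done
}

# Example:
# out="$(bash ~/.openclaw/workspace-studio/scripts/anvil_watchdog.sh)"
# missing="$(missing_contract_fields "$out")"
# [[ -z "$missing" ]] || echo "contract violation, missing: $missing" >&2
```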
Safety model
- Bounded recovery only (single restart pass per run)
- No destructive state wipes
- No blind reinstall behavior
- Recovery actions are explicit in output
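The bounded-recovery rule (at most one restart pass per run, then re-probe only what failed) amounts to a retry capped at a single iteration. This sketch injects the probe and restart steps as hypothetical hooks `probe_lane` and `restart_gateway`; the real script's commands are not reproduced, and the stubs at the bottom exist only to make the control flow runnable. The `actions:`/`failed_agents:` line format is likewise an assumption.

```bash
#!/usr/bin/env bash
# Sketch: bounded recovery with a single restart pass per run.
# probe_lane and restart_gateway are hypothetical hooks the caller defines.

bounded_recover() {
  local -a failed=() still_failed=()
  local lane

  # Pass 1: probe every lane and collect failures.
  for lane in "$@"; do
    probe_lane "$lane" || failed+=("$lane")
  done
  if (( ${#failed[@]} == 0 )); then
    echo "actions: none"
    return 0
  fi

  # Exactly one restart: no loop, no state wipe, no reinstall.
  echo "actions: restart_gateway"
  restart_gateway

  # Targeted re-probe: only lanes that failed in pass 1.
  for lane in "${failed[@]}"; do
    probe_lane "$lane" || still_failed+=("$lane")
  done

  if (( ${#still_failed[@]} > 0 )); then
    echo "failed_agents: ${still_failed[*]}"
    return 1
  fi
  echo "failed_agents: none"
  return 0
}

# --- Illustration only: stub hooks so the sketch runs end to end. ---
RESTARTED=0
probe_lane() { [[ "$1" != reviewer || "$RESTARTED" == 1 ]]; }  # reviewer recovers after restart
restart_gateway() { RESTARTED=1; }
bounded_recover main reviewer
# Expected output:
# actions: restart_gateway
# failed_agents: none
```

A lane that still fails after the single restart is reported rather than retried, which is what keeps each run's recovery bounded and auditable.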
Notes
- Cron wiring sets both primary and backup watchdog lanes to `xhigh` thinking.
- Telegram target is auto-derived from config when available; otherwise setup.sh falls back to a hard-coded recipient.
- Healthy runs can be summarized as a single line to reduce operator noise.
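Collapsing healthy runs into a single line could look like this sketch. It assumes each field is printed as `KEY: value` on its own line (an assumption) and treats a run as healthy only when a `failed_agents` line is present and empty or `none`. Because the real `VERDICT` vocabulary is not documented, it deliberately keys off `failed_agents` instead of guessing verdict values. `summarize_run` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: one summary line for healthy runs, full output otherwise.
# Assumes "KEY: value" lines; healthy = failed_agents present and empty/"none".

summarize_run() {
  local output="$1" ts failed ok

  # Without a failed_agents line we cannot claim health: keep everything.
  if ! grep -q '^failed_agents:' <<<"$output"; then
    printf '%s\n' "$output"
    return
  fi

  ts="$(sed -n 's/^timestamp:[[:space:]]*//p' <<<"$output" | head -n1)"
  failed="$(sed -n 's/^failed_agents:[[:space:]]*//p' <<<"$output" | head -n1)"
  ok="$(sed -n 's/^ok_agents:[[:space:]]*//p' <<<"$output" | head -n1)"

  if [[ -z "$failed" || "$failed" == "none" ]]; then
    # Healthy: one line is enough for the operator.
    printf 'healthy %s ok=[%s]\n' "$ts" "$ok"
  else
    # Unhealthy: keep every line so actions and RECEIPT stay auditable.
    printf '%s\n' "$output"
  fi
}
```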
Files
7 total
