Install
openclaw skills install @nj070574-gif/hearthA fast, read-only health-check sweep across every device in a homelab — ping, uptime/load, memory/disk, services, and app health, in 14 seconds with output you can scan in 30. Configuration-driven (~/.hearth/devices.yaml describes the lab; the skill is generic). Use when the user asks "how is the lab?", "server status", "check all servers", "is X up?", "health check", "what's down?", "anything broken?". Supports Linux, macOS, Raspberry Pi, Android (Termux/chroot), and Windows hosts (HTTP-only probe). Honest reporting — devices that can't be probed at L4 (Windows, chroots) are reported as such, never faked green. Read-only — never restarts services, never writes to remote hosts.
openclaw skills install @nj070574-gif/hearthBefore hearth: six SSH terminals open on a Friday afternoon. Type uptime; free -h; df -h; systemctl is-active <svc1> <svc2> ... on each box. Eight minutes in, you've forgotten what server 1 said.
With hearth: one command, 14 seconds, every device, same format, one screen. Done.
=== HOMELAB — ESTATE HEALTH SWEEP ===
=== 192.0.2.10 main-server ===
L1 ping: OK
L2 uptime: 1 day, 2 hours, load: 0.15 0.18 0.15
L3 mem: used 1.6Gi / 7.7Gi, 6.0Gi avail | disk: / 6% used, 814G free
L4 svc: openclaw=active nginx=active ollama=active cron=active
L5 app: gateway={"ok":true} | https-front=HTTP 200
=== 192.0.2.20 fileserver ===
L1 ping: OK ...
=== sweep complete in 14 seconds ===
Three things make hearth different from "just SSH and check yourself" or "set up Prometheus":
systemctl restart, no apt-get install, no writes beyond /tmp/.hearth_*. Safe to run from cron, from an LLM agent, from a colleague's shell. Most monitoring tools can't make that promise.unmanaged-host (no SSH), no-systemd (chroot — N/A). It doesn't fake a green result. You always know whether a green is real or just unmeasured.node_exporter. No daemon. Just SSH from one bridgehead. If you can SSH to a host, hearth can probe it.The 5-layer pattern catches the failure classes that actually hit homelabs in production:
| Layer | Catches |
|---|---|
| L1 ping | Network drop, host off, ICMP blocked |
| L2 uptime+load | Reboots, runaway load |
| L3 mem+disk | Disk filling up before journald truncates logs, OOM-precursor leaks |
| L4 services | Service crashed, unit name drift after distro upgrade, fail2ban banning your bridgehead |
| L5 app | The "service is up but returns HTTP 500 for three days" silent-failure class |
hearth is configuration-driven — the skill itself contains zero knowledge of any specific lab. The user describes their devices in ~/.hearth/devices.yaml (or wherever HEARTH_CONFIG points), and hearth reads that config to drive its probes. Six device archetypes ship as worked examples (Linux+systemd, chroot/no-systemd, Raspberry Pi, Windows HTTP-only, SLURM cluster, multi-app web stack).
Invoke hearth when the user asks anything in this family:
If the user names a single device, run hearth check-device <name> (or scope the sweep to one device with --device <name>).
hearth is implemented as a thin wrapper around two scripts that ship with the project:
scripts/sweep.sh — runs the full estate sweep, or a subsetscripts/check-device.sh — runs the 5-layer probe on one deviceRun from the user's hearth installation directory (typically ~/hearth/):
./scripts/sweep.sh # full sweep
./scripts/sweep.sh --device <name> # one device
./scripts/sweep.sh --group <name> # named group of devices
./scripts/sweep.sh --dry-run # validate config, no probes
Show the user the raw output. The output is already designed to be human-readable; do not re-summarise unless the user explicitly asks for analysis.
Each device's status is printed in this exact format:
=== <ip-or-hostname> <name> [(<role>)] ===
L1 ping: OK | UNREACHABLE
L2 uptime: <duration>, load: <1m> <5m> <15m>
L3 mem: used <X> / <Y>, <Z> avail | disk: / <pct>% used, <free> free
L4 svc: <service1>=active <service2>=active ...
L5 app: <app1>=<status> | <app2>=<status> ...
Special cases:
UNREACHABLE at L1 — device fails ping. L2-L5 are skipped, sweep continues.SSH FAILED at L2-L4 — device pings but SSH is unresponsive. L5 may still be attempted for HTTP probes.unmanaged-host (no SSH) at L2-L4 — device is configured auth: http-only (e.g. Windows host without SSH). L5 carries the health signal.no-systemd (chroot — N/A) at L4 — device is a chroot or has no systemd. L2/L3 still apply, L5 carries app-health.~/.hearth/devices.yaml. Reference examples/devices.example.yaml and docs/CONFIG.md in the project for schema../scripts/sweep.sh --device <X> to confirm the failure mode, then suggest investigation paths based on which layer failed (L1 = network, L4 = services, L5 = app).systemctl restart, no apt-get install, no file writes beyond /tmp/.hearth_* ephemera.If the user has not yet set up hearth:
examples/devices.example.yaml to ~/.hearth/devices.yamlHEARTH_PASS_<DEVICE>, etc.)./scripts/sweep.sh --dry-run to validate./scripts/sweep.sh for the first sweepSee docs/INSTALL.md for platform-specific install steps.
If the user has a device type not covered by the 6 ship-included archetypes (linux-systemd, linux-nosystemd-chroot, raspberry-pi, windows-http-only, slurm-cluster, magento-server), help them craft a new entry by:
examples/archetypes/ for the closest existing matchssh user@host 'uname -srm; uptime; systemctl list-units --type=service --state=running --no-pager | head -20' to discover its servicesdevices.yaml./scripts/sweep.sh --device <new-name> to testEncourage them to contribute the new archetype back upstream if it's broadly useful.
| Symptom | Likely cause | Suggested action |
|---|---|---|
| L1 UNREACHABLE on a normally-reachable device | Network drop, host powered off | Check physical/UPS, check switch, ping the gateway |
| SSH FAILED but L1 OK | SSH daemon down, firewall, fail2ban ban | SSH manually from another host to confirm |
L4 service shows inactive for a service the user expects active | Service crashed, unit name wrong | journalctl -u <unit> on the device |
L5 HTTP probe shows HTTP 000 | App is down or port closed | curl -v <url> from the bridgehead |
L5 HTTP probe shows HTTP 502/503 | App is up but failing | Check app's own logs |
| Sweep takes >30s for 10 devices | One device is timing out | Re-run with --device <name> to isolate |
hearth is designed to be safe to run in a public/agentic context:
/tmp/.hearth_* (cleaned up immediately)If asked about specific configuration values (passwords, tokens), hearth does NOT have access to those — they're in the user's env vars, only readable by the running process when invoking SSH/curl.
0.1.4 — schema-only example.yaml; app probes documented per-archetype. OpenClaw skill mode.