{"skill":{"slug":"hearth","displayName":"hearth","summary":"A fast, read-only health-check sweep across every device in a homelab — ping, uptime/load, memory/disk, services, and app health, in 14 seconds with output y...","description":"---\nname: hearth\ndescription: A fast, read-only health-check sweep across every device in a homelab — ping, uptime/load, memory/disk, services, and app health, in 14 seconds with output you can scan in 30. Configuration-driven (~/.hearth/devices.yaml describes the lab; the skill is generic). Use when the user asks \"how is the lab?\", \"server status\", \"check all servers\", \"is X up?\", \"health check\", \"what's down?\", \"anything broken?\". Supports Linux, macOS, Raspberry Pi, Android (Termux/chroot), and Windows hosts (HTTP-only probe). Honest reporting — devices that can't be probed at L4 (Windows, chroots) are reported as such, never faked green. Read-only — never restarts services, never writes to remote hosts.\n---\n\n## What hearth gets you\n\n**Before hearth:** six SSH terminals open on a Friday afternoon. Type `uptime; free -h; df -h; systemctl is-active <svc1> <svc2> ...` on each box. Eight minutes in, you've forgotten what server 1 said.\n\n**With hearth:** one command, 14 seconds, every device, same format, one screen. Done.\n\n```\n=== HOMELAB — ESTATE HEALTH SWEEP ===\n=== 192.0.2.10 main-server ===\n  L1 ping:    OK\n  L2 uptime:  1 day, 2 hours, load: 0.15 0.18 0.15\n  L3 mem:     used 1.6Gi / 7.7Gi, 6.0Gi avail | disk: / 6% used, 814G free\n  L4 svc:     openclaw=active nginx=active ollama=active cron=active\n  L5 app:     gateway={\"ok\":true} | https-front=HTTP 200\n=== 192.0.2.20 fileserver ===\n  L1 ping:    OK   ...\n=== sweep complete in 14 seconds ===\n```\n\n## Why someone uses this skill\n\nThree things make hearth different from \"just SSH and check yourself\" or \"set up Prometheus\":\n\n- **Read-only by design.** Never modifies remote state. No `systemctl restart`, no `apt-get install`, no writes beyond `/tmp/.hearth_*`. Safe to run from cron, from an LLM agent, from a colleague's shell. Most monitoring tools can't make that promise.\n- **Honest about what it can't see.** When a layer can't be probed (Windows host with no SSH, chroot with no systemd), hearth says so explicitly — `unmanaged-host (no SSH)`, `no-systemd (chroot — N/A)`. It doesn't fake a green result. You always know whether a green is real or just unmeasured.\n- **Zero install on remote hosts.** No agent on every box. No `node_exporter`. No daemon. Just SSH from one bridgehead. If you can SSH to a host, hearth can probe it.\n\nThe 5-layer pattern catches the failure classes that actually hit homelabs in production:\n\n| Layer | Catches |\n|-------|---------|\n| L1 ping | Network drop, host off, ICMP blocked |\n| L2 uptime+load | Reboots, runaway load |\n| L3 mem+disk | Disk filling up before journald truncates logs, OOM-precursor leaks |\n| L4 services | Service crashed, unit name drift after distro upgrade, fail2ban banning your bridgehead |\n| L5 app | The \"service is up but returns HTTP 500 for three days\" silent-failure class |\n\n## How hearth works\n\nhearth is **configuration-driven** — the skill itself contains zero knowledge of any specific lab. The user describes their devices in `~/.hearth/devices.yaml` (or wherever `HEARTH_CONFIG` points), and hearth reads that config to drive its probes. Six device archetypes ship as worked examples (Linux+systemd, chroot/no-systemd, Raspberry Pi, Windows HTTP-only, SLURM cluster, multi-app web stack).\n\n## Triggering\n\nInvoke hearth when the user asks anything in this family:\n\n- \"server status\", \"lab status\", \"homelab status\"\n- \"check all servers\", \"check the lab\", \"check my hosts\"\n- \"is X up?\" (where X is a device name from their config)\n- \"how is the lab?\", \"how is X?\"\n- \"health check\", \"health sweep\", \"device health\"\n- \"what's running?\", \"what's down?\"\n\nIf the user names a single device, run `hearth check-device <name>` (or scope the sweep to one device with `--device <name>`).\n\n## Operation\n\nhearth is implemented as a thin wrapper around two scripts that ship with the project:\n\n- `scripts/sweep.sh` — runs the full estate sweep, or a subset\n- `scripts/check-device.sh` — runs the 5-layer probe on one device\n\nRun from the user's hearth installation directory (typically `~/hearth/`):\n\n```bash\n./scripts/sweep.sh                    # full sweep\n./scripts/sweep.sh --device <name>    # one device\n./scripts/sweep.sh --group <name>     # named group of devices\n./scripts/sweep.sh --dry-run          # validate config, no probes\n```\n\nShow the user the raw output. The output is already designed to be human-readable; do not re-summarise unless the user explicitly asks for analysis.\n\n## Output format\n\nEach device's status is printed in this exact format:\n\n```\n=== <ip-or-hostname> <name> [(<role>)] ===\n  L1 ping:    OK | UNREACHABLE\n  L2 uptime:  <duration>, load: <1m> <5m> <15m>\n  L3 mem:     used <X> / <Y>, <Z> avail | disk: / <pct>% used, <free> free\n  L4 svc:     <service1>=active <service2>=active ...\n  L5 app:     <app1>=<status> | <app2>=<status> ...\n```\n\nSpecial cases:\n\n- **`UNREACHABLE` at L1** — device fails ping. L2-L5 are skipped, sweep continues.\n- **`SSH FAILED` at L2-L4** — device pings but SSH is unresponsive. L5 may still be attempted for HTTP probes.\n- **`unmanaged-host (no SSH)` at L2-L4** — device is configured `auth: http-only` (e.g. Windows host without SSH). L5 carries the health signal.\n- **`no-systemd (chroot — N/A)` at L4** — device is a chroot or has no systemd. L2/L3 still apply, L5 carries app-health.\n\n## Triggers requiring extra care\n\n- **\"restart X\" / \"kill X\" / \"deploy X\"** — hearth is read-only. If the user asks for write actions, do NOT use hearth — explain that hearth doesn't modify remote state and ask if they want to do that another way.\n- **\"add a new device\"** — direct the user to edit `~/.hearth/devices.yaml`. Reference `examples/devices.example.yaml` and `docs/CONFIG.md` in the project for schema.\n- **\"why is X down?\"** — first run `./scripts/sweep.sh --device <X>` to confirm the failure mode, then suggest investigation paths based on which layer failed (L1 = network, L4 = services, L5 = app).\n\n## What hearth never does\n\n- **Never modify remote hosts.** No `systemctl restart`, no `apt-get install`, no file writes beyond `/tmp/.hearth_*` ephemera.\n- **Never reveal credentials.** Passwords and tokens live in env vars and SSH keys; hearth does not echo them.\n- **Never make claims it can't verify.** If L4 can't be probed (chroot, Windows), hearth says so explicitly rather than reporting a fake green.\n- **Never fabricate device data.** Every line of output comes from a real probe of a real device. If a probe times out, the output says so.\n\n## Adding hearth to a new lab\n\nIf the user has not yet set up hearth:\n\n1. Direct them to clone the repo and copy `examples/devices.example.yaml` to `~/.hearth/devices.yaml`\n2. They edit the YAML with their real devices\n3. They set credential env vars (`HEARTH_PASS_<DEVICE>`, etc.)\n4. They run `./scripts/sweep.sh --dry-run` to validate\n5. They run `./scripts/sweep.sh` for the first sweep\n\nSee `docs/INSTALL.md` for platform-specific install steps.\n\n## Adding a new device archetype\n\nIf the user has a device type not covered by the 6 ship-included archetypes (linux-systemd, linux-nosystemd-chroot, raspberry-pi, windows-http-only, slurm-cluster, magento-server), help them craft a new entry by:\n\n1. Reading `examples/archetypes/` for the closest existing match\n2. Probing the device manually with `ssh user@host 'uname -srm; uptime; systemctl list-units --type=service --state=running --no-pager | head -20'` to discover its services\n3. Adding a new device entry to their `devices.yaml`\n4. Running `./scripts/sweep.sh --device <new-name>` to test\n\nEncourage them to contribute the new archetype back upstream if it's broadly useful.\n\n## Failure modes and what to tell the user\n\n| Symptom | Likely cause | Suggested action |\n|---------|-------------|------------------|\n| L1 UNREACHABLE on a normally-reachable device | Network drop, host powered off | Check physical/UPS, check switch, ping the gateway |\n| SSH FAILED but L1 OK | SSH daemon down, firewall, fail2ban ban | SSH manually from another host to confirm |\n| L4 service shows `inactive` for a service the user expects active | Service crashed, unit name wrong | `journalctl -u <unit>` on the device |\n| L5 HTTP probe shows `HTTP 000` | App is down or port closed | `curl -v <url>` from the bridgehead |\n| L5 HTTP probe shows `HTTP 502/503` | App is up but failing | Check app's own logs |\n| Sweep takes >30s for 10 devices | One device is timing out | Re-run with `--device <name>` to isolate |\n\n## Privacy\n\nhearth is designed to be safe to run in a public/agentic context:\n\n- Reads only the user's own config file (no broader filesystem snooping)\n- Writes only to `/tmp/.hearth_*` (cleaned up immediately)\n- Does NOT log device IPs, hostnames, or output to any remote service\n- Does NOT include telemetry of any kind\n\nIf asked about specific configuration values (passwords, tokens), hearth does NOT have access to those — they're in the user's env vars, only readable by the running process when invoking SSH/curl.\n\n## Version\n\n0.1.4 — schema-only example.yaml; app probes documented per-archetype. OpenClaw skill mode.","tags":{"health-check":"0.1.5","homelab":"0.1.5","latest":"0.1.5","monitoring":"0.1.5","read-only":"0.1.5","sysadmin":"0.1.5"},"stats":{"comments":0,"downloads":417,"installsAllTime":0,"installsCurrent":0,"stars":1,"versions":6},"createdAt":1777808247616,"updatedAt":1778492833926},"latestVersion":{"version":"0.1.5","createdAt":1777815686545,"changelog":"v0.1.5 - Documentation only. Rewrote SKILL.md frontmatter description and body for sales clarity (the listing summary is now the punchier 'What hearth gets you' pitch instead of the dry mechanism description). Added a comprehensive PrePublishGate regex-layer security-scan CI workflow that catches 35+ leak patterns on every push and PR. No functional or behavioural changes - skill output and config schema are identical to v0.1.4.","license":"MIT-0"},"metadata":null,"owner":{"handle":"nj070574-gif","userId":"s1747h3dssx5wbb4xxpn85vtsd83gnax","displayName":"Only 1 Naren","image":"https://avatars.githubusercontent.com/u/233563303?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1780090736146}}