hearth

Health

A fast, read-only health-check sweep across every device in a homelab — ping, uptime/load, memory/disk, services, and app health, in 14 seconds with output you can scan in 30. Configuration-driven (~/.hearth/devices.yaml describes the lab; the skill is generic). Use when the user asks "how is the lab?", "server status", "check all servers", "is X up?", "health check", "what's down?", "anything broken?". Supports Linux, macOS, Raspberry Pi, Android (Termux/chroot), and Windows hosts (HTTP-only probe). Honest reporting — devices that can't be probed at L4 (Windows, chroots) are reported as such, never faked green. Read-only — never restarts services, never writes to remote hosts.

Install

openclaw skills install @nj070574-gif/hearth

What hearth gets you

Before hearth: six SSH terminals open on a Friday afternoon. Type uptime; free -h; df -h; systemctl is-active <svc1> <svc2> ... on each box. Eight minutes in, you've forgotten what server 1 said.

With hearth: one command, 14 seconds, every device, same format, one screen. Done.

=== HOMELAB — ESTATE HEALTH SWEEP ===
=== 192.0.2.10 main-server ===
  L1 ping:    OK
  L2 uptime:  1 day, 2 hours, load: 0.15 0.18 0.15
  L3 mem:     used 1.6Gi / 7.7Gi, 6.0Gi avail | disk: / 6% used, 814G free
  L4 svc:     openclaw=active nginx=active ollama=active cron=active
  L5 app:     gateway={"ok":true} | https-front=HTTP 200
=== 192.0.2.20 fileserver ===
  L1 ping:    OK   ...
=== sweep complete in 14 seconds ===

Why someone uses this skill

Three things make hearth different from "just SSH and check yourself" or "set up Prometheus":

Read-only by design. Never modifies remote state. No systemctl restart, no apt-get install, no writes beyond /tmp/.hearth_*. Safe to run from cron, from an LLM agent, from a colleague's shell. Most monitoring tools can't make that promise.
Honest about what it can't see. When a layer can't be probed (Windows host with no SSH, chroot with no systemd), hearth says so explicitly — unmanaged-host (no SSH), no-systemd (chroot — N/A). It doesn't fake a green result. You always know whether a green is real or just unmeasured.
Zero install on remote hosts. No agent on every box. No node_exporter. No daemon. Just SSH from one bridgehead. If you can SSH to a host, hearth can probe it.

The 5-layer pattern catches the failure classes that actually hit homelabs in production:

Layer	Catches
L1 ping	Network drop, host off, ICMP blocked
L2 uptime+load	Reboots, runaway load
L3 mem+disk	Disk filling up before journald truncates logs, OOM-precursor leaks
L4 services	Service crashed, unit name drift after distro upgrade, fail2ban banning your bridgehead
L5 app	The "service is up but returns HTTP 500 for three days" silent-failure class

How hearth works

hearth is configuration-driven — the skill itself contains zero knowledge of any specific lab. The user describes their devices in ~/.hearth/devices.yaml (or wherever HEARTH_CONFIG points), and hearth reads that config to drive its probes. Six device archetypes ship as worked examples (Linux+systemd, chroot/no-systemd, Raspberry Pi, Windows HTTP-only, SLURM cluster, multi-app web stack).

Triggering

Invoke hearth when the user asks anything in this family:

"server status", "lab status", "homelab status"
"check all servers", "check the lab", "check my hosts"
"is X up?" (where X is a device name from their config)
"how is the lab?", "how is X?"
"health check", "health sweep", "device health"
"what's running?", "what's down?"

If the user names a single device, run hearth check-device <name> (or scope the sweep to one device with --device <name>).

Operation

hearth is implemented as a thin wrapper around two scripts that ship with the project:

scripts/sweep.sh — runs the full estate sweep, or a subset
scripts/check-device.sh — runs the 5-layer probe on one device

Run from the user's hearth installation directory (typically ~/hearth/):

./scripts/sweep.sh                    # full sweep
./scripts/sweep.sh --device <name>    # one device
./scripts/sweep.sh --group <name>     # named group of devices
./scripts/sweep.sh --dry-run          # validate config, no probes

Show the user the raw output. The output is already designed to be human-readable; do not re-summarise unless the user explicitly asks for analysis.

Output format

Each device's status is printed in this exact format:

=== <ip-or-hostname> <name> [(<role>)] ===
  L1 ping:    OK | UNREACHABLE
  L2 uptime:  <duration>, load: <1m> <5m> <15m>
  L3 mem:     used <X> / <Y>, <Z> avail | disk: / <pct>% used, <free> free
  L4 svc:     <service1>=active <service2>=active ...
  L5 app:     <app1>=<status> | <app2>=<status> ...

Special cases:

UNREACHABLE at L1 — device fails ping. L2-L5 are skipped, sweep continues.
SSH FAILED at L2-L4 — device pings but SSH is unresponsive. L5 may still be attempted for HTTP probes.
unmanaged-host (no SSH) at L2-L4 — device is configured auth: http-only (e.g. Windows host without SSH). L5 carries the health signal.
no-systemd (chroot — N/A) at L4 — device is a chroot or has no systemd. L2/L3 still apply, L5 carries app-health.

Triggers requiring extra care

"restart X" / "kill X" / "deploy X" — hearth is read-only. If the user asks for write actions, do NOT use hearth — explain that hearth doesn't modify remote state and ask if they want to do that another way.
"add a new device" — direct the user to edit ~/.hearth/devices.yaml. Reference examples/devices.example.yaml and docs/CONFIG.md in the project for schema.
"why is X down?" — first run ./scripts/sweep.sh --device <X> to confirm the failure mode, then suggest investigation paths based on which layer failed (L1 = network, L4 = services, L5 = app).

What hearth never does

Never modify remote hosts. No systemctl restart, no apt-get install, no file writes beyond /tmp/.hearth_* ephemera.
Never reveal credentials. Passwords and tokens live in env vars and SSH keys; hearth does not echo them.
Never make claims it can't verify. If L4 can't be probed (chroot, Windows), hearth says so explicitly rather than reporting a fake green.
Never fabricate device data. Every line of output comes from a real probe of a real device. If a probe times out, the output says so.

Adding hearth to a new lab

If the user has not yet set up hearth:

Direct them to clone the repo and copy examples/devices.example.yaml to ~/.hearth/devices.yaml
They edit the YAML with their real devices
They set credential env vars (HEARTH_PASS_<DEVICE>, etc.)
They run ./scripts/sweep.sh --dry-run to validate
They run ./scripts/sweep.sh for the first sweep

See docs/INSTALL.md for platform-specific install steps.

Adding a new device archetype

If the user has a device type not covered by the 6 ship-included archetypes (linux-systemd, linux-nosystemd-chroot, raspberry-pi, windows-http-only, slurm-cluster, magento-server), help them craft a new entry by:

Reading examples/archetypes/ for the closest existing match
Probing the device manually with ssh user@host 'uname -srm; uptime; systemctl list-units --type=service --state=running --no-pager | head -20' to discover its services
Adding a new device entry to their devices.yaml
Running ./scripts/sweep.sh --device <new-name> to test

Encourage them to contribute the new archetype back upstream if it's broadly useful.

Failure modes and what to tell the user

Symptom	Likely cause	Suggested action
L1 UNREACHABLE on a normally-reachable device	Network drop, host powered off	Check physical/UPS, check switch, ping the gateway
SSH FAILED but L1 OK	SSH daemon down, firewall, fail2ban ban	SSH manually from another host to confirm
L4 service shows `inactive` for a service the user expects active	Service crashed, unit name wrong	`journalctl -u <unit>` on the device
L5 HTTP probe shows `HTTP 000`	App is down or port closed	`curl -v <url>` from the bridgehead
L5 HTTP probe shows `HTTP 502/503`	App is up but failing	Check app's own logs
Sweep takes >30s for 10 devices	One device is timing out	Re-run with `--device <name>` to isolate

Privacy

hearth is designed to be safe to run in a public/agentic context:

Reads only the user's own config file (no broader filesystem snooping)
Writes only to /tmp/.hearth_* (cleaned up immediately)
Does NOT log device IPs, hostnames, or output to any remote service
Does NOT include telemetry of any kind

If asked about specific configuration values (passwords, tokens), hearth does NOT have access to those — they're in the user's env vars, only readable by the running process when invoking SSH/curl.

Version

0.1.4 — schema-only example.yaml; app probes documented per-archetype. OpenClaw skill mode.