NAS Dashboard

Generates a unified text-based NAS dashboard with system health, ZFS pools, disk SMART, Docker containers, GPU status, UPS, backup services, and security. Use when the user asks for NAS status, system overview, dashboard, or wants a comprehensive system monitoring report. Linux/ZFS focused; best on Ubuntu/Debian with ZFS+NUT+Docker. Also triggers on "文本仪表盘", "nas status", "server status", "system dashboard".

Install

openclaw skills install nas-dashboard

NAS Dashboard v2

Exception-driven, alert-first text dashboard for NAS/HomeLab monitoring. Prioritises problems over green lights — if everything is healthy you get a one-liner, if something is wrong you get a surgical report.

Platform: Linux (Ubuntu/Debian tested). Partial support: any Linux with ZFS. macOS/Windows: most sections unavailable, because the collector depends on Linux tooling (systemd, apt, iostat, sensors, and the Linux ZFS and SMART packages).

Design Principles

  1. Alerts First: ❌/⚠️ always on top. No alerts → no ALERTS section.
  2. Normal is Invisible: Don't list 15 healthy containers. Collapse to "X running (Y OK)".
  3. Contextual Correlation: Disk ↔ ZFS vdev; unhealthy container + busy process of the same name = likely hung process.
  4. Actionable Thresholds: Timeshift >100 warns, SMART realloc>0 alarms, disk>45°C flags.
  5. Unified Timestamps: All times in YYYY-MM-DD HH:MM or XdXh relative.

Quick Start

Configure (optional)

export ZPOOL="tank"              # default: auto-detect first pool
export DISK_LIST="sda sdb sdc"   # default: auto-detect all /dev/sd?
export FRIGATE_CAM_MAP="cam_d82e8e00:Living Room,cam_ae7e3010:Front Door,cam_a24a20c0:Garage"
export UPS_NAME="ups@localhost"  # default: ups@localhost

Run

bash scripts/collect.sh

Cron setup

openclaw cron add \
  --name "NAS Dashboard" \
  --schedule "0 9 * * *" \
  --agent main \
  --timeout 180 \
  --delivery "announce:telegram:YOUR_CHAT_ID" \
  --prompt "Run nas-dashboard skill: collect and format the dashboard report, then send to Telegram."

Workflow

1. Collect data

Run scripts/collect.sh. Sections: SYSTEM, ZFS, VDEV_DISK (disk→pool mapping), DISKS (incl. SMART realloc/pending/udma), DISKIO, DOCKER, FRIGATE, GPU, NETWORK, PROCESSES, SERVICES, LOGS, SHARES, SECURITY, UPDATES, BOOT, UPS, TIMESHIFT.

2. Format output — Alert-First Layout

Step 0: Scan for Alerts (always do this first)

Scan ALL collected data against these thresholds and classify severity:

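| Condition | Scope | Severity |
|---|---|---|
| Pool health ≠ ONLINE | any pool | ❌ CRITICAL |
| Scrub errors > 0 | any pool | ❌ CRITICAL |
| ZFS_EVENT present | any event | ⚠️ WARNING |
| Disk health ≠ PASSED | any disk | ❌ CRITICAL |
| Disk realloc > 0 | any disk | ❌ CRITICAL |
| Disk pending > 0 | any disk | ⚠️ WARNING |
| Disk pending > 10 | any disk | ❌ CRITICAL |
| Disk udma_crc > 100 | any disk | ⚠️ WARNING |
| Disk udma_crc > 1000 | any disk | ❌ CRITICAL |
| Disk temp > 45°C | any disk | ⚠️ WARNING |
| Disk temp > 55°C | any disk | ❌ CRITICAL |
| Disk r_await > 20ms or w_await > 20ms | any disk | ⚠️ WARNING |
| Disk I/O util > 50% | any disk | ⚠️ WARNING |
| Disk I/O util > 80% | any disk | ❌ CRITICAL |
| Docker container Down/Unhealthy | any ctr | ⚠️ WARNING |
| Frigate camera skip > 1.0 | any camera | ⚠️ WARNING |
| Frigate camera skip > 3.0 | any camera | ❌ CRITICAL |
| Frigate camera fps < 1.0 | any camera | ⚠️ WARNING |
| Frigate camera fps = 0 | any camera | ❌ CRITICAL |
| Frigate storage > 80% | any storage | ⚠️ WARNING |
| Frigate storage > 95% | any storage | ❌ CRITICAL |
| Timeshift count > 100 | ts active | ⚠️ WARNING |
| Timeshift count > 300 | ts active | ❌ CRITICAL |
| CPU temp > 70°C | | ⚠️ WARNING |
| CPU temp > 85°C | | ❌ CRITICAL |
| GPU temp > 75°C | | ⚠️ WARNING |
| GPU temp > 85°C | | ❌ CRITICAL |
| ARC hit rate < 90% | | ⚠️ WARNING |
| ZFS capacity > 80% | | ⚠️ WARNING |
| ZFS capacity > 90% | | ❌ CRITICAL |
| Root disk > 85% | | ⚠️ WARNING |
| Root disk > 95% | | ❌ CRITICAL |
| Failed logins > 0 | today | ⚠️ WARNING |
| Failed systemd services | any | ⚠️ WARNING |
| UPS ≠ OL (not Online) | | ⚠️ WARNING |
| UPS battery < 50% | | ⚠️ WARNING |
| UPS battery < 20% | | ❌ CRITICAL |
| OOM events present | | ⚠️ WARNING |
| APT updates available | | ℹ️ INFO |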

If there are no alerts: skip the 🚨 ALERTS section entirely.

If alerts exist: build the 🚨 ALERTS section listing every alert, grouped by severity (❌ first, then ⚠️, then ℹ️). Format:

🚨 ALERTS
❌ {description}
⚠️ {description}
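A minimal sketch of the threshold check in Step 0, assuming each metric has already been extracted as a bare number. The helper names `classify` and `classify_low` are hypothetical and are not part of scripts/collect.sh; the thresholds in the sample calls come from the tables in this document.

```bash
# classify VALUE WARN CRIT
# Prints CRITICAL, WARNING, or OK for a "higher is worse" metric.
# Hypothetical helper. Pass bare numbers (47, not 47°C).
classify() (
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v + 0 > c + 0)      print "CRITICAL"
    else if (v + 0 > w + 0) print "WARNING"
    else                    print "OK"
  }'
)

# classify_low VALUE WARN CRIT
# Same idea for "lower is worse" metrics such as UPS battery charge.
classify_low() (
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v + 0 < c + 0)      print "CRITICAL"
    else if (v + 0 < w + 0) print "WARNING"
    else                    print "OK"
  }'
)

classify 47 45 55        # disk temperature: WARNING
classify 430 100 300     # Timeshift snapshot count: CRITICAL
classify_low 35 50 20    # UPS battery charge: WARNING
```

The comparisons are strict, so a disk at exactly 45°C stays OK, which matches the ">" conditions in the table.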

Step 1: Build ALERTS Section (if any)

Sort by severity: ❌ CRITICAL → ⚠️ WARNING → ℹ️ INFO. One line per alert. Examples:

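  • ❌ sde (tank) Reallocated_Sector_Ct: 5 — bad sectors growing, replace the disk immediately
  • ❌ Timeshift: 430 snaps — far too many (>300), clean up now
  • ⚠️ cam_a24a20c0 skip:2.4 — decode frame drops, check the GPU or lower the resolution
  • ⚠️ nextcloud (Docker) unhealthy — container not healthy
  • ⚠️ nfs-server (service) inactive — NFS service is not running
  • ⚠️ sde udma_crc:129 — SATA link errors, check the cable
  • ℹ️ 3 APT updates pending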

Correlation rules (apply these when building alerts):

  • If a Docker container is unhealthy AND a process with the same name has >1% CPU → append (suspected hung process, restart the container)
  • If a Frigate camera has skip>1.0 AND GPU utilization is high → append (GPU encoding bottleneck)
  • If a disk has udma>0 → mention a potential SATA cable issue
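The severity ordering used in Step 1 can be sketched as a stable sort keyed on the leading emoji. This assumes one alert per line, each starting with ❌, ⚠️, or ℹ️; the function name `sort_alerts` is hypothetical.

```bash
# sort_alerts: reads alert lines on stdin and prints them ordered
# CRITICAL (❌) → WARNING (⚠️) → INFO (ℹ️); anything else goes last.
# Hypothetical helper. Ties keep their input order (sort -s).
sort_alerts() (
  awk '
    { r = 3 }
    /^❌/ { r = 0 }
    /^⚠️/ { r = 1 }
    /^ℹ️/ { r = 2 }
    { print r "\t" $0 }
  ' | sort -s -k1,1n | cut -f2-
)

printf '⚠️ nfs-server inactive\nℹ️ 3 APT updates pending\n❌ sde realloc:5\n' | sort_alerts
```

The numeric rank is added only as a temporary sort key and is stripped again by `cut`, so the alert text itself is passed through unchanged.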

Step 2: Build Dashboard Body

Use this compact layout. Omit sections entirely if no data or all is healthy and not noteworthy.

╭──────────────────────────────────╮
│  🏠 NAS Dashboard · {YYYY-MM-DD (weekday)}  │
╰──────────────────────────────────╯

Then sections in order:

🖥 SYSTEM — up to three lines:

🖥 {hostname} · {OS_short} · up {uptime_simplified} · load:{load_1min}
   CPU:{cpu_used}% ██████░░░░ · RAM:{mem_used}/{mem_total} ({mem_pct}%) · / {root_used}/{root_total} ({root_pct}%)
   CPU temp:{cpu_temp}°C · Mobo:{hottest_mobo_temp}°C

  • uptime_simplified: convert "1 week, 2 days, 18 hours" → "1w2d18h"
  • OS_short: "Ubuntu 24.04" from "Ubuntu 24.04.4 LTS"
  • Only show the single highest mobo temp (not all sensors)
  • CPU progress bar: 10 chars, filled blocks = cpu_used/10 (e.g. 60% → ██████░░░░)
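The progress bar and the uptime shortening could be sketched as below. Both function names are hypothetical. Rounding the bar to the nearest block is an assumption, since the text only says "cpu_used/10". The uptime input is assumed to look like `uptime -p` output.

```bash
# bar PCT -> 10-character bar, one filled block per 10%.
# Rounds to the nearest block (assumption) and clamps to 0..10.
bar() (
  filled=$(awk -v p="$1" 'BEGIN {
    n = int(p / 10 + 0.5)
    if (n > 10) n = 10
    if (n < 0)  n = 0
    print n
  }')
  i=0
  out=""
  while [ "$i" -lt 10 ]; do
    if [ "$i" -lt "$filled" ]; then out="${out}█"; else out="${out}░"; fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

# uptime_short "1 week, 2 days, 18 hours" -> "1w2d18h"
# Handles weeks, days, hours, and minutes only; a leading "up " is dropped.
uptime_short() (
  printf '%s\n' "$1" | sed -E \
    -e 's/^up //' \
    -e 's/ *weeks?/w/g' -e 's/ *days?/d/g' \
    -e 's/ *hours?/h/g' -e 's/ *minutes?/m/g' \
    -e 's/[, ]//g'
)

bar 60
uptime_short "1 week, 2 days, 18 hours"
```

The same `bar` helper would also serve the ZFS capacity bar, since both use a 10-character scale.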

🗄 ZFS — pool summary line + ARC line:

🗄 {pool} [{health_emoji} {health}] · {alloc}T/{size}T ({cap}%) ██████░░░░ · frag:{frag}%
   ARC:{arc_size}GiB/{arc_max}GiB · hit:{arc_hit}% · Scrub:{scrub_summary}
  • Scrub summary: extract "repaired 0B, 0 errors" and date from SCRUB field
  • If L2ARC size > 0: append · L2ARC:{l2_size}GiB hit:{l2_hit}%
  • ZFS capacity bar: cap/10 → count
  • If ARC hit < 90%: add ⚠️ prefix
  • Snapshots: show as Snaps:{count} latest:{YYYY-MM-DD}

💾 DISKS — fixed-width column layout, one line per disk.

Use a mini-table with separators so all status emoji align vertically:

💾 DISKS ───────────────────────────────────
sda (tank) │ W1003ABYZ-011FA0   │   931G │  42°C │  10909h │              ✅
sdc (tank) │ WD10PURX-78D85Y0   │   931G │  39°C │   6804h │ r_await:10ms ✅
sde (tank) │ ST1000DM003-1ER16  │   931G │  36°C │  10223h │      udma:29 ✅

Column widths (pad/crop each field to fit):

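| Col | Field | Width | Align |
|---|---|---|---|
| 1 | {disk} ({pool_role}) | 9 | left |
| 2 | model name | 18 | left, truncate if longer |
| 3 | size | 6 | right |
| 4 | temp | 5 | right |
| 5 | hours | 7 | right |
| 6 | alerts + status | 14 | right |
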
  • Col 6 (alerts + status): padding is dynamic but right-aligned. Contents:
    • realloc:X if > 0 (else pad)
    • pending:X if > 0 (else pad)
    • udma:X if > 0 (else pad)
    • r_await:Xms if > 5ms (else pad)
    • Always end with ✅ (SMART PASSED) or ❌ (SMART FAILED)
  • Use VDEV_DISK data to annotate pool role: sda (tank), sdb (tank-cache) etc.
  • Model: use full model string, crop to 18 chars if longer
  • ⚠️ prefix the whole line if temp>45°C or realloc>0 or pending>0 or udma>100
  • Serial is omitted (model provides enough identification for this view)
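One way to produce the fixed-width disk rows is a single `printf` format string. This is a sketch under assumptions: the function name `disk_row` and its argument order are hypothetical, the alerts field is padded to 12 so that alerts plus the status emoji roughly fill the 14-column budget, and column 1 is padded to 9 but never truncated, so longer labels such as "sdb (tank-cache)" will push the row wider.

```bash
# disk_row NAME ROLE MODEL SIZE TEMP_C HOURS ALERTS STATUS
# Hypothetical helper. Widths follow the column table; the model is
# padded and cropped to 18 characters, numeric fields are right-aligned.
# "°C" and "h" are added here so padding only ever counts ASCII input.
disk_row() (
  printf '%-9s │ %-18.18s │ %6s │ %3s°C │ %7s │ %12s %s\n' \
    "$1 ($2)" "$3" "$4" "$5" "${6}h" "$7" "$8"
)

disk_row sda tank W1003ABYZ-011FA0  931G 42 10909 ""             "✅"
disk_row sdc tank WD10PURX-78D85Y0  931G 39  6804 "r_await:10ms" "✅"
disk_row sde tank ST1000DM003-1ER16 931G 36 10223 "udma:29"      "✅"
```

Keeping the multi-byte characters inside the format string rather than in the arguments avoids relying on how a given shell counts field widths for non-ASCII text.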

Disk I/O — only show disks with util>5% or await>10ms:

   IO: sda r2.5/w4.6ms util5.2% · sdc r10.5/w3.3ms

🐳 DOCKER — converged view:

🐳 {total} running ({healthy_count} healthy) · v{docker_ver} · {image_count} imgs · {volume_gb}GB

Then only list unhealthy containers explicitly:

   ⚠️ nextcloud: Up 2 days (no healthcheck)
   ⚠️ xunlei: Up 2 days (no healthcheck) [CPU 11.2% — suspected hang]
  • healthy_count: count of containers with "(healthy)" in status
  • List containers WITHOUT "(healthy)" suffix under ⚠️
  • If a container appears in TOP_CPU with same name → add [CPU X% — suspected hang]
  • Images: extract total size and reclaimable from DOCKER_DF
  • If reclaimable > 10GB: · {reclaimable} reclaimable ⚠️
  • If ALL containers are healthy: omit the detail lines, just show the summary
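The healthy count and the list of flagged containers can be derived from name/status pairs. The sketch below assumes tab-separated input in the shape produced by `docker ps --format '{{.Names}}\t{{.Status}}'`; the actual DOCKER section format of scripts/collect.sh is not documented here, and the function name `docker_summary` is hypothetical. It covers only the counts and the detail lines, not the version, image, or volume fields.

```bash
# docker_summary: reads "name<TAB>status" lines on stdin.
# Prints the summary line, then one ⚠️ line per container whose status
# lacks the "(healthy)" suffix. "(unhealthy)" does not match "(healthy)".
docker_summary() (
  awk -F '\t' '
    { total++ }
    $2 ~ /\(healthy\)/ { healthy++; next }
    { flagged = flagged sprintf("   ⚠️ %s: %s\n", $1, $2) }
    END {
      printf "🐳 %d running (%d healthy)\n", total, healthy
      printf "%s", flagged
    }'
)

# Live usage (requires Docker):
#   docker ps --format '{{.Names}}\t{{.Status}}' | docker_summary
printf 'frigate\tUp 2 days (healthy)\nnextcloud\tUp 2 days\n' | docker_summary
```

When every container is healthy, nothing is accumulated in `flagged`, so only the summary line is printed, which matches the "omit the detail lines" rule.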

📹 FRIGATE — cameras, only expand problem ones:

📹 {cam_count} cams · detection:{det_fps}fps · infer:{infer_ms}ms
   ✅ cam_d82e8e00: 5.1fps · ✅ cam_ae7e3010: 5.1fps
   ⚠️ cam_a24a20c0: 4.9fps · skip:2.4 (49% dropped)
  • Map camera IDs to friendly names via FRIGATE_CAM_MAP env var
  • Show ALL camera names with fps (even healthy, but compact inline)
  • Cameras with skip>0.5: show skip value + calculated drop percentage (skip/fps*100)
  • Cameras with skip≤0.5: just show ✅ name: fps
  • Storage: 📀 {path}: {used}G/{total}G ({pct}%) for each FRIGATE_STORAGE line. ⚠️ if >80%.
  • If Frigate unreachable: 📹 Frigate: no response ❌
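The camera-name lookup and the drop percentage could be sketched as follows. Both function names are hypothetical; the map format `cam_id:Name,cam_id:Name` is the one this document gives for FRIGATE_CAM_MAP, and names are assumed not to contain commas or colons.

```bash
# cam_name ID -> friendly name from FRIGATE_CAM_MAP, or the ID itself
# when the map is unset or has no entry for that camera.
cam_name() (
  printf '%s\n' "${FRIGATE_CAM_MAP:-}" | tr ',' '\n' |
    awk -F ':' -v id="$1" '
      $1 == id { print $2; found = 1; exit }
      END { if (!found) print id }'
)

# drop_pct SKIP FPS -> skip/fps*100, rounded to a whole percent.
# Prints "n/a" when fps is 0 to avoid dividing by zero.
drop_pct() (
  awk -v s="$1" -v f="$2" 'BEGIN {
    if (f + 0 > 0) printf "%.0f\n", s / f * 100
    else print "n/a"
  }'
)

# Example:
#   FRIGATE_CAM_MAP="cam_a24a20c0:Garage"
#   cam_name cam_a24a20c0   # Garage
drop_pct 2.4 4.9            # 49
```

A camera at 0 fps is already a critical alert on its own, so returning "n/a" there keeps the percentage from hiding the real problem.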

🎮 GPU — one line:

🎮 {gpu_model} · {temp}°C · {util}% · VRAM:{used}M/{total}M ██████░░░░ · {proc_count} procs
  • VRAM bar: 10 chars proportional
  • Omit GPU entirely if nvidia-smi not available or no GPU detected

🌐 NETWORK — one line per active interface:

🌐 enp4s0: {ip} · ↓{total_rx} ↑{total_tx}
  • Skip DOWN interfaces
  • Traffic: use TRAFFIC data, convert to human-readable (GB/MB)

📊 PROCESSES — top 3 by CPU and top 3 by memory (compact):

📊 CPU: xunlei 11.2% · ffmpeg 3.6% · python3 3.1%
   MEM: python3 3.6% · node 2.8% · gnome-shell 1.4%

⚙️ SERVICES — only show non-active or failed:

⚙️ ⚠️ nfs-server: inactive · 1 failed unit: snap.firmware-updater
  • If all services active and no failed units: omit this section entirely

🔒 SECURITY — compact:

🔒 Failed logins: {count} · Boot: {boot_time_YYYY-MM-DD HH:MM} ({Xd} ago)
   Last: {user} from {ip} at {YYYY-MM-DD HH:MM}
  • Convert LAST_LOGINS format: extract the latest entry and reformat it to "user from ip at YYYY-MM-DD HH:MM"
  • "Sun May 17 22:02" → "2026-05-17 22:02" (the source timestamp has no year, so use the current year)
  • If failed_logins = 0 → Failed logins: 0
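The timestamp conversion can be done without relying on `date -d` parsing. In this sketch the function name `login_time` is hypothetical, the input is assumed to be exactly "Www Mmm DD HH:MM", and the year defaults to the current one, which will be wrong for a login from late December that is reported in January.

```bash
# login_time "Sun May 17 22:02" [YEAR] -> "YEAR-05-17 22:02"
# The month number comes from the position of the three-letter
# abbreviation in a lookup string: Jan is at 1, Feb at 4, and so on.
login_time() (
  printf '%s\n' "$1" | awk -v year="${2:-$(date +%Y)}" '{
    month = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $2) + 2) / 3
    printf "%04d-%02d-%02d %s\n", year, month, $3, $4
  }'
)

login_time "Sun May 17 22:02" 2026    # 2026-05-17 22:02
```

Passing the year explicitly, as in the example call, makes the result reproducible when reports are regenerated later.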

🔋 UPS — one line:

🔋 {status_icon} {status_text} · charge:{batt_charge}% · load:{ups_load}% · in:{input_v}V · batt:{batt_v}V
  • OL → ⚡Online, OB → 🪫Battery, OB DISCHRG → 🪫Discharging
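The status mapping above fits a small `case` statement. The function name `ups_label` is hypothetical; the input is assumed to be the value of the NUT `ups.status` variable, which can carry extra flags after the first token (for example "OL CHRG").

```bash
# ups_label STATUS -> display label for the UPS line.
# More specific patterns come first so "OB DISCHRG" is not
# swallowed by the plain "OB" case. Unknown values pass through.
ups_label() {
  case "$1" in
    OL*)           printf '⚡Online\n' ;;
    "OB DISCHRG"*) printf '🪫Discharging\n' ;;
    OB*)           printf '🪫Battery\n' ;;
    *)             printf '%s\n' "$1" ;;
  esac
}

# Live usage (requires NUT):
#   ups_label "$(upsc "${UPS_NAME:-ups@localhost}" ups.status)"
ups_label "OB DISCHRG"
```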

💾 TIMESHIFT — one line with health check:

💾 Timeshift: {count} snaps · latest:{YYYY-MM-DD HH:MM}
  • If 100 < count ≤ 300: append ⚠️ too many, cleanup recommended
  • If count > 300: append ❌ far too many (>300), clean up now!

📦 UPDATES — only if > 0:

📦 {count} APT updates available

🔧 OOM / Logs — only if data present:

🔧 OOM: {oom_line_truncated}
   Logs(1h): {error_count} errors

Step 3: Final Assembly

  1. Title line
  2. 🚨 ALERTS (if any)
  3. ━━━━━━━━━━━━━━━━━━━━━━━━ divider
  4. All body sections in order
  5. Keep total output under 2500 chars

3. Deliver

Use message tool with action=send to the target channel.

Prerequisites

| Tool | Required for | Package |
|---|---|---|
| zpool/zfs | ZFS section | zfsutils-linux |
| smartctl | Disk health | smartmontools |
| docker | Docker section | docker-ce |
| nvidia-smi | GPU section | nvidia-driver |
| iostat | Disk I/O | sysstat |
| sensors | Temperatures | lm-sensors |
| upsc | UPS section | nut-client |
| journalctl | Logs | systemd (built-in) |

SMART, auth.log, and zpool events need sudo -n (passwordless sudo). Sections degrade gracefully if unavailable.
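Graceful degradation can be sketched as a tool check before each section. This is an illustration only, not the logic of scripts/collect.sh; the helper names `have`, `can_sudo`, and `section_or_skip` are hypothetical.

```bash
# have TOOL -> succeeds if TOOL is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# can_sudo -> succeeds only if passwordless sudo works right now.
# "sudo -n" fails instead of prompting when a password would be needed.
can_sudo() { have sudo && sudo -n true 2>/dev/null; }

# section_or_skip TOOL LABEL -> reports whether a section can be collected.
section_or_skip() {
  if have "$1"; then
    printf '%s: available\n' "$2"
  else
    printf '%s: skipped (%s not found)\n' "$2" "$1"
  fi
}

section_or_skip nvidia-smi GPU
section_or_skip upsc UPS
```

Checking once per section keeps a missing optional tool, such as nvidia-smi on a machine without a GPU, from aborting the whole run.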

Threshold Reference

| Metric | ⚠️ Warning | ❌ Critical |
|---|---|---|
| Disk temp | >45°C | >55°C |
| CPU temp | >70°C | >85°C |
| GPU temp | >75°C | >85°C |
| ZFS capacity | >80% | >90% |
| Root disk | >85% | >95% |
| ARC hit rate | <90% | |
| Disk r_await/w_await | >20ms | |
| Frigate skip | >1.0fps | >3.0fps |
| Frigate storage | >80% | >95% |
| Frigate camera fps | <1.0 | =0 |
| Timeshift snaps | >100 | >300 |
| realloc (SMART 5) | | >0 |
| pending (SMART 197) | >0 | >10 |
| udma_crc (SMART 199) | >100 | >1000 |
| UPS battery | <50% | <20% |
| Disk I/O util | >50% | >80% |

Notes

  • All personal data (hostname, IPs, disk serials) is read at runtime, not hardcoded.
  • Camera name mapping is configurable via FRIGATE_CAM_MAP env var: cam_id:Name,cam_id:Name
  • Pool name auto-detected from zpool list. Override with ZPOOL env var.
  • Disk list auto-detected from lsblk. Override with DISK_LIST env var.
  • Suitable for publishing to ClawHub — contains no credentials, tokens, or fixed identifiers.