Pilot Fleet Health Monitor Setup

Deploy a fleet health monitoring system with 3 agents.

Use this skill when:

  1. User wants to set up fleet or server health monitoring
  2. User is configuring an agent as part of a health monitoring setup
  3. User asks about monitoring, alerting, or metrics collection across agents

Do NOT use this skill when:

  • User wants a single health check (use pilot-health instead)
  • User wants to send a one-off alert (use pilot-alert instead)

Install

openclaw skills install pilot-fleet-health-monitor-setup

Fleet Health Monitor Setup

Deploy 3 agents that monitor server health and aggregate alerts.

Roles

| Role | Hostname | Skills | Purpose |
|------|----------|--------|---------|
| web-monitor | <prefix>-web-monitor | pilot-health, pilot-alert, pilot-metrics | Monitors web servers, publishes health alerts |
| db-monitor | <prefix>-db-monitor | pilot-health, pilot-alert, pilot-metrics | Monitors databases, publishes health alerts |
| alert-hub | <prefix>-alert-hub | pilot-webhook-bridge, pilot-alert, pilot-event-filter, pilot-slack-bridge | Aggregates alerts, forwards to humans |

Setup Procedure

Step 1: Ask the user which role this agent should play and what prefix to use.

Step 2: Install the skills for the chosen role:

# For web-monitor or db-monitor:
clawhub install pilot-health pilot-alert pilot-metrics

# For alert-hub:
clawhub install pilot-webhook-bridge pilot-alert pilot-event-filter pilot-slack-bridge
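
The role-to-skills choice in Step 2 can be sketched as a small shell helper. The function names skills_for_role and install_command are hypothetical and not part of clawhub; the helper only prints the install command shown above so it can be reviewed before it is run.

```shell
# Hypothetical helpers: map a role to the skill list from the Roles table.
skills_for_role() {
  case "$1" in
    web-monitor|db-monitor)
      echo "pilot-health pilot-alert pilot-metrics" ;;
    alert-hub)
      echo "pilot-webhook-bridge pilot-alert pilot-event-filter pilot-slack-bridge" ;;
    *)
      echo "unknown role: $1" >&2
      return 1 ;;
  esac
}

# Print the install command for a role instead of executing it.
install_command() {
  local skills
  skills=$(skills_for_role "$1") || return 1
  echo "clawhub install $skills"
}

# Usage: install_command alert-hub
```

Printing rather than executing keeps the sketch free of side effects; pipe the output to a shell once it looks right.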

Step 3: Set the hostname:

pilotctl --json set-hostname <prefix>-<role>
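
A minimal sketch of how the <prefix>-<role> hostname could be assembled before calling set-hostname. fleet_hostname is a hypothetical helper and "acme" is a made-up prefix; the only pilotctl call referenced is the one shown in Step 3.

```shell
# Hypothetical helper: build "<prefix>-<role>" and reject bad input early.
fleet_hostname() {
  local prefix="$1" role="$2"
  [ -n "$prefix" ] || { echo "prefix must not be empty" >&2; return 1; }
  case "$role" in
    web-monitor|db-monitor|alert-hub)
      echo "${prefix}-${role}" ;;
    *)
      echo "unknown role: $role" >&2
      return 1 ;;
  esac
}

# Usage (prefix "acme" is illustrative):
#   pilotctl --json set-hostname "$(fleet_hostname acme web-monitor)"
```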

Step 4: Write the setup manifest:

mkdir -p ~/.pilot/setups
cat > ~/.pilot/setups/fleet-health-monitor.json << 'MANIFEST'
{
  "setup": "fleet-health-monitor",
  "setup_name": "Fleet Health Monitor",
  "role": "<ROLE_ID>",
  "role_name": "<ROLE_NAME>",
  "hostname": "<prefix>-<role>",
  "description": "<ROLE_DESCRIPTION>",
  "skills": { "<skill>": "<contextual description>" },
  "peers": [ { "role": "...", "hostname": "...", "description": "..." } ],
  "data_flows": [ { "direction": "send|receive", "peer": "...", "port": 1002, "topic": "...", "description": "..." } ],
  "handshakes_needed": [ "<peer-hostname>" ]
}
MANIFEST

Step 5: Tell the user to initiate handshakes with the peers this agent communicates with directly, as listed under handshakes_needed in its manifest.

Manifest Templates Per Role

web-monitor

{
  "setup": "fleet-health-monitor",
  "setup_name": "Fleet Health Monitor",
  "role": "web-monitor",
  "role_name": "Web Server Monitor",
  "hostname": "<prefix>-web-monitor",
  "description": "Watches nginx/app health, CPU, memory, and response times. Emits alert events when thresholds are breached.",
  "skills": {
    "pilot-health": "Check nginx, app endpoints, SSL certs. Run on schedule or on-demand.",
    "pilot-alert": "When health checks fail, publish alert to <prefix>-alert-hub on topic health-alert.",
    "pilot-metrics": "Collect CPU, memory, disk, and response time. Format as JSON event payloads."
  },
  "peers": [
    { "role": "db-monitor", "hostname": "<prefix>-db-monitor", "description": "Fellow monitor — does not communicate directly" },
    { "role": "alert-hub", "hostname": "<prefix>-alert-hub", "description": "Central alert aggregator — receives health-alert events" }
  ],
  "data_flows": [
    { "direction": "send", "peer": "<prefix>-alert-hub", "port": 1002, "topic": "health-alert", "description": "Health check failures and metric anomalies" }
  ],
  "handshakes_needed": ["<prefix>-alert-hub"]
}

db-monitor

{
  "setup": "fleet-health-monitor",
  "setup_name": "Fleet Health Monitor",
  "role": "db-monitor",
  "role_name": "Database Monitor",
  "hostname": "<prefix>-db-monitor",
  "description": "Monitors database connections, query latency, replication lag, and disk usage. Emits alerts on anomalies.",
  "skills": {
    "pilot-health": "Check PostgreSQL/MySQL connections, replication lag, disk usage.",
    "pilot-alert": "When DB health fails, publish alert to <prefix>-alert-hub on topic health-alert.",
    "pilot-metrics": "Collect query latency, connection pool stats, table sizes."
  },
  "peers": [
    { "role": "web-monitor", "hostname": "<prefix>-web-monitor", "description": "Fellow monitor — does not communicate directly" },
    { "role": "alert-hub", "hostname": "<prefix>-alert-hub", "description": "Central alert aggregator — receives health-alert events" }
  ],
  "data_flows": [
    { "direction": "send", "peer": "<prefix>-alert-hub", "port": 1002, "topic": "health-alert", "description": "Database alerts and replication warnings" }
  ],
  "handshakes_needed": ["<prefix>-alert-hub"]
}

alert-hub

{
  "setup": "fleet-health-monitor",
  "setup_name": "Fleet Health Monitor",
  "role": "alert-hub",
  "role_name": "Alert Aggregator",
  "hostname": "<prefix>-alert-hub",
  "description": "Receives alerts from all monitors, filters duplicates and noise, then forwards critical alerts to Slack and PagerDuty via webhooks.",
  "skills": {
    "pilot-webhook-bridge": "Forward critical alerts to Slack and PagerDuty via webhook URLs.",
    "pilot-alert": "Subscribe to health-alert from all monitors. Aggregate and deduplicate.",
    "pilot-event-filter": "Filter noise and low-severity alerts before forwarding.",
    "pilot-slack-bridge": "Post formatted alert summaries to Slack channels."
  },
  "peers": [
    { "role": "web-monitor", "hostname": "<prefix>-web-monitor", "description": "Sends health alerts from web servers" },
    { "role": "db-monitor", "hostname": "<prefix>-db-monitor", "description": "Sends health alerts from databases" }
  ],
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-web-monitor", "port": 1002, "topic": "health-alert", "description": "Health check failures and metric anomalies" },
    { "direction": "receive", "peer": "<prefix>-db-monitor", "port": 1002, "topic": "health-alert", "description": "Database alerts and replication warnings" },
    { "direction": "send", "peer": "external", "port": 443, "topic": "slack-forward", "description": "Filtered alerts to Slack and PagerDuty" }
  ],
  "handshakes_needed": ["<prefix>-web-monitor", "<prefix>-db-monitor"]
}
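
The templates above all carry a literal <prefix> placeholder. One way to fill it in, sketched here with sed, is to stream a template through a small filter. render_manifest is a hypothetical helper, and the restriction of prefixes to lowercase letters, digits, and hyphens is an assumption made to keep the sed expression safe, not a documented rule.

```shell
# Hypothetical helper: replace every literal "<prefix>" in a template read
# from stdin. Assumes prefixes use only lowercase letters, digits, and hyphens.
render_manifest() {
  local prefix="$1"
  case "$prefix" in
    ""|*[!a-z0-9-]*)
      echo "invalid prefix: $prefix" >&2
      return 1 ;;
  esac
  sed "s/<prefix>/${prefix}/g"
}

# Usage (template file name is illustrative):
#   mkdir -p ~/.pilot/setups
#   render_manifest acme < web-monitor.template.json \
#     > ~/.pilot/setups/fleet-health-monitor.json
```

Note that the heredoc in Step 4 uses a quoted delimiter ('MANIFEST'), so the shell does not expand anything inside it; placeholders have to be substituted separately, as in this sketch, or edited by hand.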

Data Flows

  • web-monitor → alert-hub : health-alert events (port 1002)
  • db-monitor → alert-hub : health-alert events (port 1002)
  • alert-hub → humans : forwarded alerts via webhook/announce

Handshakes

# web-monitor and db-monitor handshake with alert-hub:
pilotctl --json handshake <prefix>-alert-hub "setup: fleet-health-monitor"

# alert-hub handshakes with both monitors:
pilotctl --json handshake <prefix>-web-monitor "setup: fleet-health-monitor"
pilotctl --json handshake <prefix>-db-monitor "setup: fleet-health-monitor"
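
The per-role handshake commands above can be generated from the role name. This is a sketch only: handshake_commands is a hypothetical helper that prints the same pilotctl handshake lines shown above rather than running them.

```shell
# Hypothetical helper: print the handshake commands a given role needs.
handshake_commands() {
  local prefix="$1" role="$2" peer
  # Reuse the positional parameters to hold this role's direct peers.
  case "$role" in
    web-monitor|db-monitor) set -- alert-hub ;;
    alert-hub)              set -- web-monitor db-monitor ;;
    *)
      echo "unknown role: $role" >&2
      return 1 ;;
  esac
  for peer in "$@"; do
    echo "pilotctl --json handshake ${prefix}-${peer} \"setup: fleet-health-monitor\""
  done
}

# Usage: handshake_commands acme alert-hub
```

The peer lists mirror the handshakes_needed arrays in the manifest templates: each monitor handshakes with alert-hub only, and alert-hub handshakes with both monitors.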

Workflow Example

# On alert-hub — subscribe to health events:
pilotctl --json subscribe <prefix>-web-monitor health-alert
pilotctl --json subscribe <prefix>-db-monitor health-alert

# On web-monitor — publish a health alert:
pilotctl --json publish <prefix>-alert-hub health-alert '{"host":"web-01","status":"critical","cpu":95,"mem":88}'

# On db-monitor — publish a database alert:
pilotctl --json publish <prefix>-alert-hub health-alert '{"host":"db-01","status":"warning","disk_pct":88,"repl_lag_ms":450}'
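
Hand-typed JSON payloads are easy to get wrong, so the web-monitor payload above could be built by a helper. web_alert_payload is hypothetical; it assumes host and status contain no quotes or backslashes and that cpu and mem are whole numbers, and it produces the same field names as the example above.

```shell
# Hypothetical helper: build the web-monitor alert payload shown above.
# Assumes host and status need no JSON escaping.
web_alert_payload() {
  local host="$1" status="$2" cpu="$3" mem="$4" n
  # Reject empty or non-numeric metric values so the JSON stays valid.
  for n in "$cpu" "$mem"; do
    case "$n" in
      ""|*[!0-9]*)
        echo "cpu and mem must be whole numbers" >&2
        return 1 ;;
    esac
  done
  printf '{"host":"%s","status":"%s","cpu":%s,"mem":%s}\n' \
    "$host" "$status" "$cpu" "$mem"
}

# Usage:
#   pilotctl --json publish <prefix>-alert-hub health-alert \
#     "$(web_alert_payload web-01 critical 95 88)"
```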

Dependencies

Requires the pilot-protocol skill, the pilotctl and clawhub binaries, and a running daemon.
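
A quick pre-flight check for the two binaries might look like the sketch below. missing_commands is a hypothetical helper built on the standard command -v; it does not verify the pilot-protocol skill or the daemon, which need to be checked separately.

```shell
# Hypothetical helper: print the names of any commands not found on PATH
# and return non-zero if at least one is missing.
missing_commands() {
  local cmd missing=""
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  echo "${missing# }"
  [ -z "$missing" ]
}

# Usage:
#   missing_commands pilotctl clawhub || echo "install the tools listed above first" >&2
```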