Install
openclaw skills install watchdog-heartbeat

Monitor service health, heartbeat freshness, and stuck workflows, and trigger recovery or degraded mode. Use it on a high-frequency schedule, after system startup, when a workflow stalls, or whenever heartbeat freshness must be verified. The skill is triggered by watchdog cron jobs or health check requests.

The skill provides observability and recovery awareness for a resident OpenClaw system: it verifies process liveness, heartbeat freshness, and workflow integrity.

Required inputs:

- service_list: monitored services and their expected health states
- health_endpoints: map of service → health check endpoint or method
- heartbeat_records: recent heartbeat timestamps per agent or skill
- workflow_status_records: current status of all active workflows
- restart_records: history of service restarts and recovery events

Outputs:

service_health_summary: {
service: string
status: "healthy" | "degraded" | "down" | "unknown"
last_check: string # ISO-8601
latency_ms: number | null
error: string | null
}[]
expired_heartbeat_list: {
agent_or_skill: string
last_heartbeat: string # ISO-8601
seconds_expired: number
severity: "warning" | "critical"
}[]
stuck_workflow_list: {
workflow_id: string
workflow_name: string
stuck_since: string # ISO-8601
stuck_duration_min: number
last_progress: string | null
severity: "warning" | "critical"
}[]
recovery_recommendation: {
action: "restart" | "notify" | "escalate" | "no_action" | "degraded_mode"
target: string
reason: string
}[]
degraded_mode_recommendation: {
affected_services: string[]
degraded_features: string[]
estimated_recovery_time: string | null
user_impact: string
}
watchdog_log: {
check_id: string
check_time: string # ISO-8601
services_checked: number
heartbeats_checked: number
workflows_checked: number
issues_found: number
observability_gap: string[] | null
}
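
The following minimal sketch shows one way a single service_health_summary entry might be produced. It runs one health probe and times it. Several pieces are assumptions for illustration: the `probe` callable, the `check_service` name, and the 1000 ms `degraded_latency_ms` threshold. The skill does not specify how health_endpoints are invoked or when a slow response counts as degraded.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Optional


def check_service(
    service: str,
    probe: Optional[Callable[[], bool]],
    degraded_latency_ms: float = 1000.0,  # assumed threshold, not from the skill
) -> dict:
    """Run one health probe and return a service_health_summary entry.

    `probe` is a hypothetical stand-in for a health_endpoints entry: it returns
    True if healthy, False if the service reports degraded health, and raises
    if the service cannot be reached.
    """
    now = datetime.now(timezone.utc).isoformat()
    if probe is None:
        # No health check method configured: an observability gap.
        return {"service": service, "status": "unknown", "last_check": now,
                "latency_ms": None, "error": "no health check configured"}
    started = time.perf_counter()
    try:
        ok = probe()
    except Exception as exc:  # unreachable or crashing service
        return {"service": service, "status": "down", "last_check": now,
                "latency_ms": None, "error": str(exc)}
    latency_ms = (time.perf_counter() - started) * 1000.0
    # A successful but slow response is treated as degraded.
    status = "healthy" if ok and latency_ms <= degraded_latency_ms else "degraded"
    return {"service": service, "status": status, "last_check": now,
            "latency_ms": round(latency_ms, 1), "error": None}
```

A probe that raises maps to "down" with a null latency, which matches the `latency_ms: number | null` field in the schema.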
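
Heartbeat freshness thresholds:

| Seconds expired | Severity |
|---|---|
| < 60 s | healthy |
| 60 – 300 s | warning |
| > 300 s | critical |

Healthy heartbeats are not reported. expired_heartbeat_list carries only warning and critical entries.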
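
The thresholds above can be applied with a small classifier. This sketch makes two assumptions: heartbeat_records is a mapping from agent or skill name to a timezone-aware ISO-8601 timestamp, and seconds_expired means the seconds elapsed since the last heartbeat. The record shape and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Thresholds from the heartbeat freshness table.
WARNING_AFTER_S = 60
CRITICAL_AFTER_S = 300


def heartbeat_severity(age_s: float) -> str:
    """Classify seconds since the last heartbeat per the threshold table."""
    if age_s < WARNING_AFTER_S:
        return "healthy"
    if age_s <= CRITICAL_AFTER_S:
        return "warning"
    return "critical"


def expired_heartbeats(records: dict, now: Optional[datetime] = None) -> list:
    """Build expired_heartbeat_list entries from {name: ISO-8601 timestamp}."""
    now = now or datetime.now(timezone.utc)
    expired = []
    for name, ts in records.items():
        # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
        last = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        age_s = (now - last).total_seconds()
        severity = heartbeat_severity(age_s)
        if severity != "healthy":
            expired.append({
                "agent_or_skill": name,
                "last_heartbeat": ts,
                "seconds_expired": int(age_s),
                "severity": severity,
            })
    return expired
```

The boundaries follow the table: exactly 60 s and exactly 300 s both count as warning.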
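
Stuck workflow thresholds:

| Stuck duration | Severity |
|---|---|
| < 10 min | healthy (in progress) |
| 10 – 30 min | warning |
| > 30 min | critical |

Only warning and critical workflows appear in stuck_workflow_list.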
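
Stuck-workflow detection follows the same pattern with minute thresholds. The sketch uses `last_progress_at` and `last_progress` as a hypothetical record shape for workflow_status_records. It also assumes that every record passed in belongs to an active workflow and carries a timezone-aware timestamp.

```python
from datetime import datetime, timezone
from typing import Optional

# Thresholds from the stuck workflow table, in minutes.
WARNING_AFTER_MIN = 10
CRITICAL_AFTER_MIN = 30


def workflow_severity(stuck_min: float) -> str:
    """Classify minutes without progress per the threshold table."""
    if stuck_min < WARNING_AFTER_MIN:
        return "healthy"  # still considered in progress
    if stuck_min <= CRITICAL_AFTER_MIN:
        return "warning"
    return "critical"


def stuck_workflows(records: list, now: Optional[datetime] = None) -> list:
    """Build stuck_workflow_list entries from active workflow records."""
    now = now or datetime.now(timezone.utc)
    stuck = []
    for rec in records:
        since_ts = rec["last_progress_at"]  # hypothetical field: time of last progress
        since = datetime.fromisoformat(since_ts.replace("Z", "+00:00"))
        minutes = (now - since).total_seconds() / 60
        severity = workflow_severity(minutes)
        if severity != "healthy":
            stuck.append({
                "workflow_id": rec["workflow_id"],
                "workflow_name": rec["workflow_name"],
                "stuck_since": since_ts,
                "stuck_duration_min": round(minutes, 1),
                "last_progress": rec.get("last_progress"),  # null if never reported
                "severity": severity,
            })
    return stuck
```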

Recovery actions:

- no_action: all checks are within normal parameters
- notify: alert a human; no automatic restart
- restart: attempt an automatic restart
- escalate: human intervention is required
- degraded_mode: reduce functionality while maintaining partial service

If monitoring data is incomplete:
- Record the missing field names in watchdog_log.observability_gap.
- Set status = "unknown" for the affected services.
- Recommend escalate if any critical service has an observability gap.
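
One way to turn these findings into recovery_recommendation entries is a fixed policy, sketched below. Only the action names, the required input names, and the rule to escalate on a critical service's observability gap come from the text above. The rest of the mapping is a hypothetical policy that the skill does not define. That includes restarting on a critical heartbeat and switching to degraded_mode once a service has been restarted `max_restarts` times. The `recent_restarts` counts are assumed to be derived from restart_records.

```python
from typing import Optional

# Input names from the skill's "Required inputs" list.
REQUIRED_INPUTS = [
    "service_list",
    "health_endpoints",
    "heartbeat_records",
    "workflow_status_records",
    "restart_records",
]


def find_observability_gaps(inputs: dict) -> Optional[list]:
    """Return missing input names for watchdog_log.observability_gap, or None."""
    missing = [name for name in REQUIRED_INPUTS if inputs.get(name) is None]
    return missing or None


def _rec(action: str, target: str, reason: str) -> dict:
    return {"action": action, "target": target, "reason": reason}


def recommend(summary, expired, stuck, critical_services, recent_restarts, max_restarts=3):
    """Map findings to recovery_recommendation entries (hypothetical policy)."""
    recs = []
    for s in summary:
        name, status = s["service"], s["status"]
        if status == "down":
            # Stop restarting after max_restarts recent attempts to avoid a restart loop.
            if recent_restarts.get(name, 0) >= max_restarts:
                recs.append(_rec("degraded_mode", name, "down after repeated restarts"))
            else:
                recs.append(_rec("restart", name, f"health check failed: {s.get('error')}"))
        elif status == "degraded":
            recs.append(_rec("notify", name, "service reports degraded health"))
        elif status == "unknown":
            # Observability gap: escalate for critical services, otherwise notify.
            action = "escalate" if name in critical_services else "notify"
            recs.append(_rec(action, name, "health status unknown (observability gap)"))
    for h in expired:
        action = "restart" if h["severity"] == "critical" else "notify"
        recs.append(_rec(action, h["agent_or_skill"], f"heartbeat {h['severity']}"))
    for w in stuck:
        action = "escalate" if w["severity"] == "critical" else "notify"
        recs.append(_rec(action, w["workflow_id"], f"workflow stuck ({w['severity']})"))
    return recs or [_rec("no_action", "all", "all checks within normal parameters")]
```

Capping restarts keeps the watchdog from looping on a service that never recovers. Past that point, degraded mode preserves partial service instead.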