Skylv Error Monitoring Agent

Catch errors before users report them with real-time monitoring, automatic grouping, and smart alerts that reduce alert noise by up to 90%.

Audits: Pass

Install

openclaw skills install skylv-error-monitoring-agent

error-monitoring-agent

Catch errors before users report them. Group similar issues, alert on spikes, and auto-resolve known problems, all with zero configuration needed to get started.

What It Does

  • Real-time detection: monitor logs, APIs, and workers for errors
  • Smart grouping: merge similar stack traces to reduce noise by up to 90%
  • Rate alerts: alert when the error rate spikes or a new error type appears
  • Root cause: correlate errors with deploys and config changes
  • Auto-resolve: apply known fixes automatically (restart, retry, rollback)
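The page does not document how smart grouping works. One common approach is fingerprinting: strip the parts of an error that vary between occurrences of the same bug (numbers, memory addresses, line and column positions), then hash what remains. The following is a hypothetical Node.js sketch of that technique. The names `normalize`, `fingerprint`, and `groupErrors` are illustrative, not the skill's API.

```javascript
const crypto = require("crypto");

// Replace the parts of an error that change between occurrences of the same bug.
function normalize(message, stack) {
  const msg = message
    .replace(/0x[0-9a-f]+/gi, "<addr>") // memory addresses
    .replace(/\d+/g, "<n>"); // ids, counts, durations
  const frames = stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at ")) // keep only stack frames
    .map((line) => line.replace(/:\d+:\d+\)?$/, "")) // drop ":line:col" and closing paren
    .slice(0, 5); // the top frames identify the failure site
  return [msg, ...frames].join("\n");
}

// Short, stable group key for an error object with `message` and `stack`.
function fingerprint(err) {
  return crypto
    .createHash("sha1")
    .update(normalize(err.message || "", err.stack || ""))
    .digest("hex")
    .slice(0, 12);
}

// Group errors by fingerprint, keeping one sample and a count per group.
function groupErrors(errors) {
  const groups = new Map();
  for (const err of errors) {
    const key = fingerprint(err);
    if (!groups.has(key)) groups.set(key, { sample: err, count: 0 });
    groups.get(key).count += 1;
  }
  return groups;
}
```

Dropping line and column numbers keeps a group stable across deploys that shift code around. The trade-off is that two distinct throw sites inside the same function merge into one group.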

Quick Start

# 1. Start monitoring
node monitor.js watch --source logs,api

# 2. Check current errors
node monitor.js status

# 3. Set up an alert
node monitor.js alert --rule "error_rate > 10/min" --channel slack

# 4. View top errors
node monitor.js aggregate --top 10
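`aggregate --top <n>` ranks error groups by frequency. A minimal sketch of that ranking follows. It assumes each event already carries a group `fingerprint` and a millisecond `timestamp` (a hypothetical event shape) and optionally restricts counting to a trailing time window, as `--time-window` suggests.

```javascript
// Rank error groups by how often they occurred, most frequent first.
// Events are hypothetical { fingerprint: string, timestamp: number (ms) } objects.
function topErrors(events, n, windowMs = Infinity, now = Date.now()) {
  const counts = new Map();
  for (const ev of events) {
    if (now - ev.timestamp > windowMs) continue; // older than the window
    counts.set(ev.fingerprint, (counts.get(ev.fingerprint) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([fingerprint, count]) => ({ fingerprint, count }))
    // Highest count first; ties broken by fingerprint for a stable order.
    .sort((a, b) => b.count - a.count || a.fingerprint.localeCompare(b.fingerprint))
    .slice(0, n);
}

// e.g. `aggregate --time-window 1h --top 20` would roughly correspond to:
// topErrors(events, 20, 60 * 60 * 1000)
```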

Common Use Cases

🚨 Alert on Error Spikes

# Alert when the error rate exceeds a threshold
node monitor.js alert --rule "error_rate > 10/min" --channel slack

# Alert on new error types
node monitor.js alert --rule "new_error_type" --channel pagerduty

# Alert when the error rate exceeds 3x its baseline
node monitor.js alert --rule "error_spike > 3x_baseline" --channel email
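The three rule strings above read as a small condition language. The skill's actual grammar and evaluation window are not documented, so this is an assumption throughout: a hypothetical `evaluateRule` that measures the error rate over the trailing minute, treats a fingerprint as new if it has never been seen, and compares the current rate against a supplied per-minute baseline.

```javascript
const MINUTE_MS = 60_000;

// Errors per minute over the trailing window ending at `now`.
function errorRate(timestamps, now, windowMs = MINUTE_MS) {
  const recent = timestamps.filter((t) => t <= now && now - t < windowMs).length;
  return recent / (windowMs / MINUTE_MS);
}

// ctx (hypothetical): { now, timestamps, knownFingerprints: Set,
//                       latestFingerprint, baselinePerMin }
function evaluateRule(rule, ctx) {
  // "error_rate > 10/min": absolute threshold in errors per minute
  let m = rule.match(/^error_rate\s*>\s*(\d+(?:\.\d+)?)\/min$/);
  if (m) return errorRate(ctx.timestamps, ctx.now) > Number(m[1]);

  // "new_error_type": the latest error's fingerprint has never been seen before
  if (rule === "new_error_type") {
    return !ctx.knownFingerprints.has(ctx.latestFingerprint);
  }

  // "error_spike > 3x_baseline": current rate above a multiple of the baseline
  m = rule.match(/^error_spike\s*>\s*(\d+(?:\.\d+)?)x_baseline$/);
  if (m) return errorRate(ctx.timestamps, ctx.now) > Number(m[1]) * ctx.baselinePerMin;

  throw new Error(`Unsupported rule: ${rule}`);
}
```

A real implementation would also need a way to learn the baseline, for example a rolling average over previous days. That part is left out here.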

🔍 Investigate Incident

# Show the 20 most frequent error groups from the last hour
node monitor.js aggregate --time-window 1h --top 20

# Analyze a specific error
node monitor.js analyze --error-id err_abc123 --depth 5

# Correlate with recent changes
node monitor.js analyze --correlate deploy-log,config-change
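The page does not say how `--correlate` ranks changes. A simple time-proximity heuristic flags deploys and config changes that landed shortly before an error group was first seen. It is sketched here with a hypothetical change-event shape and an assumed 30-minute default lookback. Proximity only suggests a cause and does not prove one.

```javascript
// Hypothetical change events: { type: "deploy" | "config-change", id: string, at: number (ms) }
// Return changes that landed within `lookbackMs` before `firstSeenAt`, newest first.
function suspectChanges(firstSeenAt, changes, lookbackMs = 30 * 60_000) {
  return changes
    .filter((c) => c.at <= firstSeenAt && firstSeenAt - c.at <= lookbackMs)
    .sort((a, b) => b.at - a.at);
}

// First occurrence of an error group, from its event timestamps (ms).
function firstSeen(timestamps) {
  return timestamps.reduce((min, t) => Math.min(min, t), Infinity);
}
```

Typical use would be `suspectChanges(firstSeen(groupTimestamps), changes)`, where `groupTimestamps` and `changes` come from your own error store and deploy log.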

🤖 Auto-Resolve Known Issues

# Enable auto-resolution
node monitor.js auto-resolve --strategy restart,retry,rollback

# Apply approved fixes only
node monitor.js auto-resolve --known-fixes db --apply-approved
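Auto-resolution should only run fixes that were approved in advance, which is what the `approvedStrategies` list in the configuration suggests. The following is a hypothetical, synchronous sketch of that gate. The `candidateFixes` field and the handler functions stand in for real restart or retry actions and are not the skill's API.

```javascript
// Try each candidate fix in order, but only if it is on the approved list.
// Handlers are hypothetical functions returning true when the fix worked.
function autoResolve(issue, handlers, approved) {
  const attempted = [];
  for (const name of issue.candidateFixes) {
    if (!approved.has(name)) continue; // never run an unapproved fix
    const handler = handlers[name];
    if (typeof handler !== "function") continue; // no implementation available
    attempted.push(name);
    if (handler(issue)) return { resolved: true, by: name, attempted };
  }
  return { resolved: false, by: null, attempted }; // escalate to on-call
}
```

Returning the list of attempted fixes gives on-call engineers a record of what automation already tried before the issue was escalated.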

📊 Track Error Budget

# Check the error rate against a 99.9% SLO over 30 days
node monitor.js budget --slo 99.9% --window 30d

# View the remaining error budget
node monitor.js budget --remaining
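The budget arithmetic behind an SLO is straightforward. With a 99.9% target, 0.1% of requests (or of time) may fail. Over a 30-day window, that is 30 × 24 × 60 × 0.001 = 43.2 minutes of full downtime. The sketch below assumes a request-based SLO, since the page does not say whether the skill counts requests or time. A negative remaining fraction means the budget is overspent.

```javascript
// Request-based error budget. sloPercent = 99.9 means 0.1% of requests may fail.
function errorBudget(sloPercent, totalRequests, failedRequests) {
  const allowedFailureRatio = (100 - sloPercent) / 100;
  const allowedFailures = totalRequests * allowedFailureRatio;
  // Fraction of the budget still unspent; negative once the SLO is breached.
  const remainingFraction =
    allowedFailures > 0 ? 1 - failedRequests / allowedFailures : 0;
  return { allowedFailures, failedRequests, remainingFraction };
}

// Time-based equivalent: minutes of full outage the SLO tolerates in the window.
function budgetMinutes(sloPercent, windowDays) {
  return windowDays * 24 * 60 * ((100 - sloPercent) / 100);
}
```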

All Commands

| Command | Purpose |
|---|---|
| watch --source <src> | Start monitoring |
| status | Current error summary |
| aggregate --top <n> | Group similar errors |
| alert --rule <rule> | Create alert rule |
| analyze --error-id <id> | Root cause analysis |
| auto-resolve --strategy <s> | Enable auto-fix |
| budget --slo <target> | Check error budget |

Configuration

{
  "monitoring": {
    "sources": ["application", "infrastructure", "api"],
    "sampling": 1.0,
    "retention": "30d",
    "alertRules": [
      { "condition": "error_rate > 10/min", "action": "page-oncall" },
      { "condition": "new_error_type", "action": "notify-channel" }
    ],
    "autoResolve": {
      "enabled": true,
      "approvedStrategies": ["restart-service", "retry-request"]
    }
  }
}
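The configuration implies a few checks before the agent starts. `sampling` must be a fraction between 0 and 1 (1.0 keeps every event), and `retention` is a duration string such as `"30d"`. The following is a hypothetical validation sketch. Reading `m` as minutes and supporting only the `s`, `m`, `h`, and `d` units are assumptions, not documented behavior.

```javascript
// Duration strings like "30d"; treating "m" as minutes is an assumption.
const UNITS = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };

function parseDuration(text) {
  const match = /^(\d+)([smhd])$/.exec(text);
  if (!match) throw new Error(`Unsupported duration: ${text}`);
  return Number(match[1]) * UNITS[match[2]];
}

// Validate the "monitoring" block and precompute retention in milliseconds.
function validateConfig(config) {
  const mon = config.monitoring;
  if (!mon) throw new Error('Missing "monitoring" section');
  if (!(typeof mon.sampling === "number" && mon.sampling >= 0 && mon.sampling <= 1)) {
    throw new Error("sampling must be a number between 0 and 1");
  }
  return { ...mon, retentionMs: parseDuration(mon.retention) };
}

// Keep an event with probability `sampling`; 1.0 keeps every event because
// Math.random() returns values in [0, 1).
function shouldKeep(sampling, random = Math.random) {
  return random() < sampling;
}
```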