Monitoring

Set up observability for applications and infrastructure with metrics, logs, traces, and alerts.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

by Iván (@ivangdavila)

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

✓

Purpose & Capability

The name/description (observability: metrics, logs, traces, alerts) match the included content: guidance for Prometheus, Grafana, Loki, Sentry, Uptime checks, and alerting. The examples and tooling recommended are appropriate for the stated purpose.

Instruction Scope

The SKILL.md and supporting files include operational commands and examples that access high-sensitivity resources: mounting host directories into containers (e.g., /var/log, /var/lib/docker/containers, /:/rootfs:ro), kubectl commands (kubectl logs, kubectl rollout undo), and curl to healthcheck endpoints. Those are expected for monitoring but are also powerful actions that require host/cluster privileges. The instructions assume the operator has rights to run these commands and to expose host filesystems to containers; the skill does not explicitly constrain or warn about these privileges.

✓

Install Mechanism

Instruction-only skill with no install spec and no code files — low risk from installs or downloaded binaries. All actionable content is configuration snippets and CLI examples.

Credentials

The docs reference several environment variables and secrets (GF_SECURITY_ADMIN_PASSWORD/GRAFANA_PASS, Sentry DSN, OTEL_EXPORTER_OTLP_ENDPOINT, healthchecks UUID, PagerDuty service_key, etc.) but the skill metadata declares no required env vars or primary credential. That mismatch means the skill may rely on sensitive credentials not declared in the registry metadata; users should not assume no secrets are needed.

✓

Persistence & Privilege

The "always" flag is false and the skill is user-invocable; it does not request persistent platform privileges or permission to modify other skills. Autonomous invocation is allowed by default, but that alone is not a new risk here.

What to consider before installing

This is a practical monitoring playbook (Prometheus/Grafana/Loki, Sentry, uptime checks). It is largely coherent with its purpose, but pay attention to these points before installing or following automated steps:

- Secrets and env vars: Examples reference Grafana admin password, Sentry DSN, OTLP endpoint, PagerDuty keys, and healthcheck UUIDs. The skill metadata declares no required env vars, so explicitly verify which credentials your deployment will need and keep them out of shared/agent-accessible environments.
- Host/cluster access: Docker-compose examples mount /var/log, /var/lib/docker/containers and even /:/rootfs:ro, and run node_exporter with host /proc and /sys mounts. Those mounts give containers read access to sensitive host data. Only run these on hosts you control and understand; prefer least-privilege exporters or dedicated monitoring nodes.
- Kubernetes commands: Runbooks show kubectl logs/rollout undo, which require kubeconfig and cluster privileges. Do not grant cluster admin to an untrusted agent; test commands in a staging cluster first.
- Automation caution: Because this is instruction-only, there is no packaged code to audit. If you plan to have an agent execute these steps autonomously, restrict its permissions, avoid exposing secrets to the agent, and review each command before execution.
- Operational safety: Use service accounts with minimal scopes, rotate keys, avoid mounting entire rootfs unless required, and validate alert routing (PagerDuty keys / Slack webhooks) before sending sensitive data.

If you want, I can extract a checklist of required secrets/privileges from the documents, or produce a hardened minimal deployment that avoids mounting host root and reduces secrets exposure.

Current version: v1.0.0

License terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Complexity Levels

Level	Tools	Setup Time	Best For
Minimal	UptimeRobot, Healthchecks.io	15 min	Side projects, MVPs
Standard	Uptime Kuma, Sentry, basic Grafana	1-2 hours	Small teams, startups
Professional	Prometheus, Grafana, Loki, Alertmanager	1-2 days	Production systems
Enterprise	Datadog, New Relic, or full OSS stack	Ongoing	Large-scale operations

The Three Pillars

Pillar	What It Answers	Tools
Metrics	"How is the system performing?"	Prometheus, Grafana, Datadog
Logs	"What happened?"	Loki, ELK, CloudWatch
Traces	"Why is this request slow?"	Jaeger, Tempo, Sentry

Quick Start by Use Case

"I just want to know if it's down" → UptimeRobot (free) or Uptime Kuma (self-hosted). See simple.md.

"I need to debug production errors" → Sentry with your framework SDK. 5-minute setup. See apm.md.

"I want real observability" → Prometheus + Grafana + Loki. See prometheus.md.

"I need to centralize logs" → Loki for simple, ELK for complex queries. See logs.md.

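For the simplest case above ("I just want to know if it's down"), the check that hosted uptime services perform can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not how UptimeRobot or Uptime Kuma are implemented; the function name is_up and the 5-second default timeout are assumptions.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, malformed URLs,
        # and 4xx/5xx responses (urllib raises HTTPError, a URLError subclass).
        return False
```

Calling is_up with your own health endpoint from a machine outside your infrastructure gives a rough external check; a hosted service adds scheduling, retries, and notifications on top of this.
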
What to Monitor

Applications (RED Method)

Rate — requests per second
Errors — error rate by endpoint
Duration — latency (p50, p95, p99)
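
As a rough illustration of the three RED signals, the sketch below computes them from a list of request records collected over one time window. The RequestSample shape, the nearest-rank percentile method, and the choice to count only 5xx responses as errors are assumptions made for the example; a real deployment would normally get these numbers from a metrics system rather than from raw records.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed request (hypothetical record shape)."""
    endpoint: str
    status: int         # HTTP status code
    duration_ms: float  # server-side latency in milliseconds


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_summary(samples: list[RequestSample], window_seconds: float) -> dict[str, float]:
    """Compute Rate, Errors, and Duration for one observation window."""
    if not samples:
        return {"rate_rps": 0.0, "error_ratio": 0.0}
    durations = sorted(s.duration_ms for s in samples)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for s in samples if s.status >= 500)
    return {
        "rate_rps": len(samples) / window_seconds,
        "error_ratio": errors / len(samples),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```

To get the error rate by endpoint, group the samples by their endpoint field and call red_summary once per group.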

Infrastructure (USE Method)

Utilization — CPU, memory, disk usage
Saturation — queue depth, load average
Errors — hardware/system errors
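
A minimal sketch of taking Utilization and Saturation readings on a single host, using only the Python standard library. It covers disk usage and load average; CPU and memory utilization, and the hardware error counters, need platform-specific sources or an exporter and are left out. The function name use_snapshot and the returned key names are assumptions.

```python
import os
import shutil


def use_snapshot(path: str = "/") -> dict[str, float]:
    """Take a small Utilization/Saturation reading from the local host."""
    disk = shutil.disk_usage(path)
    snapshot = {
        # Utilization: fraction of the filesystem at `path` that is in use.
        "disk_used_ratio": disk.used / disk.total if disk.total else 0.0,
    }
    try:
        load_1m = os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg is unavailable on some platforms (for example Windows).
        return snapshot
    # Saturation: 1-minute load average per CPU; sustained values above 1.0
    # suggest runnable work is queuing.
    snapshot["load_per_cpu_1m"] = load_1m / (os.cpu_count() or 1)
    return snapshot
```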

Alerting Principles

Do	Don't
Alert on symptoms (user impact)	Alert on causes (CPU high)
Include runbook link	Require investigation to understand
Set appropriate severity	Make everything P1
Require action	Alert on "interesting" metrics

Alert fatigue kills monitoring. If alerts are ignored, you have no monitoring.

For alert configuration, severities, and on-call setup, see alerting.md.
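
Some of the principles in the table above can be checked mechanically before an alert ships. The sketch below is a hypothetical linter, not part of any alerting tool: the AlertRule fields, the P1/P2/P3 scale, and the 25% cap on P1 alerts are all assumptions chosen for illustration. Whether an alert targets a symptom or a cause still needs human review.

```python
from dataclasses import dataclass

ALLOWED_SEVERITIES = {"P1", "P2", "P3"}  # assumed severity scale


@dataclass
class AlertRule:
    """Hypothetical alert definition."""
    name: str
    severity: str
    runbook_url: str = ""
    action: str = ""  # what the responder is expected to do


def lint_alerts(rules: list[AlertRule], max_p1_share: float = 0.25) -> list[str]:
    """Return human-readable problems found in a set of alert rules."""
    problems = []
    for rule in rules:
        if rule.severity not in ALLOWED_SEVERITIES:
            problems.append(f"{rule.name}: unknown severity {rule.severity!r}")
        if not rule.runbook_url:
            problems.append(f"{rule.name}: missing runbook link")
        if not rule.action:
            problems.append(f"{rule.name}: no required action")
    p1_count = sum(1 for rule in rules if rule.severity == "P1")
    if rules and p1_count / len(rules) > max_p1_share:
        problems.append(f"{p1_count} of {len(rules)} alerts are P1; review severities")
    return problems
```

Running a check like this in CI keeps alerts without a runbook or a clear action from reaching the on-call rotation.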

Cost Comparison

Solution	Monthly Cost (small)	Monthly Cost (medium)
UptimeRobot	Free	$7
Uptime Kuma	$5 (VPS)	$5 (VPS)
Sentry	Free / $26	$80
Grafana Cloud	Free tier	$50+
Datadog	$15/host	$23/host + features
Self-hosted stack	$10-20 (VPS)	$50-100 (VPS)

Common Mistakes

Starting with Prometheus/Grafana when Uptime Kuma would suffice
No alerting (dashboards nobody watches)
Too many alerts (alert fatigue → ignored)
Missing runbooks (alert fires, nobody knows what to do)
Not monitoring from outside (only internal checks)
Storing logs forever (cost explodes)
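
For the last mistake in the list, a retention policy can start as a dry run that only reports which log files are past their age limit. This is a minimal sketch for plain files on disk; the function name expired_logs, the *.log pattern, and the 30-day default are assumptions. Log platforms such as Loki have their own retention settings, which this does not replace.

```python
import time
from pathlib import Path
from typing import Optional


def expired_logs(log_dir: str, max_age_days: int = 30, now: Optional[float] = None) -> list[Path]:
    """Return *.log files in log_dir last modified more than max_age_days ago."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    return sorted(
        p for p in Path(log_dir).glob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

The function deletes nothing; review its output first, then archive or remove the listed files in a separate step.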
