Monitoring

Set up observability for applications and infrastructure with metrics, logs, traces, and alerts.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Iván (@ivangdavila)
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name/description (observability: metrics, logs, traces, alerts) match the included content: guidance for Prometheus, Grafana, Loki, Sentry, Uptime checks, and alerting. The examples and tooling recommended are appropriate for the stated purpose.
Instruction Scope
The SKILL.md and supporting files include operational commands and examples that access high-sensitivity resources: mounting host directories into containers (e.g., /var/log, /var/lib/docker/containers, /:/rootfs:ro), kubectl commands (kubectl logs, kubectl rollout undo), and curl to healthcheck endpoints. Those are expected for monitoring but are also powerful actions that require host/cluster privileges. The instructions assume the operator has rights to run these commands and to expose host filesystems to containers; the skill does not explicitly constrain or warn about these privileges.
Install Mechanism
Instruction-only skill with no install spec and no code files — low risk from installs or downloaded binaries. All actionable content is configuration snippets and CLI examples.
Credentials
The docs reference several environment variables and secrets (GF_SECURITY_ADMIN_PASSWORD/GRAFANA_PASS, Sentry DSN, OTEL_EXPORTER_OTLP_ENDPOINT, healthchecks UUID, PagerDuty service_key, etc.) but the skill metadata declares no required env vars or primary credential. That mismatch means the skill may rely on sensitive credentials not declared in the registry metadata; users should not assume no secrets are needed.
Persistence & Privilege
always is false and the skill is user-invocable; it does not request persistent platform privileges or modification of other skills. Autonomous invocation is allowed by default but that alone is not a new risk here.
What to consider before installing
This is a practical monitoring playbook (Prometheus/Grafana/Loki, Sentry, uptime checks). It is largely coherent with its purpose, but pay attention to these points before installing or following automated steps:

- Secrets and env vars: Examples reference Grafana admin password, Sentry DSN, OTLP endpoint, PagerDuty keys, and healthcheck UUIDs. The skill metadata declares no required env vars, so explicitly verify which credentials your deployment will need and keep them out of shared/agent-accessible environments.
- Host/cluster access: Docker-compose examples mount /var/log, /var/lib/docker/containers and even /:/rootfs:ro, and run node_exporter with host /proc and /sys mounts. Those mounts give containers read access to sensitive host data. Only run these on hosts you control and understand; prefer least-privilege exporters or dedicated monitoring nodes.
- Kubernetes commands: Runbooks show kubectl logs/rollout undo, which require kubeconfig and cluster privileges. Do not grant cluster admin to an untrusted agent; test commands in a staging cluster first.
- Automation caution: Because this is instruction-only, there is no packaged code to audit. If you plan to have an agent execute these steps autonomously, restrict its permissions, avoid exposing secrets to the agent, and review each command before execution.
- Operational safety: Use service accounts with minimal scopes, rotate keys, avoid mounting entire rootfs unless required, and validate alert routing (PagerDuty keys / Slack webhooks) before sending sensitive data.

If you want, I can extract a checklist of required secrets/privileges from the documents, or produce a hardened minimal deployment that avoids mounting host root and reduces secrets exposure.


Current version: v1.0.0

SKILL.md

Complexity Levels

| Level | Tools | Setup Time | Best For |
|---|---|---|---|
| Minimal | UptimeRobot, Healthchecks.io | 15 min | Side projects, MVPs |
| Standard | Uptime Kuma, Sentry, basic Grafana | 1-2 hours | Small teams, startups |
| Professional | Prometheus, Grafana, Loki, Alertmanager | 1-2 days | Production systems |
| Enterprise | Datadog, New Relic, or full OSS stack | Ongoing | Large-scale operations |

The Three Pillars

| Pillar | What It Answers | Tools |
|---|---|---|
| Metrics | "How is the system performing?" | Prometheus, Grafana, Datadog |
| Logs | "What happened?" | Loki, ELK, CloudWatch |
| Traces | "Why is this request slow?" | Jaeger, Tempo, Sentry |

Quick Start by Use Case

"I just want to know if it's down" → UptimeRobot (free) or Uptime Kuma (self-hosted). See simple.md.

"I need to debug production errors" → Sentry with your framework SDK. 5-minute setup. See apm.md.

"I want real observability" → Prometheus + Grafana + Loki. See prometheus.md.

"I need to centralize logs" → Loki for simple, ELK for complex queries. See logs.md.
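For the "is it down?" case, the core of an external check is small enough to sketch by hand. The snippet below is a minimal illustration, not a replacement for UptimeRobot or Uptime Kuma: the function name `is_up`, the 5-second timeout, and the "2xx means up" rule are assumptions made for this example, and the URL in the usage note is a placeholder.

```python
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx status within `timeout` seconds.

    `opener` is injectable so the check can be exercised without network
    access; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib's URLError and HTTPError (4xx/5xx) and socket timeouts
        # are all subclasses of OSError, so any of them counts as "down".
        return False
```

A scheduled job could call `is_up("https://example.com/health")` every minute and notify someone after a few consecutive failures. Run it from a machine outside the monitored network so that the check sees what users see.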

What to Monitor

Applications (RED Method)

  • Rate — requests per second
  • Errors — error rate by endpoint
  • Duration — latency (p50, p95, p99)
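As a rough sketch of how the three RED numbers fall out of raw request records, the code below computes them for one time window. The `RequestRecord` type, the "status >= 500 is an error" rule, and the nearest-rank percentile are assumptions chosen for illustration; in a Prometheus setup these values would normally come from counters and histograms rather than from code like this.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:  # hypothetical record shape for this example
    endpoint: str
    status: int
    duration_ms: float


def percentile(values, pct):
    """Nearest-rank percentile; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def red_summary(records, window_seconds):
    """Summarise Rate, Errors, and Duration for one window of requests."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in records:
        totals[r.endpoint] += 1
        if r.status >= 500:  # assumption: only 5xx counts as an error
            errors[r.endpoint] += 1
    durations = [r.duration_ms for r in records]
    return {
        "rate_rps": len(records) / window_seconds,
        "error_rate_by_endpoint": {ep: errors[ep] / totals[ep] for ep in totals},
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```

Reporting p95 and p99 alongside p50 matters because a healthy median can hide a slow tail that a meaningful share of users still experience.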

Infrastructure (USE Method)

  • Utilization — CPU, memory, disk usage
  • Saturation — queue depth, load average
  • Errors — hardware/system errors
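A minimal sketch of the first two USE signals, using only the Python standard library, might look like the following. It assumes a Unix-like host (`os.getloadavg` is not available on Windows), the function name `use_snapshot` is made up for this example, and hardware or system errors are left out because they usually come from kernel logs or an exporter such as node_exporter.

```python
import os
import shutil


def use_snapshot(path="/"):
    """Return a small utilization/saturation snapshot for the local host."""
    disk = shutil.disk_usage(path)
    load_1m, _load_5m, _load_15m = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    return {
        # Utilization: fraction of the filesystem at `path` that is in use.
        "disk_utilization": disk.used / disk.total if disk.total else 0.0,
        # Saturation: 1-minute load average normalised by CPU count;
        # sustained values above 1.0 suggest work is queueing for CPU.
        "load_per_cpu": load_1m / cpus,
    }
```

Normalising load by CPU count keeps the number comparable across machines of different sizes.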

Alerting Principles

| Do | Don't |
|---|---|
| Alert on symptoms (user impact) | Alert on causes (CPU high) |
| Include runbook link | Require investigation to understand |
| Set appropriate severity | Make everything P1 |
| Require action | Alert on "interesting" metrics |

Alert fatigue kills monitoring. If alerts are ignored, you have no monitoring.

For alert configuration, severities, and on-call setup, see alerting.md.
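The alerting principles above can be checked mechanically before an alert ships. The sketch below is one hypothetical way to do that: the dictionary keys (`runbook_url`, `severity`, `action`), the P1 to P3 labels, and the 20% threshold are all assumptions for illustration, not a format used by Alertmanager or any other tool.

```python
VALID_SEVERITIES = {"P1", "P2", "P3"}  # hypothetical severity labels


def lint_alert(alert):
    """Return a list of problems found in one alert definition (a dict)."""
    problems = []
    if not alert.get("runbook_url"):
        problems.append("missing runbook link")
    if alert.get("severity") not in VALID_SEVERITIES:
        problems.append("missing or unknown severity")
    if not alert.get("action"):
        problems.append("no action defined for the responder")
    return problems


def too_many_p1(alerts, max_share=0.2):
    """Flag a rule set in which more than `max_share` of alerts are P1."""
    if not alerts:
        return False
    p1_count = sum(1 for a in alerts if a.get("severity") == "P1")
    return p1_count / len(alerts) > max_share
```

Running a check like this in CI against the alert rules is one way to keep runbook links and severities from quietly going missing as rules accumulate.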

Cost Comparison

| Solution | Monthly Cost (small) | Monthly Cost (medium) |
|---|---|---|
| UptimeRobot | Free | $7 |
| Uptime Kuma | $5 (VPS) | $5 (VPS) |
| Sentry | Free / $26 | $80 |
| Grafana Cloud | Free tier | $50+ |
| Datadog | $15/host | $23/host + features |
| Self-hosted stack | $10-20 (VPS) | $50-100 (VPS) |
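Per-host pricing and flat VPS pricing scale differently, so it can help to work out where they cross. The arithmetic below is only a sketch using the table's own figures; the function names are made up for this example, and it ignores add-on features, data volume charges, and the engineering time a self-hosted stack needs.

```python
import math


def per_host_monthly(hosts, price_per_host):
    """Monthly bill under simple per-host pricing."""
    return hosts * price_per_host


def break_even_hosts(flat_monthly, price_per_host):
    """Smallest host count at which per-host pricing meets or exceeds
    a flat monthly cost."""
    return math.ceil(flat_monthly / price_per_host)
```

With the medium-tier figures, `break_even_hosts(50, 23)` gives 3 and `break_even_hosts(100, 23)` gives 5, so at $23/host the per-host bill reaches the $50-100 self-hosted range somewhere between three and five hosts, before any extra features are added.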

Common Mistakes

  • Starting with Prometheus/Grafana when Uptime Kuma would suffice
  • No alerting (dashboards nobody watches)
  • Too many alerts (alert fatigue → ignored)
  • Missing runbooks (alert fires, nobody knows what to do)
  • Not monitoring from outside (only internal checks)
  • Storing logs forever (cost explodes)
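For the last mistake in the list, a retention policy can be as simple as deleting log files past a cutoff age. The sketch below assumes plain `*.log` files in a single directory and uses file modification time as the age; the function name and the dry-run default are choices made for this example. Loki and ELK have their own retention settings, which are preferable when those tools are in use.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, max_age_days, now=None, dry_run=True):
    """Delete *.log files in `log_dir` older than `max_age_days`.

    Returns the names of the files that were (or, in dry-run mode,
    would be) removed. Age is based on modification time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(Path(log_dir).glob("*.log")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path.name)
    return removed
```

Defaulting to `dry_run=True` means a first run only reports what would be deleted, which is a cheap safeguard against a wrong directory or a wrong age.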
