Incident Triage
v0.2.0
Structured incident triage for alerts from any monitoring source. Five steps, consistent every time.
Pass in the raw alert message, a link to the alert, or a description of what's happening.
Triage Process
When an alert appears, work through the steps in order:
1. Classify: what type and severity?
2. Scope: what is the blast radius? Who is affected, in which environment, and since when?
3. Correlate: what changed recently? Check deploys, merges, and config changes.
4. Investigate: run the guided checks for the alert type.
5. Act: summarize, create a ticket, then escalate or close.
Read references/triage-framework.md for the full framework with checklists and bash snippets for each step.
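As one illustration of the Correlate step, the sketch below filters a list of recent changes down to those that landed shortly before the alert began. It is a minimal sketch, not the framework's own implementation: the `Change` record, the `recent_changes` helper, the sample data, and the 2-hour default window are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """A deploy, merge, or config change (hypothetical record shape)."""
    kind: str          # e.g. "deploy", "merge", "config"
    description: str
    at: datetime       # when the change landed


def recent_changes(changes, alert_start, window=timedelta(hours=2)):
    """Return changes that landed within `window` before the alert began,
    most recent first. The 2-hour default is an arbitrary assumption."""
    earliest = alert_start - window
    hits = [c for c in changes if earliest <= c.at <= alert_start]
    return sorted(hits, key=lambda c: c.at, reverse=True)


# Made-up sample data for illustration only.
alert_start = datetime(2024, 1, 1, 12, 0)
changes = [
    Change("deploy", "api release", datetime(2024, 1, 1, 11, 40)),
    Change("config", "raise pool size", datetime(2024, 1, 1, 8, 0)),
    Change("merge", "refactor cache", datetime(2024, 1, 1, 10, 30)),
]

suspects = recent_changes(changes, alert_start)
for c in suspects:
    print(c.kind, c.description, c.at.isoformat())
```

Sorting most recent first puts the change closest to the alert at the top, which is usually the first one worth inspecting. Widen the window for slow-burning failures such as leaks or queue build-up.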
Alert Parsing
Before starting the triage framework, identify the alert source and extract key fields.
Read references/alert-patterns.md for patterns covering PagerDuty, Datadog, CloudWatch, Sentry, uptime monitors, GitHub Actions, AWS SNS/EventBridge, and custom webhooks.
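The parsing step can be sketched as a small function that guesses the source and pulls out a few key fields from a raw alert message. This is a rough sketch under stated assumptions: the keyword hints, the severity vocabulary, and the `parse_alert` name are hypothetical and do not reflect the actual payload formats of any of the tools listed, which are what references/alert-patterns.md covers.

```python
import re

# Hypothetical keyword hints; real payloads differ per source and version.
SOURCE_HINTS = {
    "pagerduty": "PagerDuty",
    "datadog": "Datadog",
    "cloudwatch": "CloudWatch",
    "sentry": "Sentry",
    "github actions": "GitHub Actions",
}

# Assumed severity vocabulary; adjust to whatever your sources emit.
SEVERITY_RE = re.compile(r"\b(critical|high|warning|low|info)\b", re.IGNORECASE)


def parse_alert(raw: str) -> dict:
    """Guess the alert source and extract a severity keyword and summary."""
    lowered = raw.lower()
    source = next(
        (name for hint, name in SOURCE_HINTS.items() if hint in lowered),
        "unknown",
    )
    match = SEVERITY_RE.search(raw)
    lines = raw.strip().splitlines()
    return {
        "source": source,
        "severity": match.group(1).lower() if match else "unknown",
        # First line, truncated, as a one-line summary for the ticket.
        "summary": lines[0][:120] if lines else "",
    }


alert = parse_alert(
    "[Datadog] CRITICAL: p99 latency above 2s on checkout-api\nhost: web-1"
)
print(alert)
```

Returning "unknown" instead of raising keeps triage moving when an alert comes from a custom webhook or a free-form description; the missing fields can be filled in by hand during classification.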
Escalation
Covers when to page, when to watch, and when to close, with severity-based response times and communication templates.
Read references/escalation-guide.md for defaults — customize for your team's on-call structure.
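A severity-based policy of this kind can be expressed as a lookup table. The sketch below is illustrative only: the severity names (`sev1` to `sev4`), the actions, and every response time are placeholder assumptions, not the defaults from references/escalation-guide.md.

```python
from datetime import timedelta

# Placeholder policy: these values are illustrative assumptions, not the
# defaults from references/escalation-guide.md. Replace with your own.
POLICY = {
    "sev1": {"action": "page", "respond_within": timedelta(minutes=5)},
    "sev2": {"action": "page", "respond_within": timedelta(minutes=30)},
    "sev3": {"action": "watch", "respond_within": timedelta(hours=4)},
    "sev4": {"action": "close", "respond_within": None},
}


def escalation_for(severity: str) -> dict:
    """Look up the response for a severity. Unknown levels default to
    'watch' so an unclassified alert is never silently closed."""
    fallback = {"action": "watch", "respond_within": None}
    return POLICY.get(severity.lower(), fallback)


print(escalation_for("SEV1")["action"])
print(escalation_for("mystery")["action"])
```

Defaulting unknown severities to "watch" is a deliberate choice in this sketch: it errs toward keeping an alert visible until someone has classified it.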
Runbook
During Step 4 (Investigate), load references/runbook-template.md to find service health endpoints, dashboards, log locations, and common fixes. Fill it in with your infrastructure before your first real incident.
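One way to check that the runbook has been filled in before the first real incident is to model each service entry and list the fields that are still empty. This is a hypothetical sketch: the `RunbookEntry` shape and field names mirror the items named above but are assumptions, not the template's actual layout, and the service name and URL are placeholders.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RunbookEntry:
    """One service's runbook data (assumed shape, not the template's)."""
    service: str
    health_endpoint: str = ""
    dashboard: str = ""
    log_location: str = ""
    common_fixes: list = field(default_factory=list)


def missing_fields(entry: RunbookEntry) -> list:
    """Names of fields that are still empty, i.e. not yet filled in."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


entry = RunbookEntry(
    service="checkout-api",                         # hypothetical service
    health_endpoint="https://example.com/healthz",  # placeholder URL
)
print(missing_fields(entry))
```

Running a check like this against every service ahead of time surfaces gaps while there is still time to fill them, instead of during an investigation.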
References
- references/triage-framework.md — full 5-step triage process with checklists
- references/alert-patterns.md — parsing alerts from common sources
- references/escalation-guide.md — severity levels, response times, escalation policy
- references/runbook-template.md — your infrastructure map (fill in before use)
Works Well With
- github — check recent deploys and CI runs during the correlation step
- aws-ecs-monitor — ECS service health during investigation
- structured-pr-review — review the PR that caused the incident
- gh-issues — automated alert monitoring and triage spawning
