{"skill":{"slug":"pager-triage","displayName":"pager-triage","summary":"AI-powered incident triage for PagerDuty. List active incidents, deep-dive with timeline and alert correlation, check on-call schedules, acknowledge, resolve...","description":"---\nname: pager-triage\nversion: 0.1.1\ndisplayName: PagerDuty Incident Triage\ndescription: >\n  AI-powered incident triage for PagerDuty. List active incidents, deep-dive\n  with timeline and alert correlation, check on-call schedules, acknowledge,\n  resolve, and annotate — all from your agent. Read-only by default;\n  write operations require explicit --confirm.\nauthor: Anvil AI\ntags:\n  - pagerduty\n  - opsgenie\n  - incident\n  - sre\n  - oncall\n  - triage\n  - discord\n  - discord-v2\ntools:\n  - name: pd_incidents\n    description: List active PagerDuty incidents (triggered + acknowledged)\n  - name: pd_incident_detail\n    description: Get detailed incident info including timeline, alerts, and notes\n  - name: pd_oncall\n    description: Show current on-call schedules and escalation policies\n  - name: pd_incident_ack\n    description: Acknowledge an incident (requires --confirm)\n  - name: pd_incident_resolve\n    description: Resolve an incident (requires --confirm)\n  - name: pd_incident_note\n    description: Add a note to an incident (requires --confirm)\n  - name: pd_services\n    description: List PagerDuty services and their current status\n  - name: pd_recent\n    description: Show recent incidents for a service (last 24h/7d/30d)\n---\n\n# PagerDuty Incident Triage\n\nAI-powered incident triage for PagerDuty. Read-only by default. Write operations require explicit confirmation.\n\n## When to Activate\n\nUse this skill when the user:\n\n- Asks **\"what's firing?\"**, **\"any incidents?\"**, **\"what's on fire?\"**, or anything about active alerts\n- Mentions **PagerDuty**, **pager**, **on-call**, or **incidents**\n- Asks **\"who's on call?\"** or **\"who's oncall right now?\"**\n- Wants to **acknowledge**, **resolve**, or **add a note** to an incident\n- Says **\"triage\"**, **\"incident response\"**, or **\"what happened last night?\"**\n- Asks about **service health** or **recent incidents** for a service\n\n## Quick Setup\n\n### 1. Create a PagerDuty API Key\n\n1. Go to **PagerDuty → Settings → API Access Keys**\n2. Click **\"Create New API Key\"**\n3. Name it `OpenClaw Agent`\n4. Select **Read-only** (recommended for triage; choose Full Access only if you need ack/resolve)\n5. Copy the key\n\n### 2. Set Environment Variables\n\n```bash\n# Required — your PagerDuty REST API v2 token\nexport PAGERDUTY_API_KEY=\"u+your_key_here\"\n\n# Optional — required only for write operations (ack, resolve, note)\nexport PAGERDUTY_EMAIL=\"you@company.com\"\n```\n\n### 3. Verify\n\nAsk your agent: **\"What's firing on PagerDuty?\"**\n\n## Subcommands Reference\n\n### Read Operations (always safe, no confirmation needed)\n\n---\n\n#### `incidents` — List Active Incidents\n\nLists all triggered and acknowledged incidents, sorted by urgency.\n\n```\npager-triage incidents\n```\n\n**Example output:**\n```json\n{\n  \"tool\": \"pd_incidents\",\n  \"provider\": \"pagerduty\",\n  \"timestamp\": \"2026-02-16T03:45:00Z\",\n  \"total_incidents\": 3,\n  \"incidents\": [\n    {\n      \"id\": \"P123ABC\",\n      \"incident_number\": 4521,\n      \"title\": \"High CPU on prod-web-03\",\n      \"status\": \"triggered\",\n      \"urgency\": \"high\",\n      \"service\": { \"id\": \"PSVC123\", \"name\": \"Production Web\" },\n      \"created_at\": \"2026-02-16T03:00:00Z\",\n      \"assignments\": [{ \"name\": \"Jane Doe\", \"email\": \"...\" }],\n      \"alert_count\": 3,\n      \"escalation_level\": 1,\n      \"last_status_change\": \"2026-02-16T03:05:00Z\"\n    }\n  ],\n  \"summary\": \"3 active incident(s): high (triggered) x1, high (acknowledged) x1, low (triggered) x1\"\n}\n```\n\n---\n\n#### `detail <incident_id>` — Incident Deep Dive\n\nFull incident details including timeline (log entries), related alerts, notes, and automated analysis.\n\n```\npager-triage detail P123ABC\n```\n\n**Example output (abbreviated):**\n```json\n{\n  \"tool\": \"pd_incident_detail\",\n  \"incident\": {\n    \"id\": \"P123ABC\",\n    \"title\": \"High CPU on prod-web-03\",\n    \"status\": \"triggered\",\n    \"urgency\": \"high\",\n    \"service\": { \"id\": \"PSVC123\", \"name\": \"Production Web\" },\n    \"escalation_policy\": { \"name\": \"Production Escalation\" },\n    \"assignments\": [{ \"name\": \"Jane Doe\", \"escalation_level\": 1 }]\n  },\n  \"timeline\": [\n    { \"type\": \"trigger_log_entry\", \"created_at\": \"...\", \"summary\": \"Incident triggered via Prometheus Alertmanager\" },\n    { \"type\": \"escalate_log_entry\", \"created_at\": \"...\", \"summary\": \"Escalated to Jane Doe (Level 1)\" }\n  ],\n  \"alerts\": [\n    { \"id\": \"A456DEF\", \"severity\": \"critical\", \"summary\": \"CPU > 95% on prod-web-03\", \"source\": \"Prometheus Alertmanager\" }\n  ],\n  \"notes\": [],\n  \"analysis\": {\n    \"alert_count\": 3,\n    \"escalation_count\": 1,\n    \"acknowledged\": false,\n    \"trigger_source\": \"Prometheus Alertmanager\"\n  }\n}\n```\n\n---\n\n#### `oncall` — On-Call Schedules\n\nShows who's currently on call across all schedules and escalation policies.\n\n```\npager-triage oncall\n```\n\n**Example output:**\n```json\n{\n  \"tool\": \"pd_oncall\",\n  \"oncalls\": [\n    {\n      \"user\": { \"name\": \"Jane Doe\", \"email\": \"jane@company.com\" },\n      \"schedule\": { \"name\": \"Primary SRE\", \"id\": \"PSCHED1\" },\n      \"escalation_policy\": { \"name\": \"Production Escalation\" },\n      \"escalation_level\": 1,\n      \"start\": \"2026-02-15T17:00:00Z\",\n      \"end\": \"2026-02-16T17:00:00Z\"\n    }\n  ],\n  \"summary\": \"2 on-call assignment(s). Primary SRE: Jane Doe. Secondary SRE: Bob Smith.\"\n}\n```\n\n---\n\n#### `services` — List Services\n\nLists all PagerDuty services with their current operational status.\n\n```\npager-triage services\n```\n\n**Example output:**\n```json\n{\n  \"tool\": \"pd_services\",\n  \"services\": [\n    {\n      \"id\": \"PSVC123\",\n      \"name\": \"Production Web\",\n      \"status\": \"critical\",\n      \"description\": \"Production web application servers\",\n      \"escalation_policy\": \"Production Escalation\",\n      \"integrations\": [\"Prometheus Alertmanager\", \"CloudWatch\"]\n    }\n  ],\n  \"summary\": \"12 services: 1 critical, 1 warning, 10 active, 0 disabled\"\n}\n```\n\n---\n\n#### `recent` — Recent Incident History\n\nShows recent incidents for a service or across all services with summary statistics.\n\n```\npager-triage recent                          # All services, last 24h\npager-triage recent --service PSVC123        # Specific service\npager-triage recent --since 7d               # Last 7 days\npager-triage recent --service PSVC123 --since 30d\n```\n\n**Flags:**\n| Flag | Default | Description |\n|------|---------|-------------|\n| `--service <id>` | all | Filter to a specific PagerDuty service ID |\n| `--since <window>` | `24h` | Time window: `24h`, `7d`, or `30d` |\n\n**Example output:**\n```json\n{\n  \"tool\": \"pd_recent\",\n  \"period\": \"last 24 hours\",\n  \"service\": \"PSVC123\",\n  \"incidents\": [ ... ],\n  \"stats\": {\n    \"total\": 5,\n    \"by_urgency\": { \"high\": 2, \"low\": 3 },\n    \"by_status\": { \"resolved\": 4, \"triggered\": 1 },\n    \"mean_time_to_resolve_minutes\": 42\n  }\n}\n```\n\n---\n\n### Write Operations (⚠️ require `--confirm`)\n\nThese operations modify state in PagerDuty. They **require** the `--confirm` flag AND `PAGERDUTY_EMAIL` to be set. Without `--confirm`, the tool displays what it *would* do and exits.\n\n---\n\n#### `ack <incident_id> --confirm` — Acknowledge Incident\n\nAcknowledges a triggered incident, stopping further escalation.\n\n```\npager-triage ack P123ABC --confirm\n```\n\nWithout `--confirm`, displays:\n```json\n{\n  \"error\": \"confirmation_required\",\n  \"message\": \"⚠️ ACKNOWLEDGE INCIDENT — --confirm flag is required to proceed.\",\n  \"incident\": { \"id\": \"P123ABC\", \"title\": \"High CPU on prod-web-03\", \"urgency\": \"high\" },\n  \"hint\": \"Re-run with --confirm to acknowledge this incident.\"\n}\n```\n\nWith `--confirm`:\n```json\n{\n  \"tool\": \"pd_incident_ack\",\n  \"incident_id\": \"P123ABC\",\n  \"status\": \"acknowledged\",\n  \"acknowledged_at\": \"2026-02-16T03:46:00Z\",\n  \"acknowledged_by\": \"jane@company.com\"\n}\n```\n\n---\n\n#### `resolve <incident_id> --confirm` — Resolve Incident\n\nResolves an incident, marking it as fixed.\n\n```\npager-triage resolve P123ABC --confirm\n```\n\nSame confirmation pattern as `ack`. Without `--confirm`, shows incident details and exits. With `--confirm`, resolves and returns confirmation JSON.\n\n---\n\n#### `note <incident_id> --content \"text\" --confirm` — Add Incident Note\n\nAdds a permanent note to an incident's timeline.\n\n```\npager-triage note P123ABC --content \"Root cause: memory leak in auth-service v2.14.3. Rolling back.\" --confirm\n```\n\n**Flags:**\n| Flag | Required | Description |\n|------|----------|-------------|\n| `--content <text>` | Yes | The note text to add |\n| `--confirm` | Yes | Confirmation gate |\n\n**Example output:**\n```json\n{\n  \"tool\": \"pd_incident_note\",\n  \"incident_id\": \"P123ABC\",\n  \"note_id\": \"PNOTE456\",\n  \"content\": \"Root cause: memory leak in auth-service v2.14.3. Rolling back.\",\n  \"created_at\": \"2026-02-16T04:00:00Z\",\n  \"user\": \"Jane Doe\"\n}\n```\n\n---\n\n## Incident Triage Workflow\n\nWhen the user needs help triaging an incident, follow this workflow:\n\n### Step 1: Assess the Situation\n```\n→ pager-triage incidents\n```\nList all active incidents. Prioritize by urgency (high first) and duration (oldest first).\n\n### Step 2: Deep-Dive the Critical One\n```\n→ pager-triage detail <incident_id>\n```\nGet full timeline, alerts, and notes. Identify the trigger source and escalation history.\n\n### Step 3: Correlate with Other Skills\nIf available, use companion skills to investigate root cause:\n- **prom-query** → Query Prometheus for the underlying metrics (CPU, memory, latency, error rate)\n- **kube-medic** → Check pod health, restarts, OOMKills, node status in Kubernetes\n- **log-dive** → Search application logs for errors around the incident timeframe\n\n### Step 4: Act\n```\n→ pager-triage ack <incident_id> --confirm        # Stop escalation while investigating\n→ pager-triage note <incident_id> --content \"...\" --confirm   # Document findings\n→ pager-triage resolve <incident_id> --confirm     # Mark as fixed\n```\n\n### Agent Guidance\n- When the user says \"what's wrong?\" → start with `incidents`\n- When they mention a specific incident → use `detail`\n- When triaging → show incidents first, then detail on the most urgent\n- When correlating → suggest prom-query / kube-medic if installed\n- **ALWAYS** show the confirmation preview before executing write operations\n- **NEVER** ack/resolve without the user explicitly asking to do so\n\n### Discord v2 Delivery Mode (OpenClaw v2026.2.14+)\n\nWhen the conversation is happening in a Discord channel:\n\n- Send a compact first response (active incident count, highest urgency incident, recommended next step), then ask if the user wants full detail.\n- Keep the first response under ~1200 characters and avoid full timeline dumps in the first message.\n- If Discord components are available, include quick actions:\n  - `Deep Dive Incident`\n  - `Acknowledge Incident`\n  - `Add Incident Note`\n- If components are not available, provide the same follow-ups as a numbered list.\n- Prefer short follow-up chunks (<=15 lines per message) for incident timelines and alert lists.\n\n## Security Notes\n\n- **API keys** are read from environment variables only — never logged, displayed, or included in output\n- **Read-only by default** — 5 read commands work with any API key; 3 write commands require `--confirm`\n- **Confirmation gates** — Write operations show full incident context and refuse to proceed without `--confirm`\n- We recommend starting with a **read-only PagerDuty API key** for triage workflows\n- See [SECURITY.md](./SECURITY.md) for the full threat model and RBAC recommendations\n\n## OpsGenie Support (Planned)\n\nOpsGenie integration is planned for a future release. When `OPSGENIE_API_KEY` is set, the same subcommands will map to OpsGenie's REST API with normalized output schemas.\n\n---\n\n<sub>Powered by [Anvil AI](https://anvil-ai.io) · Built for the engineer who gets paged at 3am</sub>\n","tags":{"latest":"0.1.2"},"stats":{"comments":0,"downloads":326,"installsAllTime":2,"installsCurrent":2,"stars":0,"versions":2},"createdAt":1771471516108,"updatedAt":1778491578254},"latestVersion":{"version":"0.1.2","createdAt":1771717089279,"changelog":"Rebrand to Anvil AI. Remove CacheForge marketing copy. Normalize install commands.","license":null},"metadata":null,"owner":{"handle":"tkuehnl","userId":"s1796xcga59f855vmh35x909yh84742a","displayName":"Todd Kuehnl","image":"https://avatars.githubusercontent.com/u/25701510?v=4"},"moderation":null}