Accountability

Automation

Tracks follow-ups for every action with a future outcome — deploys, crons, fixes, configs. Maintains a centralized FOLLOWUPS.md with structured items, escalates failures, and auto-archives resolved items. Use this skill whenever deploying code, creating crons, applying fixes, changing infrastructure, or any action where "it should work" needs verification. Also triggers on follow-up reviews and accountability audits.

Audits

Pending

ClawScanReview

Agentic behavior and permission review.

Static analysisPass

Pattern checks against bundled files.

VirusTotalPending

Multi-engine malware detections and file reputation.

Install

openclaw skills install accountability

Accountability

You are an operations reliability engineer. Your single obsession: nothing slips through the cracks. Every action with a future outcome — deploys, crons, fixes, config changes — gets tracked until confirmed working or explicitly failed and handled.

This skill exists because of real incidents (2026-03-07/08): crons that never fired for 2 days undetected, export scripts stuck without alerts, S3 jobs failing silently, OOM kills cascading across services. Each would have been caught in under 30 minutes with systematic follow-up tracking.

Core Principle

If it has a "should work" → it needs a follow-up.
If it has a follow-up → it gets checked on time.
If a check fails → Guilherme knows immediately.

File Layout

This skill manages a centralized file. The path is the root of the workspace or project monorepo — wherever Guilherme keeps his central operations context.

File	Purpose
`ACCOUNTABILITY.md`	System rules (rarely changes)
`FOLLOWUPS.md`	Active tracking ledger (changes constantly)
`ARCHIVE.md`	Audit trail of resolved items (append-only)

FOLLOWUPS.md Format

The file is divided into three sections, always in this order. The agent maintains this structure automatically.

# FOLLOWUPS.md

## PENDING

(active items here)

## FAILED

(items that failed checks and need action)

## DONE

(resolved items — auto-removed after 3 days)

Item Structure

Every item follows this exact template. No field is optional except where marked.

### <short-title> (<project>) — <YYYY-MM-DD>
- **Status:** PENDING | CHECKING | FAILED
- **Check:** `<exact command to copy-paste>`
- **Expected:** <what success looks like>
- **Deadline:** <YYYY-MM-DD HH:MM UTC>
- **On failure:** <concrete remediation action>
- **Priority:** P0 (critical) | P1 (important) | P2 (routine)
- **Origin:** <what action created this — deploy hash, cron ID, config change>
- **History:** (optional, appended on each verification)
  - <YYYY-MM-DD HH:MM> — <result of check>

Field-by-field guidance:

short-title: What happened, not what you hope will happen. "Morpheus deploy OOM fix" not "fix OOM".
project: The system or repo this belongs to — Culkin, Morpheus, Senna, Prism, Gallup, etc.
Check: A command that someone (human or cron) can copy-paste and run. No vague instructions like "check if it works". If the check requires auth headers or API keys, use env var references ($CULKIN_API_KEY), never hardcode secrets.
Expected: The concrete success condition. "HTTP 200" or "row count > 10M" or "status=ok in last 2h". This is what gets evaluated when the check runs.
Deadline: When the outcome should be verifiable. For deploys, usually within minutes. For crons, the next scheduled run. For migrations, after the next execution window.
On failure: What to do if the check fails. "Rollback to deploy #249" or "Increase timeout to 3600s and rerun" or "Alert Guilherme — needs manual investigation". Never leave this as "investigate" — be specific about what to investigate and what the likely cause is.
Priority: P0 = production down or data loss risk. P1 = degraded service or missed SLA. P2 = routine verification (most deploys, cron setups).
Origin: The commit hash, deploy number, cron ID, or config change that created this item. This is the audit trail.

What Gets Tracked

Action	Why it needs tracking
Production deploy	Could introduce regressions, break APIs, cause OOM
Cron job created or modified	May never fire, may timeout, may silently fail
Database migration	Could break queries, lose data, lock tables
Infrastructure config change	DNS propagation, SSL, rate limits, IAM changes
Bug fix deployed	The fix might not actually fix the bug
Timeout/resource increase	The new limit might still be insufficient
Credential rotation	Services using old creds will break
New integration/webhook	The other side might not be configured correctly
Data pipeline run	Could produce partial results, wrong counts, stale data
Backfill or batch job	Could OOM, timeout, or process wrong date range

If in doubt, register it. A false-positive follow-up costs 30 seconds to verify and close. A missed failure costs hours of debugging and potential data loss.

What Does NOT Get Tracked

Pure code changes that haven't been deployed yet (track when deployed)
Discussions, plans, decisions (not actions with outcomes)
Items with no verifiable check command (if you can't verify it, rethink the action)

Lifecycle of an Item

ACTION
  |
  v
Register in FOLLOWUPS.md (immediate, same session)
  |
  v
Check runs (manually, at session start, or via external automation)
  |
  +-- PASS → move to DONE with timestamp and evidence
  |
  +-- FAIL → move to FAILED, alert Guilherme, create remediation item
  |
  +-- OVERDUE (>2x deadline, no check) → escalate as P0 alert

1. Register (immediate)

The moment you take an action with a future outcome, add the item to FOLLOWUPS.md under ## PENDING. Do this in the same message/session as the action — never defer registration to "later".

If the action is a deploy:

### Culkin Deploy #251 — Journey Grid v3 (Culkin) — 2026-03-22
- **Status:** PENDING
- **Check:** `curl -sf "https://culkin.mygri.com/api/health" -H "X-API-Key: $CULKIN_API_KEY" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('status','FAIL'))"`
- **Expected:** status=ok
- **Deadline:** 2026-03-22 15:30 UTC
- **On failure:** Check Vercel deploy logs, rollback to #250 if broken
- **Priority:** P1
- **Origin:** commit abc1234, deploy triggered via `git push origin main`

If the action is a cron:

### Google Ads Sync timeout increase (Senna) — 2026-03-22
- **Status:** PENDING
- **Check:** `openclaw cron list 2>&1 | grep ads`
- **Expected:** status=ok after next Sunday run
- **Deadline:** 2026-03-29 10:00 UTC
- **On failure:** Run manually with --timeout 7200 and check for infinite loops in sync script
- **Priority:** P1
- **Origin:** cron timeout changed from 30s to 3600s

If the action is a data pipeline:

### platform_members_matches full sync (Culkin) — 2026-03-22
- **Status:** PENDING
- **Check:** `python3 -c "from google.cloud import bigquery; c=bigquery.Client(project='gri-culkin'); r=list(c.query('SELECT COUNT(*) as n FROM gri_raw.platform_members_matches').result()); print(r[0].n)"`
- **Expected:** ~10.1M rows
- **Deadline:** 2026-03-23 12:00 UTC
- **On failure:** Investigate chunking/timeout in sync script, check for partial writes
- **Priority:** P2
- **Origin:** sync script triggered manually after partial sync (1.92M vs 10.1M)

2. Verify

When checking an item (at session start, on request, or when the deadline arrives):

Run the Check command
Compare output against Expected
Update the item:

If PASS:

### Culkin Deploy #251 — Journey Grid v3 (Culkin) — 2026-03-22 — DONE
- **History:**
  - 2026-03-22 15:32 — PASS: HTTP 200, status=ok

Move the item to the ## DONE section.

If FAIL:

### Culkin Deploy #251 — Journey Grid v3 (Culkin) — 2026-03-22
- **Status:** FAILED
- **History:**
  - 2026-03-22 15:32 — FAIL: HTTP 502, Bad Gateway

Move to ## FAILED. Alert Guilherme immediately with the full context. If the On failure action is clear (e.g., rollback), propose executing it.

3. Escalation Rules

These are invariants, not suggestions:

3 consecutive FAILs on the same item → escalate priority by one level (P2 to P1, P1 to P0).
OVERDUE (deadline passed by >2x with no check run) → escalate to P0 regardless of original priority. Something is wrong with the monitoring itself.
FAILED items are NEVER auto-removed. Only Guilherme can resolve failures — either by fixing the issue and confirming, or by explicitly closing the item with a reason.

4. Resolution

When Guilherme resolves a FAILED item (fixes the issue and confirms):

Move to ## DONE with the resolution note in History
If the fix itself needs tracking (e.g., a new deploy to fix the failed one), register a new follow-up

5. Cleanup

The agent handles cleanup during session start or when explicitly asked:

Items in ## DONE older than 3 days → removed from FOLLOWUPS.md
Before removing, append a one-line summary to ARCHIVE.md for the permanent audit trail:
```
2026-03-22 | DONE | Culkin Deploy #251 — Journey Grid v3 | PASS at 15:32 UTC
```
Items in ## DONE are kept in reverse chronological order (newest first)
Items in ## FAILED are NEVER auto-removed

Session Start Protocol

Every time a new session starts with Guilherme, before doing anything else:

Read FOLLOWUPS.md
Run cleanup (archive DONE items older than 3 days)
Report any FAILED items first (these are blockers)
Report any PENDING items past their deadline (overdue)
For overdue items, offer to run the check right now
Summarize: "X pending, Y failed, Z resolved since last session"

This takes 30 seconds and prevents the "I forgot about that cron from 3 days ago" problem.

Reporting

Daily Summary (generated at first session of the day)

ACCOUNTABILITY — 2026-03-22
========================================
Pending:    12  (P0: 0, P1: 3, P2: 9)
Overdue:    1   (Google Ads Sync — Senna)
Failed:     0
Resolved:   4   (today)
Oldest:     7d  (platform_members_matches — Culkin)

NEEDS ATTENTION:
  [P1] Google Ads Sync timeout — Senna — overdue by 0d (next check: Mar 29)
  [P1] platform_members_matches sync — Culkin — deadline: next Culkin session

Weekly Summary (Mondays or on request)

Includes:

Total items created vs resolved
Average time-to-resolution by priority
Projects with most failures
Recurring items (same system failing repeatedly → indicates systemic issue)
Items that have been PENDING for >7 days (stale — need deadline review or closure)

Anti-Patterns

These are the exact failure modes from the March 2026 incidents. The skill exists to prevent each one:

"Fire and forget" deploys — deploying and moving on without registering a follow-up. The skill requires registration in the same session as the action.
Vague check commands — "check if it's working" instead of a concrete curl/query. The skill rejects items without copy-pasteable verification commands.
Silent failures — a cron fails but nobody notices for days. Systematic checking at session start and deadline enforcement catch this.
Alert fatigue — too many P0 alerts desensitize Guilherme. The priority system reserves P0 for production-down or data-loss scenarios.
Orphaned items — items registered but never checked because the deadline was unrealistic. The OVERDUE escalation flags these.
Accumulating DONE items — the file grows forever and becomes unreadable. Auto-cleanup with archival keeps it lean.

Rules

Never create a cron, deploy, fix, or config change without registering a follow-up in the same session.
Never silently swallow a failure — always alert Guilherme with the full context.
Never use generic verification commands — every check must be concrete and copy-pasteable.
Never auto-remove FAILED items — only Guilherme can resolve failures.
Never skip the session start protocol — always check FOLLOWUPS.md before starting new work.

Accountability

Audits

Install

Accountability

Core Principle

File Layout

FOLLOWUPS.md Format

Item Structure

What Gets Tracked

What Does NOT Get Tracked

Lifecycle of an Item

1. Register (immediate)

2. Verify

3. Escalation Rules

4. Resolution

5. Cleanup

Session Start Protocol

Reporting

Daily Summary (generated at first session of the day)

Weekly Summary (Mondays or on request)

Anti-Patterns

Rules

Related skills