Install
openclaw skills install toil-tracker

Find the manual work that's eating your engineering time. Toil is repetitive, automatable, tactical work that scales with service size and has no lasting value. Identify it, measure it, prioritize what to automate first, and track reduction over time.
Use when: "how much toil do we have", "what should we automate", "toil budget", "manual operational work", "repetitive tasks", "SRE toil reduction", or during quarterly planning to justify automation projects.
survey: Catalog Toil Sources

Interview the team or analyze work tracking systems. Common toil categories:

| Category | Examples | Signal |
|---|---|---|
| Deploys | Manual deploy steps, config changes, rollbacks | "Someone has to click..." |
| Tickets | Password resets, access requests, cert renewals | "Every week we get..." |
| Monitoring | False alerts, manual alert triage, dashboard watching | "We page about this but..." |
| Scaling | Manual capacity adjustments, resource provisioning | "When traffic spikes we..." |
| Data | Manual data fixes, migrations, backfills | "Users file tickets to..." |
| Maintenance | Dependency updates, cert rotations, key rotations | "Every quarter we have to..." |
| Onboarding | Setting up dev environments, granting access | "New hire setup takes..." |
# Analyze ticket systems for repetitive patterns
# Jira/Linear: find recurring ticket types
# Example: count tickets by label/type in the last quarter

# Analyze on-call alerts for noise
# -g (--globoff) stops curl from treating the [] in statuses[] as a glob range.
# limit=100 fetches one page only; follow the API's pagination for larger result sets.
curl -sg "https://api.pagerduty.com/incidents?since=2026-01-01&until=2026-04-01&statuses[]=resolved&limit=100" \
  -H "Authorization: Token token=$PD_TOKEN" | python3 -c "
import json, sys, collections
incidents = json.load(sys.stdin)['incidents']
# Count resolved incidents per service; noisy services are toil candidates
by_service = collections.Counter(i['service']['summary'] for i in incidents)
print('Incidents by service (potential toil):')
for service, count in by_service.most_common(10):
    print(f'  {count:>4}x {service}')
"
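The ticket-side analysis named in the comments above has no code attached. A minimal sketch, assuming tickets have already been exported from the tracker as a list of dicts with a `labels` field. The field name, the sample data, and the `count_recurring_labels` helper are hypothetical and not part of any Jira or Linear API:

```python
from collections import Counter


def count_recurring_labels(tickets, min_count=2):
    """Count tickets per label and keep labels seen at least min_count times."""
    counts = Counter(label for t in tickets for label in t.get('labels', []))
    return [(label, n) for label, n in counts.most_common() if n >= min_count]


# Hypothetical export; in practice this would come from the tracker's export or API
sample_tickets = [
    {'id': 'OPS-1', 'labels': ['access-request']},
    {'id': 'OPS-2', 'labels': ['access-request']},
    {'id': 'OPS-3', 'labels': ['cert-renewal']},
    {'id': 'OPS-4', 'labels': ['access-request', 'onboarding']},
]

recurring = count_recurring_labels(sample_tickets)
for label, n in recurring:
    print(f'{n:>4}x {label}')
```

Labels that recur every week are the ones worth carrying into the measurement step.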
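For each toil source, estimate how often it occurs per quarter, how many hours each occurrence takes, and how many people are involved: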
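def calculate_toil_budget(toil_items, team_size, hours_per_quarter=520):
    """
    Summarize quarterly toil hours against team capacity.

    Google SRE recommends: max 50% of SRE time on toil.
    Each item needs 'frequency_per_quarter', 'hours_per_occurrence', and
    'people_involved'. Items are annotated in place with 'quarterly_hours'.
    The default of 520 hours is 13 weeks at 40 hours per week.
    """
    if team_size <= 0 or hours_per_quarter <= 0:
        raise ValueError('team_size and hours_per_quarter must be positive')

    total_toil_hours = 0
    for item in toil_items:
        quarterly_hours = (item['frequency_per_quarter']
                           * item['hours_per_occurrence']
                           * item['people_involved'])
        total_toil_hours += quarterly_hours
        item['quarterly_hours'] = quarterly_hours

    team_capacity = team_size * hours_per_quarter
    toil_percentage = (total_toil_hours / team_capacity) * 100

    return {
        'total_toil_hours': total_toil_hours,
        'team_capacity_hours': team_capacity,
        'toil_percentage': toil_percentage,
        'status': '🟢 Healthy' if toil_percentage < 30 else '🟡 Watch' if toil_percentage < 50 else '🔴 Over budget',
        # Largest time sinks first
        'items_ranked': sorted(toil_items, key=lambda x: -x['quarterly_hours']),
    }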
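Survey answers usually arrive as "N times a week, M minutes each", while the function above expects quarterly frequencies and hours. A small conversion sketch, assuming a 13-week quarter (which matches the 520-hour default at 40 hours per week). The `to_toil_item` helper and the `task` key are hypothetical:

```python
WEEKS_PER_QUARTER = 13  # assumption: 520 h per quarter / 40 h per week


def to_toil_item(task, per_week, minutes_each, people_involved=1):
    """Build an item dict with the keys calculate_toil_budget reads."""
    return {
        'task': task,
        'frequency_per_quarter': per_week * WEEKS_PER_QUARTER,
        'hours_per_occurrence': minutes_each / 60,
        'people_involved': people_involved,
    }


item = to_toil_item('Access requests', per_week=20, minutes_each=15)
hours = (item['frequency_per_quarter']
         * item['hours_per_occurrence']
         * item['people_involved'])
print(f"{item['task']}: {hours:.1f}h per quarter")  # Access requests: 65.0h per quarter
```

Tasks that already recur on a quarterly cadence can skip the helper and set `frequency_per_quarter` directly.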
# Toil Report — Q2 2026
## Summary
- Team size: 6 SREs
- Total toil: 1,053h/quarter (13.5h/person/week)
- Toil budget: 34% of capacity 🟡 (target: <30%)
## Top Toil Sources (ranked by hours)
| Rank | Category | Task | Freq | Duration | Hours/Q | Automatable? |
|------|----------|------|------|----------|---------|-------------|
| 1 | Tickets | Access requests | 20/week | 15 min | 65h | ✅ Self-serve portal |
| 2 | Deploys | Manual prod deploy | 3/week | 45 min × 2 people | 58.5h | ✅ CI/CD pipeline |
| 3 | Monitoring | False alert triage | 10/week | 20 min | 43h | ✅ Tune thresholds |
| 4 | Data | Customer data fixes | 5/week | 30 min | 32.5h | ✅ Admin tool |
| 5 | Maintenance | Cert renewals | 12/quarter | 2h | 24h | ✅ Auto-renew |
## Automation ROI
| Project | Est. Effort | Toil Saved/Q | Payback |
|---------|------------|-------------|---------|
| Self-serve access portal | 80h | 65h | 1.2 quarters |
| CD pipeline | 120h | 58.5h | 2.1 quarters |
| Alert tuning sprint | 20h | 43h | 0.5 quarters |
| Admin data tool | 60h | 32.5h | 1.8 quarters |
| Auto cert renewal | 8h | 24h | 0.3 quarters |
## Recommendation
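Start with auto cert renewal (lowest effort and fastest payback) and alert tuning (pays back in about half a quarter). Then tackle the self-serve access portal. Defer the CD pipeline to Q3: it has the second-largest savings but the highest effort and the longest payback.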
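prioritize: Rank Automation Candidates

Score each toil source by the hours it costs per quarter and the estimated effort to automate it.

Calculate ROI = hours_saved_per_quarter / automation_hours. The payback period in quarters is the inverse: automation_hours / hours_saved_per_quarter.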
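The ROI formula above can be turned into a ranking. A minimal sketch using the effort and savings figures from the Automation ROI table in the sample report. The `rank_by_payback` helper and the dict keys are hypothetical, and the sketch assumes both values are positive:

```python
def rank_by_payback(projects):
    """Add ROI and payback to each project and sort by shortest payback first."""
    ranked = []
    for p in projects:
        saved = p['hours_saved_per_quarter']
        effort = p['automation_hours']
        ranked.append({
            **p,
            'roi': saved / effort,               # hours saved per quarter per hour invested
            'payback_quarters': effort / saved,  # quarters until the effort is recovered
        })
    return sorted(ranked, key=lambda p: p['payback_quarters'])


projects = [
    {'name': 'Self-serve access portal', 'automation_hours': 80, 'hours_saved_per_quarter': 65},
    {'name': 'CD pipeline', 'automation_hours': 120, 'hours_saved_per_quarter': 58.5},
    {'name': 'Alert tuning sprint', 'automation_hours': 20, 'hours_saved_per_quarter': 43},
    {'name': 'Admin data tool', 'automation_hours': 60, 'hours_saved_per_quarter': 32.5},
    {'name': 'Auto cert renewal', 'automation_hours': 8, 'hours_saved_per_quarter': 24},
]

ranked = rank_by_payback(projects)
for p in ranked:
    print(f"{p['name']:<26} ROI {p['roi']:.2f}  payback {p['payback_quarters']:.1f} quarters")
```

Sorting by payback rather than raw hours saved puts cheap, quick wins ahead of larger projects, which is the ordering the sample recommendation follows.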
track: Monitor Toil Reduction Over Time

Compare toil hours quarter-over-quarter.
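A minimal sketch of that comparison, assuming total toil hours are recorded once per quarter in chronological order. The quarter labels and hour values below are made-up illustration data, and `toil_trend` is a hypothetical helper:

```python
def toil_trend(history):
    """Return (quarter, hours, change_hours, change_pct) rows.

    The change fields are None for the first quarter, or when the
    previous quarter recorded 0 hours (no meaningful percentage).
    """
    rows = []
    previous = None
    for quarter, hours in history:
        if previous is None or previous == 0:
            rows.append((quarter, hours, None, None))
        else:
            delta = hours - previous
            rows.append((quarter, hours, delta, delta / previous * 100))
        previous = hours
    return rows


# Illustration data only, not measurements
history = [('Q1', 500), ('Q2', 400), ('Q3', 300)]

trend = toil_trend(history)
for quarter, hours, delta, pct in trend:
    if delta is None:
        print(f'{quarter}: {hours}h (baseline)')
    else:
        print(f'{quarter}: {hours}h ({delta:+}h, {pct:+.1f}%)')
```

A flat or rising trend after an automation project ships suggests the toil moved rather than disappeared, which is worth a fresh survey pass.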