Install
openclaw skills install agent-estimation
Accurately estimate AI agent work effort using the agent's own operational units (tool-call rounds) instead of human time. Use when asked to estimate, scope, plan, or evaluate how long a coding task will take. Prevents the common failure mode where agents anchor to human developer timelines and massively overestimate. Outputs a structured breakdown with round counts, risk factors, and a final wallclock conversion.
AI coding agents systematically overestimate task duration because they anchor to human developer timelines absorbed from training data. A task an agent can complete in 30 minutes gets estimated as "2-3 days" because that's what a human developer forum post would say.
Force the agent to estimate from its own operational units — tool-call rounds — and only convert to human wallclock time at the very end.
| Unit | Definition | Scale |
|---|---|---|
| Round | One tool-call cycle: think → write code → execute → verify → fix | ~2-4 min wallclock |
| Module | A functional unit built from multiple rounds until usable | 2-15 rounds |
| Project | All modules + integration + debugging | Sum of module effective rounds + integration rounds |
A Round is the atomic unit. It maps directly to one iteration of the cycle think → write code → execute → verify → fix.
When asked to estimate a task, follow these steps in order:
Break the task into functional modules. Each module should be independently buildable and testable. Ask yourself: "What are the distinct pieces I would build one at a time?"
For each module, estimate the number of rounds using these anchors:
| Pattern | Typical Rounds | Examples |
|---|---|---|
| Boilerplate / known pattern | 1-2 | CRUD endpoint, config file, standard API client |
| Moderate complexity | 3-5 | Custom UI layout, state management, data pipeline |
| Exploratory / under-documented | 5-10 | Unfamiliar framework, platform-specific APIs, complex integrations |
| High uncertainty | 8-15 | Undocumented behavior, novel algorithms, multi-system debugging |
Key calibration rules:
Each module gets a risk coefficient that inflates its round count:
| Risk Level | Coefficient | When to Apply |
|---|---|---|
| Low | 1.0 | Mature ecosystem, clear docs, agent has strong pattern match |
| Medium | 1.3 | Minor unknowns, may need 1-2 extra debug rounds |
| High | 1.5 | Sparse docs, platform quirks, integration unknowns |
| Very High | 2.0 | Possible dead ends, may need to change approach entirely |
Module effective rounds = base rounds × risk coefficient
Project rounds = Σ(module effective rounds) + integration rounds
Integration rounds = 10-20% of base total (for wiring modules together)
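The three formulas above can be sketched in Python as follows. This is a minimal illustration rather than part of the skill itself: the `Module` dataclass, the `project_rounds` function, the sample module names, and the 15% default integration share (a midpoint of the 10-20% range) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical container for one module's estimate."""
    name: str
    base_rounds: float
    risk: float = 1.0  # risk coefficient from the table: 1.0, 1.3, 1.5 or 2.0

    @property
    def effective_rounds(self) -> float:
        # Module effective rounds = base rounds x risk coefficient
        return self.base_rounds * self.risk


def project_rounds(modules, integration_share=0.15):
    """Return (base_total, integration, project_total) in rounds.

    integration_share is an assumed midpoint of the 10-20% range and is
    applied to the base total, not the risk-adjusted total.
    """
    base_total = sum(m.base_rounds for m in modules)
    integration = base_total * integration_share
    effective_total = sum(m.effective_rounds for m in modules)
    return base_total, integration, effective_total + integration


# Illustrative modules only; names and numbers are made up for the sketch.
modules = [
    Module("REST endpoints", base_rounds=2, risk=1.0),
    Module("State management", base_rounds=4, risk=1.3),
    Module("Platform-specific API", base_rounds=8, risk=1.5),
]
base, integration, total = project_rounds(modules)
print(f"base={base}, integration={integration:.1f}, total={total:.1f}")
# prints: base=14, integration=2.1, total=21.3
```

Keeping the integration share tied to the base total means a high risk coefficient on one module does not also inflate the integration allowance.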
Only at the very end, convert to human time:
Wallclock time = project rounds × minutes_per_round
Default minutes_per_round = 3 minutes (includes agent generation time + user review time).
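As a small sketch of this conversion: `wallclock_minutes` applies the formula with the 3-minute default, and `wallclock_range` is an assumed helper that brackets the result using the ~2-4 min per-round scale from the units table. Both function names are hypothetical.

```python
def wallclock_minutes(project_rounds, minutes_per_round=3.0):
    """Wallclock time = project rounds x minutes_per_round."""
    return project_rounds * minutes_per_round


def wallclock_range(project_rounds, low=2.0, high=4.0):
    """Assumed helper: bracket the estimate with the ~2-4 min per-round scale."""
    return project_rounds * low, project_rounds * high


rounds = 21
print(wallclock_minutes(rounds))  # 63.0
print(wallclock_range(rounds))    # (42.0, 84.0)
```

The conversion stays a single multiplication so that all uncertainty lives in the round count and in `minutes_per_round`, not in hidden padding.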
Adjust this parameter based on context:
Always output the estimation in this exact structure:
### Task: [task name]
#### Module Breakdown
| # | Module | Base Rounds | Risk | Effective Rounds | Notes |
|---|--------|------------|------|-----------------|-------|
| 1 | ... | N | 1.x | M | why |
| 2 | ... | N | 1.x | M | why |
#### Summary
- **Base rounds**: X
- **Integration**: +Y rounds
- **Risk-adjusted total**: Z rounds
- **Estimated wallclock**: A – B minutes (at N min/round)
#### Biggest Risks
1. [specific risk and what could blow up the estimate]
2. [...]
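
For illustration, the structure above can be filled in programmatically. The sketch below assumes a hypothetical `render_estimate` function that takes already-computed values; how the A – B wallclock range is derived is not specified here, so both ends are passed in by the caller. The sample modules and numbers are made up.

```python
def render_estimate(task, rows, base, integration, total,
                    minutes_low, minutes_high, minutes_per_round, risks):
    """Fill the output template with already-computed values.

    rows: list of (module, base_rounds, risk, effective_rounds, notes) tuples.
    """
    lines = [
        f"### Task: {task}",
        "#### Module Breakdown",
        "| # | Module | Base Rounds | Risk | Effective Rounds | Notes |",
        "|---|--------|------------|------|-----------------|-------|",
    ]
    for i, (name, base_rounds, risk, effective, notes) in enumerate(rows, start=1):
        lines.append(
            f"| {i} | {name} | {base_rounds} | {risk} | {effective} | {notes} |"
        )
    lines += [
        "#### Summary",
        f"- **Base rounds**: {base}",
        f"- **Integration**: +{integration} rounds",
        f"- **Risk-adjusted total**: {total} rounds",
        f"- **Estimated wallclock**: {minutes_low} – {minutes_high} minutes "
        f"(at {minutes_per_round} min/round)",
        "#### Biggest Risks",
    ]
    lines += [f"{i}. {text}" for i, text in enumerate(risks, start=1)]
    return "\n".join(lines)


# Illustrative values: 2 + 9 effective rounds plus 1 integration round = 12.
report = render_estimate(
    task="Sync service",
    rows=[
        ("Config loader", 2, 1.0, 2, "known pattern"),
        ("Sync engine", 6, 1.5, 9, "sparse docs"),
    ],
    base=8, integration=1, total=12,
    minutes_low=30, minutes_high=42,  # assumed bounds around 12 x 3 = 36 min
    minutes_per_round=3,
    risks=["Sync engine API may behave differently from its docs"],
)
print(report)
```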
These are the failure modes this skill exists to prevent:
minutes_per_round, don't add phantom rounds.
Here are example projects with known round counts to help calibrate:
See references/calibration-examples.md for detailed examples across project types.
See evals/evals.json for test cases to validate estimation accuracy.