Install
openclaw skills install sprint-contractMulti-agent development workflow with Sprint Contracts and independent QA evaluation. Use when building features, fixing complex bugs, or any task that involves spawning sub-agents to do work. Implements the Planner-Generator-Evaluator pattern (inspired by Anthropic's GAN-style harness design) to ensure quality through explicit completion criteria and independent testing. Triggers on development tasks, feature builds, bug fixes, code reviews, or when spawning coding agents.
openclaw skills install sprint-contractBased on Anthropic's harness design for long-running apps: separate the agent doing the work from the agent judging it.
Never let the builder evaluate their own work on complex tasks. LLMs reliably praise their own output — even when it's mediocre. An independent evaluator, tuned to be skeptical, catches what self-evaluation misses.
Planner (you/human) → Generator (sub-agent) → Evaluator (independent sub-agent)
↑ |
└──────────── feedback loop ←──────────────────┘
Every task gets a BRIEF.md. The Sprint Contract section is mandatory — it lists specific, testable completion criteria.
# Task Brief
## Background
[Why this task exists]
## Objective
[What to build/fix]
## Sprint Contract (Completion Criteria)
- [ ] Criterion 1 (specific, verifiable)
- [ ] Criterion 2
- [ ] ...
⚠️ Write criteria specific to THIS task. No generic checklists.
## Related Files
[File paths relevant to the task]
## Constraints
[Tech stack, prior decisions, known pitfalls]
## Handoff Requirements
Write a HANDOFF.md when done, containing:
- What was done (file change list)
- Design decisions made (and why)
- What's left / known issues
- Everything needed for reporting to the human
The generator receives the BRIEF.md and builds against the Sprint Contract. Key rules for the generator prompt:
After the generator finishes, spawn a separate agent as evaluator. The evaluator prompt must include:
The Sprint Contract — copied from BRIEF.md, to verify each criterion.
4 Evaluation Dimensions (select what's relevant):
| Dimension | What to check |
|---|---|
| Functional completeness | Every Sprint Contract criterion passes |
| User experience | Flow is intuitive, no dead ends |
| Visual quality | Layout, spacing, colors are professional |
| Code/content quality | No errors, clean logic, no regressions |
The critical prompt line:
"Your job is to find problems, not to praise. If everything looks fine, you probably didn't test carefully enough. Report issues honestly — better a false alarm than a missed bug."
Based on evaluator feedback:
| Task complexity | Generator | Evaluator | Example |
|---|---|---|---|
| Simple (< 30 min) | Sub-agent | Self-evaluate, mark "⚠️ untested" | Fix a typo, update config |
| Medium (30 min - 2 hr) | Sub-agent | Independent sub-agent | New feature, bug fix |
| Complex (2+ hr) | Claude Code / ACP | Independent sub-agent + human review | Architecture change, new project |
See references/contract-examples.md for project-specific contract templates.