Sprint Contract

v1.0.0

Multi-agent development workflow with Sprint Contracts and independent QA evaluation. Use when building features, fixing complex bugs, or any task that invol...

Security Scan
- VirusTotal: Benign
- OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (Sprint Contract, multi-agent QA workflow) matches the SKILL.md instructions: it documents a Planner → Generator → Evaluator pattern, file-based briefs/handoffs, and evaluator rules. No unrelated resources (cloud creds, unusual binaries) are requested.
Instruction Scope
Instructions are scoped to creating/consuming BRIEF.md and HANDOFF.md, spawning builder/evaluator sub-agents, and validating explicit completion criteria. This is appropriate for a coordination workflow. Note: the skill explicitly instructs sub-agents to write files and to prefer output-first over research; if installed into an agent with filesystem or repo access, those sub-agents could modify project files — ensure you only enable it where code-writing by agents is acceptable.
Install Mechanism
No install spec or code files — instruction-only. Nothing is downloaded or written to disk by the skill itself, which minimizes supply-chain risk.
Credentials
The skill declares no required environment variables, credentials, or config paths. The runtime instructions do not request secrets or external tokens. This matches the workflow purpose.
Persistence & Privilege
`always: false` is set and autonomous invocation is enabled by default (both normal). The skill does not request permanent presence or system modifications. Note: since autonomous agent invocation is allowed by default, pair this with sensible tool/FS/CI permissions when enabling the skill.
Assessment
This skill is instruction-only and coherent with its stated purpose. Before enabling it for agents that can modify repositories or files, decide whether you want automated sub-agents to create or modify code in that environment: restrict the skill to non-sensitive projects or grant it only read/write access to a sandbox. Require human gate review for high-risk or production changes, ensure BRIEF.md does not include secrets or credentials, and monitor evaluator reports for false positives or overly strict checks. If you plan to integrate it with CI/CD or grant tool access, test it in an isolated workspace first.


Tags: evaluation, latest, multi-agent, qa, quality, workflow
107 downloads · 0 stars · 1 version · Updated 3w ago · MIT-0

Sprint Contract — Multi-Agent Quality System

Based on Anthropic's harness design for long-running apps: separate the agent doing the work from the agent judging it.

Core Principle

Never let the builder evaluate their own work on complex tasks. LLMs reliably praise their own output — even when it's mediocre. An independent evaluator, tuned to be skeptical, catches what self-evaluation misses.

Architecture

Planner (you/human) → Generator (sub-agent) → Evaluator (independent sub-agent)
     ↑                                              |
     └──────────── feedback loop ←──────────────────┘

Workflow

1. Write a BRIEF.md with Sprint Contract

Every task gets a BRIEF.md. The Sprint Contract section is mandatory — it lists specific, testable completion criteria.

# Task Brief

## Background
[Why this task exists]

## Objective
[What to build/fix]

## Sprint Contract (Completion Criteria)
- [ ] Criterion 1 (specific, verifiable)
- [ ] Criterion 2
- [ ] ...

⚠️ Write criteria specific to THIS task. No generic checklists.

## Related Files
[File paths relevant to the task]

## Constraints
[Tech stack, prior decisions, known pitfalls]

## Handoff Requirements
Write a HANDOFF.md when done, containing:
- What was done (file change list)
- Design decisions made (and why)
- What's left / known issues
- Everything needed for reporting to the human
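
Because the Sprint Contract section is mandatory, a coordinator can sanity-check a brief before spawning any agents. A minimal Python sketch of such a pre-flight validator (the section name and checkbox syntax follow the template above; the regex itself is an illustration, not part of the skill):

```python
import re

def extract_criteria(brief_text: str) -> list[str]:
    """Return the unchecked '- [ ]' items from the Sprint Contract section."""
    # Grab everything from the Sprint Contract heading to the next '##' heading.
    section = re.search(r"## Sprint Contract.*?(?=\n## |\Z)", brief_text, re.S)
    if not section:
        raise ValueError("BRIEF.md is missing the mandatory Sprint Contract section")
    return re.findall(r"- \[ \] (.+)", section.group(0))

brief = """# Task Brief

## Objective
Add retry logic to the HTTP client.

## Sprint Contract (Completion Criteria)
- [ ] Retries up to 3 times on 5xx responses
- [ ] Unit test covers the retry path

## Related Files
src/http_client.py
"""

print(extract_criteria(brief))
```

Running a check like this before spawning the generator catches empty or generic contracts early, when fixing them is cheap.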

2. Spawn Generator (Builder)

The generator receives the BRIEF.md and builds against the Sprint Contract. Key rules for the generator prompt:

  • Work against the Sprint Contract criteria
  • Self-check each criterion before handing off
  • Write HANDOFF.md when done
  • Write files first, read references second (output > research)

3. Spawn Evaluator (Independent QA)

After the generator finishes, spawn a separate agent as evaluator. The evaluator prompt must include:

The Sprint Contract — copied from BRIEF.md, to verify each criterion.

4 Evaluation Dimensions (select what's relevant):

| Dimension | What to check |
| --- | --- |
| Functional completeness | Every Sprint Contract criterion passes |
| User experience | Flow is intuitive, no dead ends |
| Visual quality | Layout, spacing, colors are professional |
| Code/content quality | No errors, clean logic, no regressions |

The critical prompt line:

"Your job is to find problems, not to praise. If everything looks fine, you probably didn't test carefully enough. Report issues honestly — better a false alarm than a missed bug."
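
An evaluator prompt can be assembled mechanically from the BRIEF.md contract and the chosen dimensions. A hedged sketch (`build_evaluator_prompt` and its exact layout are illustrative, not a format the skill prescribes):

```python
SKEPTIC_LINE = (
    "Your job is to find problems, not to praise. If everything looks fine, "
    "you probably didn't test carefully enough. Report issues honestly - "
    "better a false alarm than a missed bug."
)

def build_evaluator_prompt(contract: list[str], dimensions: list[str]) -> str:
    """Assemble an evaluator prompt from contract criteria and QA dimensions."""
    lines = ["You are an independent QA evaluator.", "", "Sprint Contract to verify:"]
    lines += [f"- [ ] {item}" for item in contract]
    lines += ["", "Evaluation dimensions:"]
    lines += [f"- {dim}" for dim in dimensions]
    lines += ["", SKEPTIC_LINE]  # the calibration line goes in every evaluator prompt
    return "\n".join(lines)

prompt = build_evaluator_prompt(
    ["All unit tests pass", "No regressions in the login flow"],
    ["Functional completeness", "Code/content quality"],
)
```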

4. Decision Gate

Based on evaluator feedback:

  • All criteria pass → Ship it
  • Criteria fail → Feed evaluator report back to generator for fixes
  • Architecture issues → Escalate to human
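
The gate above can be sketched as a bounded loop. Here `spawn_generator` and `spawn_evaluator` are hypothetical callables standing in for whatever agent-spawning mechanism you use, and the report shape is an assumption:

```python
MAX_ROUNDS = 3  # bound the feedback loop; escalate rather than iterate forever

def run_sprint(brief: str, spawn_generator, spawn_evaluator) -> str:
    """Drive the generator/evaluator loop until the contract passes."""
    feedback = None
    for _ in range(MAX_ROUNDS):
        handoff = spawn_generator(brief, feedback)  # builds, writes HANDOFF.md
        report = spawn_evaluator(brief, handoff)    # independent QA pass
        if report["architecture_issue"]:
            return "escalate"                       # human decision needed
        if all(report["criteria"].values()):
            return "ship"                           # every criterion passed
        feedback = report                           # feed failures back to builder
    return "escalate"                               # too many rounds: human review

# Illustrative stubs only; real spawns would call your agent framework.
result = run_sprint(
    "BRIEF.md contents",
    lambda brief, fb: "HANDOFF.md contents",
    lambda brief, handoff: {"architecture_issue": False,
                            "criteria": {"c1": True, "c2": True}},
)
```

Bounding the rounds matters: an unbounded fix loop can burn budget on a contract the generator cannot satisfy, which is exactly the case that should reach a human.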

When to Use Each Mode

| Task complexity | Generator | Evaluator | Example |
| --- | --- | --- | --- |
| Simple (< 30 min) | Sub-agent | Self-evaluate, mark "⚠️ untested" | Fix a typo, update config |
| Medium (30 min - 2 hr) | Sub-agent | Independent sub-agent | New feature, bug fix |
| Complex (2+ hr) | Claude Code / ACP | Independent sub-agent + human review | Architecture change, new project |

Sprint Contract Examples

See references/contract-examples.md for project-specific contract templates.

Key Insights from Anthropic's Research

  1. File-based communication — Agents talk through files (BRIEF.md, HANDOFF.md), not conversation
  2. Evaluator calibration — Default LLMs are too lenient; explicitly prompt for skepticism
  3. Sprint scoping — One feature at a time; don't bundle unrelated work
  4. Opus 4.6 + 1M context — Context anxiety is gone; sprint decomposition is less critical, but evaluator still adds value at task boundaries
  5. Evaluation criteria shape output — The wording of your criteria directly steers what the generator produces
