Install
openclaw skills install compound-eng-verification-before-completionEnforces fresh verification evidence before any completion claim. Use when about to claim "tests pass", "bug fixed", "done", "ready to merge", or handing off work.
openclaw skills install compound-eng-verification-before-completionNo completion claims without fresh verification evidence. If the verification command has not been run immediately before the claim, the claim cannot be made. Violating the letter of this rule is violating the spirit -- rephrasing a claim to technically avoid the wording does not exempt it from verification.
"Should pass", "probably works", and "looks correct" are not verification. Only command output confirming the claim counts (typically exit code 0). If pre-existing failures cause non-zero exits unrelated to your changes, see "When Verification Fails" below.
Before running verification, check the working tree state: git status --porcelain. If there are uncommitted changes unrelated to the current task, handle them first (commit, stash, or acknowledge). Adding verification commits on top of a dirty tree creates tangled history.
When verifying work delegated to a subagent, do not trust the implementer's own report. Read the actual code or test output independently. Spec compliance and quality are separate concerns -- verify both.
When a request uses ambiguous spatial scope -- "migrate my project", "refactor the codebase", "update everywhere", "fix this across the app", "my code/repo/project" -- confirm the concrete scope before any Write or Edit. Imperative phrasing is not the same as defined scope.
Run a breakdown command to surface the real blast radius, then present it for confirmation:
# How many files actually match?
rg -l 'pattern' | cut -d/ -f1 | sort | uniq -c | sort -rn
# Which directories are affected?
rg -l 'pattern' | xargs dirname | sort -u
Present the result and ask: "This touches N files across M subsystems. Scope options: (a) everything, (b) just <subset>, (c) let me pick specific files." Do not start editing until the user commits to one.
This is cheap insurance: the alternative is shipping a sprawling diff that the user then has to unwind. "Migrate my project" is the #1 request shape that produces accidental cross-cutting damage.
When this applies: any request whose scope could plausibly span more than one directory AND where the user has not enumerated files. For a request with explicit file paths, skip this gate.
Before any success claim, run through these five steps:
| Step | Action | Example |
|---|---|---|
| 1. Identify | What command proves this claim? Run in order: build -> typecheck -> lint -> test -> security scan -> diff review. Stop on first failure -- later steps are meaningless if earlier ones fail. | pytest tests/, npm test, curl -s localhost:3000/health |
| 2. Run | Execute the full command, fresh (run in this message, with output shown -- cannot reuse prior results) | Not "I ran it earlier" -- run it now |
| 3. Read | Read the complete output, check exit code | Don't scan for "passed" -- read failure counts, warnings, errors |
| 4. Verify | Does the output actually confirm the claim? | "42 passed, 0 failed" confirms "tests pass". "41 passed, 1 failed" does not. |
| 5. Claim | Only now make the statement | "All 42 tests pass" with the evidence visible |
Type-check and unit tests are the universal baseline — they are not sufficient proof on their own. Match the strategy to the change:
| Change type | Required verification |
|---|---|
| Frontend (component, page, form) | Start dev server, exercise the feature in a browser, check console for errors, test the happy path AND one failure path |
| Backend handler / endpoint | curl the endpoint, verify response shape and status code, hit at least one error path (invalid input, missing auth) |
| CLI tool | Run the binary with real inputs, check stdout, stderr, and exit code. Run from /tmp (not the source folder) to catch "only works from source" bugs |
| Infra / IaC (Terraform, Dockerfile, k8s) | terraform plan / docker build / kubectl apply --dry-run=server; review the diff before applying |
| Database migration | Run migration up, run migration down, run migration up again against a copy of production-shape data. State what was tested |
| Refactoring (no behavior change) | Full test suite passes unchanged. Public API surface diff shows no breakage (grep exported identifiers) |
| Library / package update | Run the consumer's test suite against the new version. Check for deprecation warnings |
| Schema change | Old consumers still parse the new shape (forward compat); new consumers handle old shape if the old data still exists (backward compat) |
Reading code is not a strategy. If the table above doesn't have a row for your change, state explicitly: "No runtime verification available — verified by reading the diff."
For any change that touches production logic, include at least one adversarial probe in the verification — not a happy-path confirmation dressed up as verification. Pick the most relevant from:
null, undefined, MAX_INT, 1-char unicode combining markDocs changes, trivial typo fixes, and pure rename refactors are exempt. Everything else: one probe minimum. A report with zero adversarial probes is a happy-path confirmation, not verification.
Before shipping, check whether prior reviews (agent or human) are still valid. If commits landed after the last review, verify the new changes don't invalidate review conclusions -- check that previously flagged issues are still fixed and no new code contradicts the review's approval. git log --oneline <review-commit>..HEAD shows what changed since the review.
Fantasy assessment auto-fail. A claim of "zero issues found" on a first implementation pass is a red flag, not a green light. First implementations typically need 2-3 revision cycles. "Perfect on the first try" more likely means incomplete verification than flawless code. Re-verify with a broader scope.
Negative confirmation at signoff. When reporting verification results, include a brief statement of what defect classes were checked and NOT found, not just what passed. "Tests pass, no type errors, no lint warnings, no security flags in the changed files" is stronger than "tests pass" because it proves the scope of verification.
When a subagent reports success:
Never forward an agent's "all tests pass" without running the tests yourself.
"Tests pass" and "requirements met" are different claims:
Passing tests prove the code works. They don't prove the right code was written.
| Claim | Required Proof |
|---|---|
| "Tests pass" | Test runner output showing 0 failures, exit code 0 |
| "Build succeeds" | Build command output with exit code 0 |
| "Bug is fixed" | Original reproduction case now passes |
| "Feature complete" | All acceptance criteria verified individually |
| "No regressions" | Full test suite passes, not just new tests |
| "Regression test works" | Red-green cycle: test passes, revert fix, test fails, restore fix, test passes |
| "Linting clean" | Linter output showing 0 errors/warnings |
Some changes have no obvious test command -- documentation, configuration, infrastructure-as-code, skill files. In these cases:
jq . for JSON, yamllint for YAML, terraform validate, docker build). If no validator exists, read the file and confirm it matches the intended change.git diff) and confirming the diff matches what was intended. State explicitly: "No automated verification available. Verified by reading the diff."The principle holds: state what you checked and how, even when a test suite doesn't apply.
If the output does not confirm the claim:
If you're reasoning about the outcome instead of running the command, the Gate is not satisfied. "Should work", "trivial change", "just a refactor", "new tests pass" (not "all tests pass"), "CI will catch it" -- these are all the same failure mode: substituting confidence for evidence. Any satisfaction expression ("looks good", "seems correct") triggers the Gate, spirit over letter.
After verification passes, produce a structured report rather than an open-ended summary. This surfaces scope discipline explicitly and makes the agent's restraint visible to reviewers.
## Completion report
**Status**: DONE / DONE_WITH_CONCERNS / BLOCKED / NEEDS_CONTEXT
**Changes made**
- path/to/file.ts: [one-line description of what changed and why]
- path/to/other.ts: [one-line description]
**Things I didn't touch (intentionally)**
- [thing noticed but out of scope, with one-line reason]
- [adjacent issue deferred, with one-line reason]
**Potential concerns**
- [any risk, uncertainty, or open question the reviewer should know about]
- [or "none"]
**Verification evidence**
- [command]: [exit code / result summary]
Status meanings:
Potential concerns section is mandatoryThe Things I didn't touch section is not optional — if nothing was noticed, write "nothing noticed." The goal is to prove the agent considered scope, not to pad the report.
This skill is referenced by:
/ia-work -- before marking tasks complete, before shipping, and before merge/PR creation (Phase 4)ia-receiving-code-review -- verify each fix before marking resolvedia-debugging -- before claiming a bug is fixedia-writing-tests -- tests as primary verification evidenceia-design-iterator agent -- verify design changes render correctlyia-figma-design-sync agent -- verify implementation matches Figma/ia-verify command -- runs the full pre-PR verification pipeline