ia-verification-before-completion

Dev Tools

Enforces fresh verification evidence before any completion claim. Use when about to claim "tests pass", "bug fixed", "done", "ready to merge", or handing off work.

Install

openclaw skills install compound-eng-verification-before-completion

Verification Before Completion

The Rule

No completion claims without fresh verification evidence. If the verification command has not been run immediately before the claim, the claim cannot be made. Violating the letter of this rule is violating the spirit -- rephrasing a claim to technically avoid the wording does not exempt it from verification.

"Should pass", "probably works", and "looks correct" are not verification. Only command output confirming the claim counts (typically exit code 0). If pre-existing failures cause non-zero exits unrelated to your changes, see "When Verification Fails" below.

Pre-Verification Check

Before running verification, check the working tree state: git status --porcelain. If there are uncommitted changes unrelated to the current task, handle them first (commit, stash, or acknowledge). Adding verification commits on top of a dirty tree creates tangled history.

When verifying work delegated to a subagent, do not trust the implementer's own report. Read the actual code or test output independently. Spec compliance and quality are separate concerns -- verify both.

Scope Confirmation (Pre-Edit Gate)

When a request uses ambiguous spatial scope -- "migrate my project", "refactor the codebase", "update everywhere", "fix this across the app", "my code/repo/project" -- confirm the concrete scope before any Write or Edit. Imperative phrasing is not the same as defined scope.

Run a breakdown command to surface the real blast radius, then present it for confirmation:

# How many files actually match?
rg -l 'pattern' | cut -d/ -f1 | sort | uniq -c | sort -rn

# Which directories are affected?
rg -l 'pattern' | xargs dirname | sort -u

Present the result and ask: "This touches N files across M subsystems. Scope options: (a) everything, (b) just <subset>, (c) let me pick specific files." Do not start editing until the user commits to one.

This is cheap insurance: the alternative is shipping a sprawling diff that the user then has to unwind. "Migrate my project" is the #1 request shape that produces accidental cross-cutting damage.

When this applies: any request whose scope could plausibly span more than one directory AND where the user has not enumerated files. For a request with explicit file paths, skip this gate.

Gate Function

Before any success claim, run through these five steps:

Step	Action	Example
1. Identify	What command proves this claim? Run in order: build -> typecheck -> lint -> test -> security scan -> diff review. Stop on first failure -- later steps are meaningless if earlier ones fail.	`pytest tests/`, `npm test`, `curl -s localhost:3000/health`
2. Run	Execute the full command, fresh (run in this message, with output shown -- cannot reuse prior results)	Not "I ran it earlier" -- run it now
3. Read	Read the complete output, check exit code	Don't scan for "passed" -- read failure counts, warnings, errors
4. Verify	Does the output actually confirm the claim?	"42 passed, 0 failed" confirms "tests pass". "41 passed, 1 failed" does not.
5. Claim	Only now make the statement	"All 42 tests pass" with the evidence visible

Verification Strategies by Change Type

Type-check and unit tests are the universal baseline — they are not sufficient proof on their own. Match the strategy to the change:

Change type	Required verification
Frontend (component, page, form)	Start dev server, exercise the feature in a browser, check console for errors, test the happy path AND one failure path
Backend handler / endpoint	`curl` the endpoint, verify response shape and status code, hit at least one error path (invalid input, missing auth)
CLI tool	Run the binary with real inputs, check stdout, stderr, and exit code. Run from `/tmp` (not the source folder) to catch "only works from source" bugs
Infra / IaC (Terraform, Dockerfile, k8s)	`terraform plan` / `docker build` / `kubectl apply --dry-run=server`; review the diff before applying
Database migration	Run migration up, run migration down, run migration up again against a copy of production-shape data. State what was tested
Refactoring (no behavior change)	Full test suite passes unchanged. Public API surface diff shows no breakage (`grep` exported identifiers)
Library / package update	Run the consumer's test suite against the new version. Check for deprecation warnings
Schema change	Old consumers still parse the new shape (forward compat); new consumers handle old shape if the old data still exists (backward compat)

Reading code is not a strategy. If the table above doesn't have a row for your change, state explicitly: "No runtime verification available — verified by reading the diff."

Adversarial Probes

For any change that touches production logic, include at least one adversarial probe in the verification — not a happy-path confirmation dressed up as verification. Pick the most relevant from:

Boundary value: 0, -1, empty string, empty array, null, undefined, MAX_INT, 1-char unicode combining mark
Concurrency: two parallel requests with the same identifier (for state changes, races, double-spend classes)
Idempotency: run the same mutation twice; the second should either no-op or error cleanly, not corrupt state
Orphan op: delete/update/get a nonexistent ID — does it 404/return-null as expected, or throw an internal error?

Docs changes, trivial typo fixes, and pure rename refactors are exempt. Everything else: one probe minimum. A report with zero adversarial probes is a happy-path confirmation, not verification.

Review Staleness

Before shipping, check whether prior reviews (agent or human) are still valid. If commits landed after the last review, verify the new changes don't invalidate review conclusions -- check that previously flagged issues are still fixed and no new code contradicts the review's approval. git log --oneline <review-commit>..HEAD shows what changed since the review.

When This Applies

About to say "tests pass" or "build succeeds"
About to commit, push, or create a PR
About to claim a bug is fixed
About to mark a task as complete
Moving to the next task in a plan
After completing each function or component during long sessions
At periodic checkpoints (~every 15 minutes of active coding)
Reporting results to the user
Agent reports success on delegated work
ANY expression of satisfaction about work state ("looking good", "that should do it")
ANY positive statement about completion, including paraphrases and synonyms

Red Flags

Fantasy assessment auto-fail. A claim of "zero issues found" on a first implementation pass is a red flag, not a green light. First implementations typically need 2-3 revision cycles. "Perfect on the first try" more likely means incomplete verification than flawless code. Re-verify with a broader scope.

Negative confirmation at signoff. When reporting verification results, include a brief statement of what defect classes were checked and NOT found, not just what passed. "Tests pass, no type errors, no lint warnings, no security flags in the changed files" is stronger than "tests pass" because it proves the scope of verification.

Agent Delegation

When a subagent reports success:

Check the VCS diff -- did the agent actually make changes?
Run the verification command yourself
Report the actual state, not the agent's claim

Never forward an agent's "all tests pass" without running the tests yourself.

Requirements vs Tests

"Tests pass" and "requirements met" are different claims:

Re-read the plan or requirements
Create a line-by-line checklist
Verify each item against the implementation
Report gaps or confirm completion

Passing tests prove the code works. They don't prove the right code was written.

Common Claims and Their Proof

Claim	Required Proof
"Tests pass"	Test runner output showing 0 failures, exit code 0
"Build succeeds"	Build command output with exit code 0
"Bug is fixed"	Original reproduction case now passes
"Feature complete"	All acceptance criteria verified individually
"No regressions"	Full test suite passes, not just new tests
"Regression test works"	Red-green cycle: test passes, revert fix, test fails, restore fix, test passes
"Linting clean"	Linter output showing 0 errors/warnings

When No Verification Command Exists

Some changes have no obvious test command -- documentation, configuration, infrastructure-as-code, skill files. In these cases:

Documentation/prose -- verify by reading the rendered output. Confirm links work, formatting is correct, content matches intent.
Configuration/infra -- verify syntax (jq . for JSON, yamllint for YAML, terraform validate, docker build). If no validator exists, read the file and confirm it matches the intended change.
Non-runnable changes -- verify by diffing (git diff) and confirming the diff matches what was intended. State explicitly: "No automated verification available. Verified by reading the diff."

The principle holds: state what you checked and how, even when a test suite doesn't apply.

When Verification Fails

If the output does not confirm the claim:

Do not claim completion. Report the actual failure output to the user.
Do not retry the same verification hoping for a different result. If it failed, something is wrong.
Return to implementation. Fix the issue, then re-run verification from Step 1 of the Gate Function.
If the failure is unrelated to your changes (pre-existing flaky test, environment issue), state this explicitly with evidence -- show that the failure also occurs on the base branch or is a known issue.

Rationalization Prevention

If you're reasoning about the outcome instead of running the command, the Gate is not satisfied. "Should work", "trivial change", "just a refactor", "new tests pass" (not "all tests pass"), "CI will catch it" -- these are all the same failure mode: substituting confidence for evidence. Any satisfaction expression ("looks good", "seems correct") triggers the Gate, spirit over letter.

Completion Report Format

After verification passes, produce a structured report rather than an open-ended summary. This surfaces scope discipline explicitly and makes the agent's restraint visible to reviewers.

## Completion report

**Status**: DONE / DONE_WITH_CONCERNS / BLOCKED / NEEDS_CONTEXT

**Changes made**
- path/to/file.ts: [one-line description of what changed and why]
- path/to/other.ts: [one-line description]

**Things I didn't touch (intentionally)**
- [thing noticed but out of scope, with one-line reason]
- [adjacent issue deferred, with one-line reason]

**Potential concerns**
- [any risk, uncertainty, or open question the reviewer should know about]
- [or "none"]

**Verification evidence**
- [command]: [exit code / result summary]

Status meanings:

DONE — task complete, all tests pass, no concerns
DONE_WITH_CONCERNS — complete but flagging risks; the Potential concerns section is mandatory
BLOCKED — cannot proceed. Name the blocker and what would unblock it
NEEDS_CONTEXT — missing information to start or continue. Name what's missing

The Things I didn't touch section is not optional — if nothing was noticed, write "nothing noticed." The goal is to prove the agent considered scope, not to pad the report.

References

System-Wide Test Check -- blast-radius verification for task completion (callbacks, integration, orphaned state)

Integration

This skill is referenced by:

/ia-work -- before marking tasks complete, before shipping, and before merge/PR creation (Phase 4)
ia-receiving-code-review -- verify each fix before marking resolved
ia-debugging -- before claiming a bug is fixed
ia-writing-tests -- tests as primary verification evidence
ia-design-iterator agent -- verify design changes render correctly
ia-figma-design-sync agent -- verify implementation matches Figma
/ia-verify command -- runs the full pre-PR verification pipeline