verification-before-completion
v2.56.1Enforces fresh verification evidence before any completion claim. Use when about to claim "tests pass", "bug fixed", "done", "ready to merge", or handing off...
Like a lobster shell, security has layers — review code before you run it.
Verification Before Completion
The Rule
No completion claims without fresh verification evidence. If the verification command has not been run immediately before the claim, the claim cannot be made. Violating the letter of this rule is violating the spirit -- rephrasing a claim to technically avoid the wording does not exempt it from verification.
"Should pass", "probably works", and "looks correct" are not verification. Only command output confirming the claim counts (typically exit code 0). If pre-existing failures cause non-zero exits unrelated to your changes, see "When Verification Fails" below.
Pre-Verification Check
Before running verification, check the working tree state: git status --porcelain. If there are uncommitted changes unrelated to the current task, handle them first (commit, stash, or acknowledge). Adding verification commits on top of a dirty tree creates tangled history.
When verifying work delegated to a subagent, do not trust the implementer's own report. Read the actual code or test output independently. Spec compliance and quality are separate concerns -- verify both.
Scope Confirmation (Pre-Edit Gate)
When a request uses ambiguous spatial scope -- "migrate my project", "refactor the codebase", "update everywhere", "fix this across the app", "my code/repo/project" -- confirm the concrete scope before any Write or Edit. Imperative phrasing is not the same as defined scope.
Run a breakdown command to surface the real blast radius, then present it for confirmation:
# How many files actually match?
rg -l 'pattern' | cut -d/ -f1 | sort | uniq -c | sort -rn
# Which directories are affected?
rg -l 'pattern' | xargs dirname | sort -u
Present the result and ask: "This touches N files across M subsystems. Scope options: (a) everything, (b) just <subset>, (c) let me pick specific files." Do not start editing until the user commits to one.
This is cheap insurance: the alternative is shipping a sprawling diff that the user then has to unwind. "Migrate my project" is the #1 request shape that produces accidental cross-cutting damage.
When this applies: any request whose scope could plausibly span more than one directory AND where the user has not enumerated files. For a request with explicit file paths, skip this gate.
Gate Function
Before any success claim, run through these five steps:
| Step | Action | Example |
|---|---|---|
| 1. Identify | What command proves this claim? Run in order: build -> typecheck -> lint -> test -> security scan -> diff review. Stop on first failure -- later steps are meaningless if earlier ones fail. | pytest tests/, npm test, curl -s localhost:3000/health |
| 2. Run | Execute the full command, fresh (run in this message, with output shown -- cannot reuse prior results) | Not "I ran it earlier" -- run it now |
| 3. Read | Read the complete output, check exit code | Don't scan for "passed" -- read failure counts, warnings, errors |
| 4. Verify | Does the output actually confirm the claim? | "42 passed, 0 failed" confirms "tests pass". "41 passed, 1 failed" does not. |
| 5. Claim | Only now make the statement | "All 42 tests pass" with the evidence visible |
Review Staleness
Before shipping, check whether prior reviews (agent or human) are still valid. If commits landed after the last review, verify the new changes don't invalidate review conclusions -- check that previously flagged issues are still fixed and no new code contradicts the review's approval. git log --oneline <review-commit>..HEAD shows what changed since the review.
When This Applies
- About to say "tests pass" or "build succeeds"
- About to commit, push, or create a PR
- About to claim a bug is fixed
- About to mark a task as complete
- Moving to the next task in a plan
- After completing each function or component during long sessions
- At periodic checkpoints (~every 15 minutes of active coding)
- Reporting results to the user
- Agent reports success on delegated work
- ANY expression of satisfaction about work state ("looking good", "that should do it")
- ANY positive statement about completion, including paraphrases and synonyms
Red Flags
Fantasy assessment auto-fail. A claim of "zero issues found" on a first implementation pass is a red flag, not a green light. First implementations typically need 2-3 revision cycles. "Perfect on the first try" more likely means incomplete verification than flawless code. Re-verify with a broader scope.
Negative confirmation at signoff. When reporting verification results, include a brief statement of what defect classes were checked and NOT found, not just what passed. "Tests pass, no type errors, no lint warnings, no security flags in the changed files" is stronger than "tests pass" because it proves the scope of verification.
Agent Delegation
When a subagent reports success:
- Check the VCS diff -- did the agent actually make changes?
- Run the verification command yourself
- Report the actual state, not the agent's claim
Never forward an agent's "all tests pass" without running the tests yourself.
Requirements vs Tests
"Tests pass" and "requirements met" are different claims:
- Re-read the plan or requirements
- Create a line-by-line checklist
- Verify each item against the implementation
- Report gaps or confirm completion
Passing tests prove the code works. They don't prove the right code was written.
Common Claims and Their Proof
| Claim | Required Proof |
|---|---|
| "Tests pass" | Test runner output showing 0 failures, exit code 0 |
| "Build succeeds" | Build command output with exit code 0 |
| "Bug is fixed" | Original reproduction case now passes |
| "Feature complete" | All acceptance criteria verified individually |
| "No regressions" | Full test suite passes, not just new tests |
| "Regression test works" | Red-green cycle: test passes, revert fix, test fails, restore fix, test passes |
| "Linting clean" | Linter output showing 0 errors/warnings |
When No Verification Command Exists
Some changes have no obvious test command -- documentation, configuration, infrastructure-as-code, skill files. In these cases:
- Documentation/prose -- verify by reading the rendered output. Confirm links work, formatting is correct, content matches intent.
- Configuration/infra -- verify syntax (
jq .for JSON,yamllintfor YAML,terraform validate,docker build). If no validator exists, read the file and confirm it matches the intended change. - Non-runnable changes -- verify by diffing (
git diff) and confirming the diff matches what was intended. State explicitly: "No automated verification available. Verified by reading the diff."
The principle holds: state what you checked and how, even when a test suite doesn't apply.
When Verification Fails
If the output does not confirm the claim:
- Do not claim completion. Report the actual failure output to the user.
- Do not retry the same verification hoping for a different result. If it failed, something is wrong.
- Return to implementation. Fix the issue, then re-run verification from Step 1 of the Gate Function.
- If the failure is unrelated to your changes (pre-existing flaky test, environment issue), state this explicitly with evidence -- show that the failure also occurs on the base branch or is a known issue.
Rationalization Prevention
If you're reasoning about the outcome instead of running the command, the Gate is not satisfied. "Should work", "trivial change", "just a refactor", "new tests pass" (not "all tests pass"), "CI will catch it" -- these are all the same failure mode: substituting confidence for evidence. Any satisfaction expression ("looks good", "seems correct") triggers the Gate, spirit over letter.
Completion Report Format
After verification passes, produce a structured report rather than an open-ended summary. This surfaces scope discipline explicitly and makes the agent's restraint visible to reviewers.
## Completion report
**Status**: DONE / DONE_WITH_CONCERNS / BLOCKED / NEEDS_CONTEXT
**Changes made**
- path/to/file.ts: [one-line description of what changed and why]
- path/to/other.ts: [one-line description]
**Things I didn't touch (intentionally)**
- [thing noticed but out of scope, with one-line reason]
- [adjacent issue deferred, with one-line reason]
**Potential concerns**
- [any risk, uncertainty, or open question the reviewer should know about]
- [or "none"]
**Verification evidence**
- [command]: [exit code / result summary]
Status meanings:
- DONE — task complete, all tests pass, no concerns
- DONE_WITH_CONCERNS — complete but flagging risks; the
Potential concernssection is mandatory - BLOCKED — cannot proceed. Name the blocker and what would unblock it
- NEEDS_CONTEXT — missing information to start or continue. Name what's missing
The Things I didn't touch section is not optional — if nothing was noticed, write "nothing noticed." The goal is to prove the agent considered scope, not to pad the report.
References
- System-Wide Test Check -- blast-radius verification for task completion (callbacks, integration, orphaned state)
Integration
This skill is referenced by:
workflows:work-- before marking tasks complete, before shipping, and before merge/PR creation (Phase 4)receiving-code-review-- verify each fix before marking resolveddebugging-- before claiming a bug is fixedwriting-tests-- tests as primary verification evidencedesign-iteratoragent -- verify design changes render correctlyfigma-design-syncagent -- verify implementation matches Figma/verifycommand -- runs the full pre-PR verification pipeline
Comments
Loading comments...
