verification-before-completion

v2.56.1

Enforces fresh verification evidence before any completion claim. Use when about to claim "tests pass", "bug fixed", "done", "ready to merge", or handing off...

0· 158·0 current·0 all-time
byIlia Alshanetsky@iliaal
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (enforce fresh verification before completion) aligns with the SKILL.md: it tells the agent to run verification commands (build/typecheck/lint/test/security/diff), inspect VCS state, and confirm scope before edits. The enabled actions are appropriate for the stated goal.
Instruction Scope
The instructions require running repository commands (git status, git log, diff), test/build commands (pytest, npm test, etc.), grepping the tree (rg), and potentially invoking health endpoints (curl localhost). These are consistent with verification but do grant the agent the ability to read the working tree and run arbitrary verification commands — which can touch many files and may exercise code paths that depend on external services or secrets. The SKILL.md does not attempt to read unrelated system files or exfiltrate data; scope is narrowly about repo verification, but users should be aware that running tests/builds can cause network calls or access environment secrets configured for those commands.
Install Mechanism
Instruction-only skill with no install spec and no code files. Nothing is downloaded or written to disk by an installer — low install risk.
Credentials
The skill declares no required environment variables or credentials, which is appropriate. However, because it instructs running build/test/security commands, those commands may in practice read service credentials, CI tokens, or other environment secrets present in your environment. The skill itself does not explicitly request or transmit secrets.
Persistence & Privilege
always:false and no special persistence or configuration changes are requested. The skill does not ask to modify other skills or system-wide agent settings. Model invocation is allowed (platform default) which is appropriate for an enforcement/checking skill.
Assessment
This skill is coherent and does what it claims: it will run verification commands and inspect the repository before allowing a completion claim. Before installing, be aware that (1) the agent will need access to the working tree and to tools like git, ripgrep (rg), and whatever test/build tools your project uses — the SKILL.md does not list binaries but expects them to exist; (2) running tests/builds can exercise code that calls external services or reads environment secrets (database credentials, API keys, CI tokens). If you do not want the agent to run commands that can access network or secrets, do not grant it filesystem/network/credential access. Consider running verification in a controlled environment (CI, sandboxed runner) or restricting the agent's permissions. If you rely on particular tools (rg, pytest, npm), ensure they are present or adapt the instructions accordingly.

Like a lobster shell, security has layers — review code before you run it.

latestvk97bvtm4termg5bs562e7sshyh853ahr
158downloads
0stars
6versions
Updated 20h ago
v2.56.1
MIT-0

Verification Before Completion

The Rule

No completion claims without fresh verification evidence. If the verification command has not been run immediately before the claim, the claim cannot be made. Violating the letter of this rule is violating the spirit -- rephrasing a claim to technically avoid the wording does not exempt it from verification.

"Should pass", "probably works", and "looks correct" are not verification. Only command output confirming the claim counts (typically exit code 0). If pre-existing failures cause non-zero exits unrelated to your changes, see "When Verification Fails" below.

Pre-Verification Check

Before running verification, check the working tree state: git status --porcelain. If there are uncommitted changes unrelated to the current task, handle them first (commit, stash, or acknowledge). Adding verification commits on top of a dirty tree creates tangled history.

When verifying work delegated to a subagent, do not trust the implementer's own report. Read the actual code or test output independently. Spec compliance and quality are separate concerns -- verify both.

Scope Confirmation (Pre-Edit Gate)

When a request uses ambiguous spatial scope -- "migrate my project", "refactor the codebase", "update everywhere", "fix this across the app", "my code/repo/project" -- confirm the concrete scope before any Write or Edit. Imperative phrasing is not the same as defined scope.

Run a breakdown command to surface the real blast radius, then present it for confirmation:

# How many files actually match?
rg -l 'pattern' | cut -d/ -f1 | sort | uniq -c | sort -rn

# Which directories are affected?
rg -l 'pattern' | xargs dirname | sort -u

Present the result and ask: "This touches N files across M subsystems. Scope options: (a) everything, (b) just <subset>, (c) let me pick specific files." Do not start editing until the user commits to one.

This is cheap insurance: the alternative is shipping a sprawling diff that the user then has to unwind. "Migrate my project" is the #1 request shape that produces accidental cross-cutting damage.

When this applies: any request whose scope could plausibly span more than one directory AND where the user has not enumerated files. For a request with explicit file paths, skip this gate.

Gate Function

Before any success claim, run through these five steps:

StepActionExample
1. IdentifyWhat command proves this claim? Run in order: build -> typecheck -> lint -> test -> security scan -> diff review. Stop on first failure -- later steps are meaningless if earlier ones fail.pytest tests/, npm test, curl -s localhost:3000/health
2. RunExecute the full command, fresh (run in this message, with output shown -- cannot reuse prior results)Not "I ran it earlier" -- run it now
3. ReadRead the complete output, check exit codeDon't scan for "passed" -- read failure counts, warnings, errors
4. VerifyDoes the output actually confirm the claim?"42 passed, 0 failed" confirms "tests pass". "41 passed, 1 failed" does not.
5. ClaimOnly now make the statement"All 42 tests pass" with the evidence visible

Review Staleness

Before shipping, check whether prior reviews (agent or human) are still valid. If commits landed after the last review, verify the new changes don't invalidate review conclusions -- check that previously flagged issues are still fixed and no new code contradicts the review's approval. git log --oneline <review-commit>..HEAD shows what changed since the review.

When This Applies

  • About to say "tests pass" or "build succeeds"
  • About to commit, push, or create a PR
  • About to claim a bug is fixed
  • About to mark a task as complete
  • Moving to the next task in a plan
  • After completing each function or component during long sessions
  • At periodic checkpoints (~every 15 minutes of active coding)
  • Reporting results to the user
  • Agent reports success on delegated work
  • ANY expression of satisfaction about work state ("looking good", "that should do it")
  • ANY positive statement about completion, including paraphrases and synonyms

Red Flags

Fantasy assessment auto-fail. A claim of "zero issues found" on a first implementation pass is a red flag, not a green light. First implementations typically need 2-3 revision cycles. "Perfect on the first try" more likely means incomplete verification than flawless code. Re-verify with a broader scope.

Negative confirmation at signoff. When reporting verification results, include a brief statement of what defect classes were checked and NOT found, not just what passed. "Tests pass, no type errors, no lint warnings, no security flags in the changed files" is stronger than "tests pass" because it proves the scope of verification.

Agent Delegation

When a subagent reports success:

  1. Check the VCS diff -- did the agent actually make changes?
  2. Run the verification command yourself
  3. Report the actual state, not the agent's claim

Never forward an agent's "all tests pass" without running the tests yourself.

Requirements vs Tests

"Tests pass" and "requirements met" are different claims:

  1. Re-read the plan or requirements
  2. Create a line-by-line checklist
  3. Verify each item against the implementation
  4. Report gaps or confirm completion

Passing tests prove the code works. They don't prove the right code was written.

Common Claims and Their Proof

ClaimRequired Proof
"Tests pass"Test runner output showing 0 failures, exit code 0
"Build succeeds"Build command output with exit code 0
"Bug is fixed"Original reproduction case now passes
"Feature complete"All acceptance criteria verified individually
"No regressions"Full test suite passes, not just new tests
"Regression test works"Red-green cycle: test passes, revert fix, test fails, restore fix, test passes
"Linting clean"Linter output showing 0 errors/warnings

When No Verification Command Exists

Some changes have no obvious test command -- documentation, configuration, infrastructure-as-code, skill files. In these cases:

  • Documentation/prose -- verify by reading the rendered output. Confirm links work, formatting is correct, content matches intent.
  • Configuration/infra -- verify syntax (jq . for JSON, yamllint for YAML, terraform validate, docker build). If no validator exists, read the file and confirm it matches the intended change.
  • Non-runnable changes -- verify by diffing (git diff) and confirming the diff matches what was intended. State explicitly: "No automated verification available. Verified by reading the diff."

The principle holds: state what you checked and how, even when a test suite doesn't apply.

When Verification Fails

If the output does not confirm the claim:

  1. Do not claim completion. Report the actual failure output to the user.
  2. Do not retry the same verification hoping for a different result. If it failed, something is wrong.
  3. Return to implementation. Fix the issue, then re-run verification from Step 1 of the Gate Function.
  4. If the failure is unrelated to your changes (pre-existing flaky test, environment issue), state this explicitly with evidence -- show that the failure also occurs on the base branch or is a known issue.

Rationalization Prevention

If you're reasoning about the outcome instead of running the command, the Gate is not satisfied. "Should work", "trivial change", "just a refactor", "new tests pass" (not "all tests pass"), "CI will catch it" -- these are all the same failure mode: substituting confidence for evidence. Any satisfaction expression ("looks good", "seems correct") triggers the Gate, spirit over letter.

Completion Report Format

After verification passes, produce a structured report rather than an open-ended summary. This surfaces scope discipline explicitly and makes the agent's restraint visible to reviewers.

## Completion report

**Status**: DONE / DONE_WITH_CONCERNS / BLOCKED / NEEDS_CONTEXT

**Changes made**
- path/to/file.ts: [one-line description of what changed and why]
- path/to/other.ts: [one-line description]

**Things I didn't touch (intentionally)**
- [thing noticed but out of scope, with one-line reason]
- [adjacent issue deferred, with one-line reason]

**Potential concerns**
- [any risk, uncertainty, or open question the reviewer should know about]
- [or "none"]

**Verification evidence**
- [command]: [exit code / result summary]

Status meanings:

  • DONE — task complete, all tests pass, no concerns
  • DONE_WITH_CONCERNS — complete but flagging risks; the Potential concerns section is mandatory
  • BLOCKED — cannot proceed. Name the blocker and what would unblock it
  • NEEDS_CONTEXT — missing information to start or continue. Name what's missing

The Things I didn't touch section is not optional — if nothing was noticed, write "nothing noticed." The goal is to prove the agent considered scope, not to pad the report.

References

  • System-Wide Test Check -- blast-radius verification for task completion (callbacks, integration, orphaned state)

Integration

This skill is referenced by:

  • workflows:work -- before marking tasks complete, before shipping, and before merge/PR creation (Phase 4)
  • receiving-code-review -- verify each fix before marking resolved
  • debugging -- before claiming a bug is fixed
  • writing-tests -- tests as primary verification evidence
  • design-iterator agent -- verify design changes render correctly
  • figma-design-sync agent -- verify implementation matches Figma
  • /verify command -- runs the full pre-PR verification pipeline

Comments

Loading comments...