Install
openclaw skills install post-dev-verification

Post-development full-stack verification skill. Automatically triggered after the Agent completes a development task. Executes production-level validation (unit + integration + E2E) with a real-execution-first philosophy. Use when: (1) a development task is complete and needs verification, (2) the user says "run tests", "verify", "validate", "quality check", (3) the user says "交付" (deliver), "验证" (verify), "跑测试" (run tests), "质量检查" (quality check), "验收" (acceptance), (4) before creating PRs or merging code, (5) after implementing features, bug fixes, or refactoring, (6) the user asks "does this work?", "can we ship this?", "is this ready?". Covers: test design (MFT/INV/DIR taxonomy), quality metrics (4 layers, 15 metrics), feedback-driven fix loop, anti-pattern detection, visible/hidden test separation, reusable test script generation.
Automated full-stack quality verification after development. Real execution by default -- mock is the last resort. Deliverability is judged by external calls (HTTP requests, CLI invocations, browser interactions), not by internal function calls passing in isolation.
Default realism level is L2: internal services run for real; only uncontrollable external dependencies (third-party APIs, paid services) may be mocked.
Downgrade signals (auto-detected from user intent):
Realism level definitions:
| Level | Description | Mock Ratio |
|---|---|---|
| L0 | All dependencies mocked | 100% |
| L1 | Core service real, databases mocked | <=50% |
| L2 | Internal services real, external deps mocked | <=30% |
| L3 | All services real (sandbox/test accounts) | 0% |
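As a minimal sketch of what L2 looks like in practice (the module `app.orders.place_order`, the provider URL, and the choice of the `responses` stubbing library are all illustrative, not part of this skill), only the outbound call to the payment provider is stubbed while the internal code and its database run for real:

```python
# Hedged sketch: L2 realism -- internal code and its database run for real;
# only the outbound call to the third-party payment provider is stubbed.
# `app.orders.place_order` and the provider URL are illustrative names.
import responses

from app.orders import place_order  # hypothetical internal module


@responses.activate
def test_place_order_charges_card_and_persists_order():
    # Stub ONLY the uncontrollable external dependency (paid third-party API).
    responses.add(
        responses.POST,
        "https://api.payments.example.com/v1/charges",
        json={"status": "succeeded", "id": "ch_123"},
        status=200,
    )

    order = place_order(sku="ABC-1", quantity=2)  # real business logic + real DB

    assert order.payment_status == "succeeded"  # specific values, not just "not null"
    assert order.quantity == 2
```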
Execute phases sequentially. Each phase produces required artifacts for the next.
Phase 0: Environment Awareness
  ↓
Phase 1: Test Design (with anti-pattern scan)
  ↓
Phase 2: Execution & Evaluation
  ↓
Phase 3: Feedback & Fix Loop (if gates fail)
  ↓
Phase 4: Validation & Output
This skill starts services, runs migrations, and makes network calls. Before execution:
Gather project context and determine feasibility before designing tests.
Identify:
For each dependency service, classify:
Verify:
Output: Environment Report -- language, framework, test runner, realism level, service availability, any blockers, consistency gaps.
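A hedged sketch of how that report might be captured as structured data (the fields mirror the list above; every value is purely illustrative):

```python
# Hedged sketch of an Environment Report captured as structured data;
# all values below are illustrative.
environment_report = {
    "language": "python",
    "framework": "fastapi",
    "test_runner": "pytest",
    "realism_level": "L2",
    "services": {"api": "available", "postgres": "available", "payments": "external -> mock"},
    "blockers": [],
    "consistency_gaps": ["README documents Node 18, lockfile pins Node 20"],
}
```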
Design test scenarios systematically using the test taxonomy. Do NOT write test code before completing analysis.
Before writing any test code, complete:
If test code references modules/imports not in the actual codebase -> hallucinated dependency. Remove and use actual project references.
Map each requirement to test scenarios:
Anti-pattern check: if >80% of scenarios are happy path -> Happy Path Obsession detected. Add error, boundary, and exception scenarios until error scenarios reach the >=30% target.
For each scenario, apply the appropriate taxonomy category. Load detailed guidance from references/test-taxonomy.md when needed.
| Category | Purpose | Example |
|---|---|---|
| MFT (Minimum Functionality) | Verify each decision branch/leaf node works | Each code path returns correct result |
| INV (Invariance) | Same logical request -> same result, different phrasings | "show data" = "display info" = "list records" |
| DIR (Directional Expectation) | Vary one input -> predict output direction | Larger input -> larger output (monotonic) |
Coverage rule: Each of the 3 categories MUST have >=1 test. Taxonomy coverage = 100% is a hard gate.
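A hedged sketch of one test per category, assuming a hypothetical records API served at http://localhost:8000 (routes, parameters, and response shapes are illustrative):

```python
# Hedged sketch: one test per taxonomy category against a hypothetical
# records API at http://localhost:8000 (routes and fields are illustrative).
import requests

BASE = "http://localhost:8000"


def test_mft_filter_by_status_returns_only_matching_records():
    # MFT: one decision branch (the status filter) verified end to end.
    resp = requests.get(f"{BASE}/records", params={"status": "active"}, timeout=10)
    assert resp.status_code == 200
    assert all(r["status"] == "active" for r in resp.json()["items"])


def test_inv_equivalent_queries_return_identical_results():
    # INV: two phrasings of the same logical request must agree.
    a = requests.get(f"{BASE}/records", params={"q": "show data"}, timeout=10).json()
    b = requests.get(f"{BASE}/records", params={"q": "list records"}, timeout=10).json()
    assert a["items"] == b["items"]


def test_dir_larger_page_size_returns_at_least_as_many_records():
    # DIR: increasing one input (page size) must move the output in a known direction.
    small = requests.get(f"{BASE}/records", params={"limit": 5}, timeout=10).json()
    large = requests.get(f"{BASE}/records", params={"limit": 50}, timeout=10).json()
    assert len(large["items"]) >= len(small["items"])
```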
For every input parameter, identify and cover:
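The concrete dimensions are project-specific; as a hedged illustration, boundary coverage for a hypothetical `username` field with a documented 3-32 character limit could be parametrized like this:

```python
# Hedged sketch: boundary value coverage for a hypothetical `username` field
# whose documented limit is 3-32 characters (endpoint and limits are illustrative).
import pytest
import requests

CASES = [
    ("", 422),         # empty
    ("ab", 422),       # just below minimum
    ("abc", 201),      # minimum
    ("a" * 32, 201),   # maximum
    ("a" * 33, 422),   # just above maximum
]


@pytest.mark.parametrize("username,expected_status", CASES)
def test_username_length_boundaries(username, expected_status):
    resp = requests.post(
        "http://localhost:8000/users", json={"username": username}, timeout=10
    )
    assert resp.status_code == expected_status
```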
Classify each scenario based on realism level:
After designing all scenarios:
Run the pre-execution checklist. Load detailed guidance from references/anti-patterns.md when needed.
| Anti-Pattern | Check | Action |
|---|---|---|
| Happy Path Obsession | >80% scenarios are normal flow | Add error/boundary tests |
| Weak Assertions | assert(result != null) or assert(status == 200) without checking the response body | Replace with specific value checks |
| Leap-to-Code | Test code written before structure analysis | Redo analysis first |
| Hallucinated Dependencies | References non-existent modules/imports | Replace with actual references |
| Missing Traceability | Generic test names (test_1, test_func) | Rename to describe specific behavior |
Rule: Each test name MUST describe the specific behavior being tested (e.g., test_submit_empty_form_returns_422). Each test MUST link to the requirement it validates.
Fix all detected anti-patterns before proceeding to Phase 2.
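For illustration, a hedged before/after for the Weak Assertions and Missing Traceability patterns (the endpoint and field names are hypothetical):

```python
# Hedged sketch: a weak, untraceable test replaced by a specific, traceable one.
# Endpoint and error message are illustrative.
import requests

# Anti-pattern (weak assertion, generic name):
#   def test_1():
#       resp = requests.post("http://localhost:8000/forms", json={})
#       assert resp.status_code == 200   # says nothing about the body


def test_submit_empty_form_returns_422_with_field_errors():
    """Validates requirement: empty submissions are rejected with field-level errors."""
    resp = requests.post("http://localhost:8000/forms", json={}, timeout=10)
    assert resp.status_code == 422
    errors = resp.json()["errors"]
    assert errors["name"] == "This field is required."  # specific value, not just non-null
```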
Start services in dependency order with health checks:
for each service in dependency_order:
    start service
    wait_for_health_check (port/ping/readiness endpoint)
    if health_check fails:
        report blocker, downgrade realism level
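A minimal sketch of the health-check wait, assuming each service exposes an HTTP readiness endpoint (the URL, timeout, and polling interval are illustrative):

```python
# Hedged sketch: poll a readiness endpoint until it returns 200 or the timeout expires.
import time

import requests


def wait_for_health(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass  # service not up yet; keep polling
        time.sleep(interval_s)
    return False  # caller reports a blocker and downgrades the realism level
```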
Prepare test data using project's existing seed/migration mechanisms when available.
System boot validation -- before running any tests, verify the system itself is deliverable:
Run tests with coverage enabled. The test suite MUST include an E2E layer validated through external calls (HTTP requests to running services, CLI invocations, or browser interactions -- not internal function imports). Load execution templates from references/real-e2e-templates.md when designing this layer.
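For the CLI variant, a hedged sketch that exercises a hypothetical `mytool` binary through a real process invocation rather than importing its internals (the command and its output format are illustrative):

```python
# Hedged sketch: E2E via a real CLI invocation, not an internal function import.
import json
import subprocess


def test_cli_export_produces_valid_json_report():
    result = subprocess.run(
        ["mytool", "export", "--format", "json"],
        capture_output=True,
        text=True,
        timeout=60,
    )
    assert result.returncode == 0
    report = json.loads(result.stdout)
    assert report["records"], "export should contain at least one record"
```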
Compute all 4 layers of metrics. Load detailed definitions from references/metrics.md when needed.
Design Quality (computed after test design, before execution):
| Metric | Formula | Threshold |
|---|---|---|
| Scenario Coverage | covered requirements / total requirements | MUST = 100% |
| Taxonomy Coverage | categories with >=1 test / 3 | MUST = 100% |
| Boundary Value Coverage | covered boundary points / total identified | SHOULD >= 90% |
| Data Feature Coverage | covered data dimensions / total identified | SHOULD >= 85% |
Execution Quality (computed after test run):
| Metric | Formula | Threshold |
|---|---|---|
| Pass Rate | passed tests / total tests | SHOULD >= 95% |
| Code Coverage | statements covered / total statements | SHOULD >= 80% |
| Assertion Density | total assertions / total tests | SHOULD >= 2.0 |
| Weak Assertion Ratio | weak assertions / total assertions | SHOULD <= 10% |
| Test Realism Ratio | real tests / total tests | MUST >= 70% |
Delivery Quality (computed from test results):
| Metric | Formula | Threshold |
|---|---|---|
| Expectation Match Rate | fully matching tests / total tests (core: MUST 100%) | Core: MUST=100%, Overall: SHOULD>=95% |
| Boundary Handling Rate | passing boundary tests / total boundary tests | SHOULD >= 90% |
| Regression Safety | still-passing tests / previously-passing tests | MUST = 100% |
| Business Flow Coverage | E2E-verified business flows / total identified business flows | MUST = 100% |
Iteration Efficiency (computed during fix loop):
| Metric | Formula | Threshold |
|---|---|---|
| Fix Convergence Rate | newly passing / previous failures | <20% for 2 rounds -> STOP |
| Fix Introduction Rate | newly failing / total fix attempts | >30% -> STOP |
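A hedged sketch of how these two rates can be computed from the failing-test sets before and after a fix round (the function name and signature are illustrative):

```python
# Hedged sketch: iteration-efficiency metrics from failing-test IDs before and
# after a fix round; `fix_attempts` is the number of fixes applied that round.
def iteration_metrics(
    failed_before: set[str], failed_after: set[str], fix_attempts: int
) -> dict[str, float]:
    newly_passing = failed_before - failed_after
    newly_failing = failed_after - failed_before
    return {
        # <20% for 2 consecutive rounds -> stop the fix loop
        "fix_convergence_rate": len(newly_passing) / max(len(failed_before), 1),
        # >30% in any round -> stop: fixes are introducing regressions
        "fix_introduction_rate": len(newly_failing) / max(fix_attempts, 1),
    }
```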
Triggered when hard gate metrics are not met.
Structure the report as JSON. Load schema from references/feedback-schema.md when needed.
Key sections include the decision field: "pass" | "fix_and_retry" | "stop_and_report".
Fix priority: address the largest failure cluster first (most affected tests) -- it has the highest probability of improving the overall pass rate.
| Condition | Threshold | Action |
|---|---|---|
| Max iterations | 5 rounds | Stop, report current state |
| No convergence | <20% convergence rate for 2 consecutive rounds | Stop, suggest fundamental issue |
| Regression | >30% fix introduction rate in any round | Stop, suggest wrong fix approach |
while round <= 5 AND not converged_stop AND not regression_stop:
    read feedback report -> identify largest failure cluster
    apply targeted fix to code/tests
    re-run full test suite (or failed-only if convergence is high)
    compute new metrics
    generate new feedback report
    check exit conditions
    record iteration in fix history
Run ALL hidden tests (not exposed during fix loop):
Generate the final quality report. Verdict: "PASS" (all hard gates met) or "FAIL" (hard gates not met), with specific reasons.
Output test scripts that can run independently in CI/CD.
Hard gates (MUST thresholds):
| Metric | Threshold |
|---|---|
| Scenario Coverage | = 100% |
| Taxonomy Coverage | = 100% |
| Test Realism Ratio | >= 70% |
| Expectation Match (core) | = 100% |
| Regression Safety | = 100% |
| Business Flow Coverage | = 100% |
Soft gates (SHOULD thresholds):
| Metric | Threshold |
|---|---|
| Boundary Value Coverage | >= 90% |
| Data Feature Coverage | >= 85% |
| Pass Rate | >= 95% |
| Code Coverage | >= 80% |
| Assertion Density | >= 2.0 |
| Weak Assertion Ratio | <= 10% |
| Boundary Handling Rate | >= 90% |
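A hedged sketch of deriving the verdict from the hard-gate table above (the metric keys and helper are illustrative; the thresholds are taken from the table):

```python
# Hedged sketch: hard-gate verdict derived from computed metrics. Metric keys
# are illustrative; thresholds mirror the hard-gate table above.
HARD_GATES = {
    "scenario_coverage": 1.00,
    "taxonomy_coverage": 1.00,
    "test_realism_ratio": 0.70,
    "expectation_match_core": 1.00,
    "regression_safety": 1.00,
    "business_flow_coverage": 1.00,
}


def verdict(metrics: dict[str, float]) -> tuple[str, list[str]]:
    """Return ("PASS"/"FAIL", reasons) based on hard gates only."""
    failures = [
        f"{name}: {metrics.get(name, 0.0):.2f} < {threshold:.2f}"
        for name, threshold in HARD_GATES.items()
        if metrics.get(name, 0.0) < threshold
    ]
    return ("FAIL", failures) if failures else ("PASS", [])
```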
Load these files as needed during the workflow:
references/metrics.md -- Complete definitions for all 15 metrics: calculation formulas, threshold rationale, and what it means when a metric is not met. Load when computing or interpreting metrics.
references/test-taxonomy.md -- Detailed guidance for MFT/INV/DIR test categories with pseudocode examples, plus systematic boundary value analysis methods. Load during Phase 1 test design.
references/feedback-schema.md -- Full JSON Schema for feedback reports with a populated example. Load when generating feedback reports in Phase 3.
references/anti-patterns.md -- Detailed detection methods and fix strategies for all 5 anti-patterns: Happy Path Obsession, Weak Assertions, Leap-to-Code, Hallucinated Dependencies, Missing Traceability. Load during Phase 1 anti-pattern scan or Phase 3 feedback.
references/real-e2e-templates.md -- Environment preparation scripts and real E2E test templates for HTTP API, CLI tools, and Browser automation. Load during Phase 2 environment preparation and test execution.