Install
openclaw skills install gen-test-plan
Analyze the repository's tech stack and the current branch's changes against the default branch, then generate an executable YAML test plan focused on user-facing impact.
This is an E2E test plan — not an automated test wrapper. The generated plan will be executed by an autonomous agent acting exactly as a human QA tester would: launching real binaries, hitting real endpoints, interacting with real databases, and verifying real observable behavior.
NEVER generate test steps that re-run the project's existing automated test suite. This means:
- No cargo test, pytest, npm test, go test, mix test, or equivalent commands as test steps.

If you find yourself writing a test step that invokes the project's test runner, stop and rethink. Ask: "What would a human tester do to verify this feature works?" The answer is never "run the unit tests."
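What E2E test steps look like:
- Running the built binary with real arguments, exactly as a user would type them in a terminal
- Sending real HTTP requests (e.g., curl) to a running service
- Querying the database (e.g., psql) to confirm writes, schema state, or migration effects
- Driving the UI with agent-browser commands and capturing screenshots as evidence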
Hard gates
Complete these in order. Do not advance to the next gate until its Pass condition is met (each pass should leave retrievable evidence: pasted command output, a written list, or the generated file on disk). Scheduling: Gate 1 before Step 2; Gate 2 before Step 5; Gate 3 before Step 7; Gates 4 and 5 during Step 8 (after the Step 7 summary).
Gate 1: Diff and base pinned (after Step 1). Resolve the base branch from --base when provided, otherwise use the repo default (main or master per Step 1). Compare HEAD against the merge base, $(git merge-base HEAD origin/<base_branch>) (or the equivalent if the remote ref differs). Pass: you record current_branch, base_branch, the merge-base SHA or range used, and changed_files from git diff --name-only <merge-base>..HEAD (an empty list is allowed if you paste or quote that output and state “no file changes vs base”).
Gate 2: Trace complete (after Step 4). Pass: every affected entry point you will test is classified as Core functionality or Configuration/admin, and the Step 4 requirement holds: at least one test targets a core entry point, or you document why that is impossible and flag it for manual review.
Gate 3: Plan file valid (after Step 6, before Step 7). Pass: docs/testing/test-plan.yaml exists and the following command exits 0. It parses the YAML and asserts that all four top-level keys are present; do not substitute a single grep -E with alternations, which would pass on any one match:
python3 -c "import sys, yaml; d = yaml.safe_load(open('docs/testing/test-plan.yaml')) or {}; missing = [k for k in ('version', 'metadata', 'setup', 'tests') if k not in d]; sys.exit('Missing keys: ' + ', '.join(missing) if missing else 0)"
Gate 4: No automated-test duplication (Step 8). Pass: every run: step and every services: command: is scanned for project test runners (cargo test, pytest, npm test, go test, mix test, jest, vitest, mocha, etc.) and there are zero invocations. If any appear, remove them or replace them with real E2E actions, then re-run Gate 3.
Gate 5: Behavioral coverage (Step 8). Pass: re-read metadata.changes_summary and recent commit messages; at least one test's context/steps exercises the primary user-visible behavior they describe. If they describe a capability (e.g., a new provider) but no step invokes it, add that test or fail verification.
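Gate 4 can be partly mechanized. The sketch below assumes the plan has already been parsed into a dict (for example with yaml.safe_load, as in the Gate 3 command) and flags any tests[*].steps[*].run or setup.services[*].command string that matches a test-runner pattern. The function name and the regex list are illustrative assumptions; matching is a heuristic, so review hits by hand and do not treat an empty result as proof on its own.

```python
import re
from typing import Any

# Runners named in Gate 4; this pattern list is an illustrative, non-exhaustive assumption.
TEST_RUNNER_PATTERNS = [
    r"\bcargo\s+test\b",
    r"\bpytest\b",
    r"\bnpm\s+(?:run\s+)?test\b",
    r"\bgo\s+test\b",
    r"\bmix\s+test\b",
    r"\bjest\b",
    r"\bvitest\b",
    r"\bmocha\b",
]
RUNNER_RE = re.compile("|".join(TEST_RUNNER_PATTERNS))


def find_test_runner_invocations(plan: dict[str, Any]) -> list[str]:
    """Return each step `run:` or service `command:` string that looks like a test-runner call."""
    commands: list[str] = []
    # tests[*].steps[*].run
    for test in plan.get("tests") or []:
        for step in test.get("steps") or []:
            if isinstance(step, dict) and isinstance(step.get("run"), str):
                commands.append(step["run"])
    # setup.services[*].command
    for service in (plan.get("setup") or {}).get("services") or []:
        if isinstance(service, dict) and isinstance(service.get("command"), str):
            commands.append(service["command"])
    return [cmd for cmd in commands if RUNNER_RE.search(cmd)]
```

An empty list satisfies the mechanical half of Gate 4; any hit names a step that must be replaced with a real E2E action.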
Arguments:
--base <branch>: Base branch to diff against (default: main, falling back to master when origin/main does not exist)

Step 1: Gather context
# Get current branch
git rev-parse --abbrev-ref HEAD
# Resolve base branch: BASE_BRANCH holds the --base value if supplied; otherwise use main, falling back to master
BASE_BRANCH="${BASE_BRANCH:-$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo main || echo master)}"
MERGE_BASE="$(git merge-base HEAD "origin/${BASE_BRANCH}")"
# Get changed files vs base
git diff --name-only "${MERGE_BASE}"..HEAD
# Get commit messages for context
git log --oneline "${MERGE_BASE}"..HEAD
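The base-branch fallback in the script above can be mirrored in Python if the agent drives git programmatically. This is a minimal sketch, assuming a remote named origin; resolve_base_branch and git_ref_exists are hypothetical helper names, and the ref check is injectable so the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Optional


def git_ref_exists(ref: str) -> bool:
    """Return True if `ref` resolves in the current repository (same check as the shell version)."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def resolve_base_branch(
    cli_base: Optional[str],
    ref_exists: Callable[[str], bool] = git_ref_exists,
) -> str:
    # A non-empty --base wins, like ${BASE_BRANCH:-...} in the shell script.
    if cli_base:
        return cli_base
    # Otherwise prefer main when origin/main exists, else fall back to master.
    return "main" if ref_exists("origin/main") else "master"
```

Passing an empty string behaves like an unset --base, matching the shell's `:-` expansion.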
Capture:
- current_branch: Branch name
- base_branch: Branch to compare against (from --base, otherwise the repo default)
- merge_base: The merge-base SHA used for the diff (recorded for Gate 1)
- changed_files: List of modified files
- commit_messages: What the PR is about

See references/stack-discovery.md for stack detection commands, entrypoint discovery, port discovery, and trace rules.
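The detection rules live in references/stack-discovery.md; as a rough illustration only, the setup.stack entries of the plan could be inferred from marker files in the repository root. The marker-to-stack mapping below is an assumption based on common conventions (for example, a pnpm-lock.yaml next to package.json implying pnpm), not the reference file's actual rules, and detect_stack is a hypothetical name.

```python
import os


def detect_stack(files: set[str]) -> list[dict[str, str]]:
    """Infer setup.stack entries from the file names present in the repository root."""
    stack: list[dict[str, str]] = []
    if "Cargo.toml" in files:
        stack.append({"type": "rust", "package_manager": "cargo"})
    if "package.json" in files:
        # The lockfile decides the Node package manager; npm is the fallback.
        if "pnpm-lock.yaml" in files:
            manager = "pnpm"
        elif "yarn.lock" in files:
            manager = "yarn"
        else:
            manager = "npm"
        stack.append({"type": "node", "package_manager": manager})
    if files & {"pyproject.toml", "requirements.txt", "setup.py"}:
        # The template's enum has no pip entry, so plain pip projects map to "none".
        if "uv.lock" in files:
            manager = "uv"
        elif "poetry.lock" in files:
            manager = "poetry"
        else:
            manager = "none"
        stack.append({"type": "python", "package_manager": manager})
    if "go.mod" in files:
        # The template's enum has no Go-specific package manager entry.
        stack.append({"type": "go", "package_manager": "none"})
    if "mix.exs" in files:
        stack.append({"type": "elixir", "package_manager": "mix"})
    if files & {"Dockerfile", "docker-compose.yml", "compose.yaml"}:
        stack.append({"type": "docker", "package_manager": "none"})
    return stack


def detect_stack_in(root: str = ".") -> list[dict[str, str]]:
    """Convenience wrapper that lists the root directory and applies detect_stack."""
    return detect_stack(set(os.listdir(root)))
```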
A "user-facing entry point" is anything a human interacts with: CLI subcommands, HTTP endpoints, UI routes, TUI screens, gRPC services, database migrations, or configuration files that affect runtime behavior.
# Rust (clap) — look for Subcommand derives and command enums
grep -rn "Subcommand\|#\[command" --include="*.rs" | head -20
# Python (click/typer/argparse)
grep -rn "@click.command\|@app.command\|add_parser\|add_subparser" --include="*.py" | head -20
# Go (cobra)
grep -rn "cobra.Command\|AddCommand" --include="*.go" | head -20
Build a map of the CLI commands found above and the environment variables they read (env, std::env::var, os.Getenv, os.environ).

HTTP endpoints:
Python (FastAPI/Flask):
grep -rn "@app\.\(get\|post\|put\|delete\|patch\|route\)" --include="*.py" | head -20
grep -rn "@router\.\(get\|post\|put\|delete\|patch\)" --include="*.py" | head -20
Node.js (Express/Fastify):
grep -rn "app\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
grep -rn "router\.\(get\|post\|put\|delete\)" --include="*.ts" --include="*.js" | head -20
Rust (axum/actix/rocket):
grep -rn "Router::new\|\.route(\|#\[get(\|#\[post(\|HttpServer" --include="*.rs" | head -20
Go (net/http, gin, chi):
grep -rn "http.HandleFunc\|r.GET\|r.POST\|router.Get\|router.Post" --include="*.go" | head -20
Elixir (Phoenix):
grep -rn "get \"/\|post \"/\|pipe_through\|live \"/\|scope \"/\"" --include="*.ex" | head -20
UI routes (React Router):
grep -rn "createBrowserRouter\|<Route\|path=" --include="*.tsx" --include="*.jsx" | head -20
# SQL migrations
ls migrations/ db/migrate/ priv/repo/migrations/ 2>/dev/null
# Schema files
ls schema.sql schema.prisma prisma/schema.prisma 2>/dev/null
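The grep commands above rely on GNU-style `\|` alternation in basic regular expressions. On systems whose grep lacks that extension, or when you want to skip vendored and build directories, a portable scan can apply equivalent patterns per file extension. The sketch below ports only the HTTP route greps, as Python regexes that approximate rather than exactly reproduce them; ROUTE_PATTERNS, SKIP_DIRS, and both function names are hypothetical.

```python
import os
import re

# Approximate Python ports of the HTTP route greps above (an assumption, not exact equivalents).
ROUTE_PATTERNS = {
    ".py": re.compile(r"@(?:app|router)\.(?:get|post|put|delete|patch|route)\("),
    ".ts": re.compile(r"\b(?:app|router)\.(?:get|post|put|delete)\("),
    ".js": re.compile(r"\b(?:app|router)\.(?:get|post|put|delete)\("),
    ".rs": re.compile(r"Router::new|\.route\(|#\[(?:get|post)\(|HttpServer"),
    ".go": re.compile(r"http\.HandleFunc|\br\.(?:GET|POST)\b|\brouter\.(?:Get|Post)\b"),
}
# Directories that usually hold dependencies or build output rather than project routes.
SKIP_DIRS = {".git", "node_modules", "target", "vendor", "dist", "build", ".venv"}


def scan_text(rel_path: str, text: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every route-like line in one file."""
    pattern = ROUTE_PATTERNS.get(os.path.splitext(rel_path)[1])
    if pattern is None:
        return []
    return [
        (rel_path, lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def find_route_candidates(root: str = ".") -> list[tuple[str, int, str]]:
    """Walk the repository and collect route-like lines from supported file types."""
    hits: list[tuple[str, int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]  # prune in place
        for name in filenames:
            if os.path.splitext(name)[1] not in ROUTE_PATTERNS:
                continue
            full_path = os.path.join(dirpath, name)
            with open(full_path, encoding="utf-8", errors="replace") as fh:
                hits.extend(scan_text(os.path.relpath(full_path, root), fh.read()))
    return sorted(hits)
```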
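For each changed file, determine whether it affects user-facing functionality. If the file is not itself an entry point, find the modules that reference it and follow those references up to an entry point: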
# Rust — use/mod/crate references and workspace deps
grep -rn "use.*<crate>\|mod <module>" --include="*.rs"
grep -rn "<crate-name>" --include="Cargo.toml"
# Python — from/import
grep -rn "from.*<module>\|import.*<module>" --include="*.py"
# TypeScript/JavaScript — import/require
grep -rn "from.*<module>\|require.*<module>" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx"
# Elixir — alias/import/use
grep -rn "alias.*<Module>\|import.*<Module>\|use.*<Module>" --include="*.ex" --include="*.exs"
# Go — package references
grep -rn "<package>\." --include="*.go"
If the ecosystem is not covered above, or grep results are inconclusive, read the project's CLAUDE.md, README, or architecture docs to understand the module graph and trace the data flow from changed files to user-facing entry points.
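Once the grep results (or the project docs) tell you which modules reference which, tracing is a reachability search over those reverse edges. A minimal sketch, assuming you have already collected an importers map from each module to the modules that reference it; trace_to_entry_points and the example module paths in the usage are hypothetical.

```python
from collections import deque


def trace_to_entry_points(
    importers: dict[str, set[str]],
    changed: set[str],
    entry_points: set[str],
) -> dict[str, set[str]]:
    """Map each reachable entry point to the changed modules that lead to it.

    importers[m] holds the modules that import or reference m, i.e. the reverse
    edges that the grep commands above discover.
    """
    affected: dict[str, set[str]] = {}
    for origin in changed:
        seen = {origin}
        queue = deque([origin])
        # Breadth-first walk from the changed module towards its importers.
        while queue:
            module = queue.popleft()
            if module in entry_points:
                affected.setdefault(module, set()).add(origin)
            for parent in importers.get(module, set()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
    return affected
```

A changed module that reaches no entry point simply does not appear in any result set; treat that as a prompt to read the code or docs as described above, not as proof that the change is invisible to users.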
After identifying all affected entry points, classify each one:
| Category | Description | Examples | Priority |
|---|---|---|---|
| Core functionality | Entry points where the feature does its actual work for the end user | Chat endpoint, API action, data processing pipeline, generation flow | High — test first |
| Configuration/admin | Entry points where the feature is set up, toggled, or configured | Settings page, admin dashboard, preference toggles, dropdown selections | Lower — test after core |
Classification rules:
Requirement: At least one test must target a core functionality entry point before generating configuration/admin tests. If no core functionality entry point can be identified, explicitly document why and flag this for manual review.
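The requirement above can be checked mechanically once each affected entry point is recorded with its category and the tests that target it. The record shape below (an EntryPoint with a category of "core" or "config") is a hypothetical structure for illustration, not a format the skill prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class EntryPoint:
    name: str  # e.g. "POST /api/chat" or "mycli export"
    category: str  # "core" (Core functionality) or "config" (Configuration/admin)
    test_ids: list[str] = field(default_factory=list)  # plan tests that target it


def check_core_coverage(entry_points: list[EntryPoint]) -> list[str]:
    """Return Gate 2 problems; an empty list means the classification requirement holds."""
    problems = [
        f"{ep.name}: unclassified"
        for ep in entry_points
        if ep.category not in ("core", "config")
    ]
    if not any(ep.category == "core" and ep.test_ids for ep in entry_points):
        problems.append(
            "no test targets a core functionality entry point: "
            "document why and flag for manual review"
        )
    return problems
```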
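Output: For each affected entry point, document the entry point itself, its classification (Core functionality or Configuration/admin), and the changed files that affect it.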
Step 5: Generate test cases
See references/test-case-generation.md for the detailed API/browser templates, prioritization rules, and test-case guidelines.
Step 6: Write the test plan
Create the test plan file:
mkdir -p docs/testing
Write to docs/testing/test-plan.yaml:
version: 1
metadata:
  branch: <current_branch>
  base: <base_branch>
  generated: <ISO timestamp>
  changes_summary: |
    <Summary of what this PR changes based on commit messages and diff>
setup:
  stack:
    - type: <rust|node|python|go|elixir|docker>
      package_manager: <cargo|pnpm|npm|yarn|uv|poetry|mix|none>
  prerequisites:
    # Services or infrastructure the tests need running
    - name: <e.g., PostgreSQL>
      check: <command to verify it's available, e.g., "pg_isready -h localhost">
  build:
    # Commands to build the project artifacts (binaries, assets, etc.)
    - <build command, e.g., "cargo build --workspace">
  services:
    # Long-running processes to start before tests (servers, watchers, etc.)
    # Omit if the project is a CLI tool or library with no server component
    - command: <start command>
      health_check:
        url: http://localhost:<port>/health
        timeout: 30
  env:
    # Environment variables needed by tests (use ${VAR} for secrets)
    DATABASE_URL: "${DATABASE_URL}"
tests:
  # CLI test example: run the built binary with real arguments
  - id: TC-01
    name: <CLI test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - run: <command that a human would type in their terminal>
      - run: <follow-up command to verify the effect>
    expected: |
      <Expected behavior: exit code, stdout content, side effects>
  # API test example:
  - id: TC-02
    name: <API test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - action: curl
        method: GET
        url: http://localhost:<port>/<path>
    expected: |
      <Expected behavior in natural language>
  # Database verification example:
  - id: TC-03
    name: <Database test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - run: <command that writes to the database>
      - run: psql "${DATABASE_URL}" -c "SELECT ... FROM ... WHERE ..."
    expected: |
      <Expected rows, schema state, or migration effect>
  # Browser test example (always use agent-browser CLI commands):
  - id: TC-04
    name: <UI test name>
    context: |
      <Why this test exists, which changes affect it>
    steps:
      - run: agent-browser open http://localhost:<port>/<path>
      - run: agent-browser snapshot -i
      - run: agent-browser click @<ref>
      - run: agent-browser snapshot -i
      - run: agent-browser screenshot evidence/tc-04.png
    expected: |
      <Expected behavior in natural language>
    evidence:
      screenshot: evidence/tc-04.png
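Gate 3 only checks the four top-level keys. To catch malformed test entries before Step 7, a stricter optional check might assert that every test carries the fields shown in the template above and that IDs are unique. Treating id, name, context, steps, and expected as required is an assumption drawn from the template, not a documented schema rule, and check_test_entries is a hypothetical name; the plan is assumed to be already parsed into a dict.

```python
from typing import Any

# Fields every test in the template carries; treating them as required is an assumption.
REQUIRED_TEST_KEYS = ("id", "name", "context", "steps", "expected")


def check_test_entries(plan: dict[str, Any]) -> list[str]:
    """Return problems found in plan["tests"]; an empty list means every entry looks complete."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for index, test in enumerate(plan.get("tests") or []):
        if not isinstance(test, dict):
            problems.append(f"tests[{index}]: not a mapping")
            continue
        test_id = test.get("id")
        label = test_id or f"tests[{index}]"
        missing = [key for key in REQUIRED_TEST_KEYS if not test.get(key)]
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
        if test_id:
            if test_id in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(test_id)
    return problems
```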
Step 7: Report
After generating the test plan, output the following summary:
## Test Plan Generated
**File:** `docs/testing/test-plan.yaml`
**Branch:** <current_branch> → <base_branch>
### Detected Stack
| Component | Type | Port |
|-----------|------|------|
| <component> | <type> | <port> |
### Tests Generated
| ID | Name | Type | Affected By |
|----|------|------|-------------|
| TC-01 | <name> | cli/curl/database/browser | <files> |
### Entry Point Coverage
- **Covered:** <N> entry points with tests
- **Unchanged:** <M> entry points not affected by this PR
### Next Steps
1. Review the generated test plan at `docs/testing/test-plan.yaml`
2. Adjust test values and expectations as needed
3. Run the tests by invoking the **run-test-plan** skill ([run-test-plan](../run-test-plan/SKILL.md))
Step 8: Verify
Confirm Hard gates 1–5 are satisfied with evidence (see Hard gates above) before treating the plan as complete. Then run:
# Verify file was created
ls -la docs/testing/test-plan.yaml
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" && echo "Valid YAML"
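# Check required top-level keys one at a time (a single grep -E alternation passes on any one match; Gate 3's Python check is authoritative)
missing=0; for key in version metadata setup tests; do grep -q "^${key}:" docs/testing/test-plan.yaml || { echo "Missing top-level key: ${key}"; missing=1; }; done; [ "$missing" -eq 0 ] && echo "All required keys present"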
Verification Checklist:
- The plan file exists at docs/testing/test-plan.yaml.
- Every run: and command: step in the plan has been scanned for test runner invocations (cargo test, pytest, npm test, go test, mix test, jest, vitest, mocha, etc.). If ANY step invokes the project's test runner, the plan fails verification: remove those steps and replace them with real E2E actions.
- At least one test exercises the primary behavior described in changes_summary. Re-read the changes_summary and commit messages; if they describe a capability (e.g., "adds a new LLM provider") but no test invokes that capability (e.g., sends a message through the provider), the plan fails verification. Add the missing core functionality test before completing.
- The docs/testing/ directory was created if it didn't exist.
- Each expected field is written in natural language (the agent will interpret it).
- Browser tests use agent-browser CLI commands (e.g., agent-browser open, agent-browser snapshot -i, agent-browser click @ref), never abstract action syntax.
- Browser tests run agent-browser snapshot -i before interacting with elements and after navigation or DOM changes.
- Browser tests use agent-browser screenshot <path> to capture evidence.
- Secrets use ${ENV_VAR} syntax; credentials are never hardcoded.
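The agent-browser rules in the checklist can be linted per test. This sketch only parses the command shapes quoted above (open, snapshot -i, click @ref, screenshot <path>) and assumes each run: holds a single agent-browser invocation; lint_browser_test is a hypothetical helper, and its snapshot rule is one reading of "before interacting with elements and after navigation or DOM changes".

```python
from typing import Any


def lint_browser_test(test: dict[str, Any]) -> list[str]:
    """Check snapshot/screenshot conventions for one browser test from the plan."""
    problems: list[str] = []
    screenshots: list[str] = []
    fresh_snapshot = False  # True once `snapshot -i` has run since the last open/click
    interacted = False  # True once any open/click step has been seen
    for step in test.get("steps") or []:
        cmd = step.get("run") if isinstance(step, dict) else None
        if not isinstance(cmd, str):
            continue
        parts = cmd.split()
        if len(parts) < 2 or parts[0] != "agent-browser":
            continue
        verb = parts[1]
        if verb == "snapshot" and "-i" in parts:
            fresh_snapshot = True
        elif verb == "click":
            if not fresh_snapshot:
                problems.append(f"click without a fresh snapshot -i: {cmd}")
            fresh_snapshot, interacted = False, True  # a click may change the DOM
        elif verb == "open":
            fresh_snapshot, interacted = False, True  # navigation invalidates element refs
        elif verb == "screenshot" and len(parts) >= 3:
            screenshots.append(parts[2])
    if interacted and not fresh_snapshot:
        problems.append("no snapshot -i after the last navigation or click")
    if not screenshots:
        problems.append("no agent-browser screenshot step")
    evidence = (test.get("evidence") or {}).get("screenshot")
    if evidence and evidence not in screenshots:
        problems.append(f"evidence.screenshot {evidence} is never captured")
    return problems
```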