Install
openclaw skills install run-test-plan
Execute a YAML test plan: run setup commands, health checks, and each test sequentially. Stop on first failure with rich debug output.
Requires the agent-browser command-line tool to be available on PATH.
- --plan <path>: Path to test plan (default: docs/testing/test-plan.yaml)
- --skip-setup: Skip setup commands and health checks (for re-running after failure)
Read and validate the test plan:
# Resolve plan path from --plan (default shown)
PLAN_PATH="${PLAN_PATH:-docs/testing/test-plan.yaml}"
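# Check file exists
[ -f "$PLAN_PATH" ] || { echo "Error: Test plan not found: $PLAN_PATH" >&2; exit 1; }
# Validate YAML (requires PyYAML; the path is passed as an argument so quotes in it cannot break the Python snippet)
python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$PLAN_PATH" || { echo "Error: Invalid YAML: $PLAN_PATH" >&2; exit 1; }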
Extract from the YAML:
- setup.commands: List of setup commands
- setup.health_checks: List of URLs to poll
- tests: Array of test cases
If setup.prerequisites exists, verify each one:
# For each prerequisite in setup.prerequisites
<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }
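The prerequisite loop above can be sketched as a small Bash function. This is a minimal sketch, not the skill's own implementation: `check_prerequisites` is a hypothetical helper, and the tab-separated `name<TAB>check` input is an assumed intermediate format that you would produce after parsing the YAML yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "name<TAB>check" pairs on stdin and runs each check.
# The tab-separated input is an assumption, not part of the plan schema.
check_prerequisites() {
  local name check
  while IFS=$'\t' read -r name check || [ -n "$name" ]; do
    [ -n "$name" ] || continue               # skip blank lines
    # Run the check in a child shell; /dev/null keeps it from eating our stdin.
    if bash -c "$check" </dev/null >/dev/null 2>&1; then
      echo "✓ Prerequisite met: $name"
    else
      echo "Prerequisite not met: $name" >&2
      return 1                               # stop at the first unmet prerequisite
    fi
  done
}
```

For example, `printf 'python3\tcommand -v python3\n' | check_prerequisites` reports whether `python3` is on PATH. Returning non-zero instead of calling `exit` leaves the decision to abort with the caller.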
If setup.env exists, export each variable. Variables using ${VAR} syntax should be resolved from the current environment:
# For each key/value in setup.env
export <key>="<value>"
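One way to resolve `${VAR}` references without `eval` is to substitute them token by token. The sketch below assumes Bash (it relies on `[[ =~ ]]` and indirect expansion); `resolve_vars` is a hypothetical helper name, and unset variables are assumed to resolve to an empty string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace each ${NAME} in a string with the value of NAME
# from the current environment (empty if NAME is unset).
resolve_vars() {
  local rest=$1 out='' name token
  local re='\$\{([A-Za-z_][A-Za-z0-9_]*)\}'
  # Consume the input left to right so substituted values are never re-scanned.
  while [[ $rest =~ $re ]]; do
    token=${BASH_REMATCH[0]}                 # e.g. ${DB_USER}
    name=${BASH_REMATCH[1]}                  # e.g. DB_USER
    out+=${rest%%"$token"*}${!name-}         # text before the token, then the value
    rest=${rest#*"$token"}                   # continue after the token
  done
  printf '%s\n' "$out$rest"
}
```

A caller could then write `export DATABASE_URL="$(resolve_vars "$raw_value")"` for each entry, where `raw_value` stands for the string read from setup.env.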
If setup.build exists, execute build commands sequentially:
# For each command in setup.build
<command> || { echo "Build failed: <command>"; exit 1; }
If setup.services exists, start long-running processes and wait for health checks:
# For each service in setup.services
mkdir -p .beagle  # directory for service logs and PID files
nohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid
For each service with a health_check, poll until ready:
timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0
while [ "$elapsed" -lt "$timeout" ]; do
  # Treat exactly 200, 301, or 302 as healthy
  if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)$"; then
    echo "✓ Health check passed: $url"
    break
  fi
  sleep 2
  elapsed=$((elapsed + 2))
done
if [ "$elapsed" -ge "$timeout" ]; then
  echo "✗ Health check timeout: $url"
  exit 1
fi
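When several services need polling, the loop above can be factored into a reusable function. The following is a sketch under assumptions: `wait_until` and `http_ok` are hypothetical helper names, and, like the loop above, the elapsed counter only accounts for sleep time, not for how long each probe takes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a probe command every <interval> seconds until it
# succeeds or <timeout> seconds of sleeping have elapsed.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout=$1 interval=$2 elapsed=0
  shift 2
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Probe that accepts the same status codes as the loop above (200, 301, 302).
http_ok() {
  curl -s -o /dev/null -w '%{http_code}' "$1" | grep -qE '^(200|301|302)$'
}
```

A call such as `wait_until 30 2 http_ok "$url" || exit 1` then mirrors the default 30-second timeout and 2-second interval. Passing the probe as arguments keeps the retry logic independent of curl, so the same function can wait on a file or a port check.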
If the plan uses the older flat format (setup.commands + setup.health_checks instead of prerequisites/build/services), fall back to executing setup.commands sequentially and polling setup.health_checks as before.
Do not start Step 4 until each condition you can check is true:
- The file given by --plan (default docs/testing/test-plan.yaml) exists and parses as YAML (same checks as Step 1).
- Without --skip-setup: every setup.prerequisites check that exists exited 0; every setup.build command succeeded; every service health_check reached HTTP 200, 301, or 302 within its timeout, or the legacy setup.health_checks passed after setup.commands.
- With --skip-setup: before TC-01, confirm anything the plan still needs is alive: at minimum, one successful curl (or equivalent) to each URL in setup.health_checks or each setup.services[].health_check.url that the tests depend on.
- mkdir -p docs/testing/evidence succeeds and the directory exists.
If any gate fails, stop, fix setup or flags, and do not execute tests.
For each test in the plan:
## Running: TC-XX - <test.name>
Context: <test.context>
For each step in test.steps, determine the step type and execute accordingly:
Shell commands (run: steps):
The most common step type. Execute the command via Bash and capture stdout, stderr, and exit code:
# Execute the command, capture output and exit code
<command> 2>&1
echo "EXIT_CODE: $?"
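The capture pattern above can be wrapped so that every step leaves a log behind. This is a minimal sketch assuming Bash; `run_step` and the log path argument are hypothetical, not part of the skill. It always returns 0 because pass or fail is decided later by comparing the captured output with test.expected, not by the exit code alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one shell step, store combined stdout/stderr in a
# log file, and append the exit code in the EXIT_CODE: format used above.
run_step() {
  local cmd=$1 log=$2 code=0
  bash -c "$cmd" >"$log" 2>&1 || code=$?
  echo "EXIT_CODE: $code" >>"$log"
  return 0   # the recorded code is evaluated later, not here
}
```

For instance, `run_step 'ls -la /path/to/expected/output' docs/testing/evidence/TC-01.log` would leave the listing and its exit code in one file; the log location is an assumption for illustration.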
Capture all output for evaluation in step 4c. Shell steps cover commands such as:
- ./target/debug/myapp status --all
- psql "${DATABASE_URL}" -c "SELECT ..."
- ls -la /path/to/expected/output
- timeout 5 ./myapp 2>&1 || true
curl actions (action: curl steps):
curl -X <method> \
-H "Content-Type: application/json" \
<additional headers> \
-d '<body>' \
"<url>" \
-o response.json \
-w "%{http_code}" > status_code.txt
# Capture response for evaluation
cat response.json
cat status_code.txt
agent-browser CLI actions:
Steps starting with agent-browser are browser automation commands:
# Navigate
agent-browser open <url>
# Snapshot interactive elements (always do before interacting)
agent-browser snapshot -i
# Interact using refs from snapshot output (@e1, @e2, etc.)
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>
# Wait for conditions
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle
# Capture evidence
agent-browser screenshot docs/testing/evidence/<test.id>.png
Important: Always run agent-browser snapshot -i before interacting with elements to get valid refs, and re-snapshot after navigation or significant DOM changes.
Save screenshots to docs/testing/evidence/<test.id>.png
Gate — artifacts before PASS/FAIL:
- run: steps: stdout and stderr captured; exit code recorded (e.g. an EXIT_CODE: line or equivalent).
- action: curl steps: response body and HTTP status captured to known paths (e.g. response.json, status_code.txt, or paths the plan specifies).
- agent-browser steps: an agent-browser snapshot -i exists before asserting; if the test records evidence, the screenshot file path is created or failure is explicit.
Then, using agent reasoning, compare actual outcome against test.expected.
If the test passed, report it and continue to the next test:
✓ TC-XX PASSED: <test.name>
If the test failed, stop immediately. Go to Step 6.
## Test Results: ALL PASSED
| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✓ PASS |
| ... | ... | ... |
**Total:** N/N tests passed
### Evidence
Screenshots saved to `docs/testing/evidence/`
### Cleanup
Stopping background services...
Clean up:
# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm "$pidfile"
  fi
done
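A slightly more defensive variant of the cleanup loop is sketched below. It assumes Bash; `stop_services` is a hypothetical helper name, and the optional directory argument exists only so the function can be exercised outside `.beagle`. It skips PID files whose content is not numeric or whose process has already exited, and still removes the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop every process recorded in <dir>/service-*.pid and
# <dir>/dev-server.pid, then remove the PID files. <dir> defaults to .beagle.
stop_services() {
  local dir=${1:-.beagle} pidfile pid
  for pidfile in "$dir"/service-*.pid "$dir"/dev-server.pid; do
    [ -f "$pidfile" ] || continue            # unmatched glob or missing file
    pid=$(cat "$pidfile")
    # Only signal PIDs that are numeric and still running.
    if [[ $pid =~ ^[0-9]+$ ]] && kill -0 "$pid" 2>/dev/null; then
      kill "$pid" 2>/dev/null || true
    fi
    rm -f "$pidfile"
  done
}
```

Registering it with `trap stop_services EXIT` is one option for making sure services are stopped on both the success and the failure path.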
When a test fails, generate rich debug output:
# Get changed files relevant to the failure
git diff --name-only $(git merge-base HEAD origin/main)..HEAD
# Get recent changes in files mentioned in test.context
git diff $(git merge-base HEAD origin/main)..HEAD -- <relevant_files>
## Test Failure: TC-XX - <test.name>
### What Failed
**Test:** <test.name>
**Expected:**
<test.expected>
**Actual:**
<Describe what actually happened - response code, error message, screenshot description>
### Relevant Changes in This PR
<For each file mentioned in test.context or related to the failure:>
- `<file>` (lines X-Y) - <brief description of changes>
### Evidence
<If screenshot exists:>
- Screenshot: `docs/testing/evidence/<test.id>.png`
<If API response:>
- Status code: <code>
- Response body:
```json
<response>
```
<Based on the error, suggest 2-3 specific things to check:>
Copy this to start a new agent session:
I'm debugging a test failure in branch <branch>.
Test: <test.name>
Error:
Relevant files:
### 6c. Preserve Evidence
```bash
# Ensure evidence directory exists
mkdir -p docs/testing/evidence
# Save failure context
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
# Failure Report: <test.id>
<Full debug report content>
EOF
# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm "$pidfile"
  fi
done
```
Always output a summary table showing progress:
## Test Results
| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✗ FAIL |
| TC-03 | <name> | - SKIP |
**Passed:** 1/3
**Failed:** TC-02
Tests after a failure are marked as SKIP (not executed).
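The table format above can be generated mechanically once per-test results are known. The sketch below assumes Bash and a hypothetical tab-separated input of `id<TAB>name<TAB>result` lines, where result is PASS or FAIL; every row after the first FAIL is printed as SKIP regardless of its result field. `print_summary` is an illustrative name, not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the summary table from "id<TAB>name<TAB>result"
# lines on stdin. Rows after the first FAIL are reported as SKIP.
print_summary() {
  local id name result failed='' passed=0 total=0
  echo '## Test Results'
  echo '| ID | Name | Result |'
  echo '|----|------|--------|'
  while IFS=$'\t' read -r id name result || [ -n "$id" ]; do
    total=$((total + 1))
    if [ -n "$failed" ]; then
      echo "| $id | $name | - SKIP |"        # not executed after a failure
    elif [ "$result" = PASS ]; then
      passed=$((passed + 1))
      echo "| $id | $name | ✓ PASS |"
    else
      failed=$id
      echo "| $id | $name | ✗ FAIL |"
    fi
  done
  echo "**Passed:** $passed/$total"
  if [ -n "$failed" ]; then
    echo "**Failed:** $failed"
  fi
}
```

Feeding it three lines for TC-01 (PASS), TC-02 (FAIL), and TC-03 (no result) reproduces the layout shown above, with `**Passed:** 1/3` and `**Failed:** TC-02`.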
Before completing:
# Verify evidence directory exists
ls -la docs/testing/evidence/
# List captured evidence
ls docs/testing/evidence/*.png docs/testing/evidence/*.md 2>/dev/null
Pass conditions (all must be true to call the run complete):
- ls -la docs/testing/evidence/ exits 0 (directory exists).
- On failure, a failure report was saved (docs/testing/evidence/<test.id>-failure.md or equivalent) or the debug report was emitted with expected vs actual.
- No .beagle/service-*.pid files point at live processes you started, unless the plan says to leave them up.
Verification Checklist:
- Setup completed (or --skip-setup with live dependencies confirmed)
- Evidence is saved under docs/testing/evidence/
- The required plan sections (setup / tests) are present.
- Use the --skip-setup flag to re-run after fixing issues