Run Test Plan

v1.0.1

Execute a YAML test plan, stop on the first failure, and output a rich debug prompt.


by Kevin Anderson (@anderskev)

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for anderskev/run-test-plan.


Install the skill "Run Test Plan" (anderskev/run-test-plan) from ClawHub.
Skill page: https://clawhub.ai/anderskev/run-test-plan
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI


openclaw skills install run-test-plan

ClawHub CLI


npx clawhub@latest install run-test-plan

Security Scan

VirusTotal

Pending


OpenClaw

Suspicious

medium confidence


Purpose & Capability

Name and description match the instructions: the SKILL.md describes parsing a YAML test plan, running setup, health checks, and sequential tests (shell, curl, agent-browser). The agent-browser dependency and use of curl, nohup, and file-based evidence/pid logs are expected for a test runner.

Instruction Scope

The instructions tell the agent to execute arbitrary shell commands taken directly from the YAML plan and to export environment values resolved from the current environment. There is no explicit restriction or sanitization of those commands or network endpoints. The skill also writes logs/pids to .beagle/ and evidence to docs/testing/evidence/. Executing arbitrary commands from an untrusted plan can run anything on the host.


Install Mechanism

Instruction-only skill with no install spec and no code files — nothing will be downloaded or written by an installer. This minimizes install-time risk.

Credentials

The skill explicitly resolves ${VAR} references from the current environment when exporting setup.env values but declares no required environment variables. That mismatch means it will read whatever env vars are present (including secrets like DATABASE_URL, AWS_*, etc.) without documenting them. Tests run by the plan could also access files or network resources and thereby exfiltrate secrets.


Persistence & Privilege

always is false and disable-model-invocation is true (it cannot autonomously call the model). The skill only writes service logs/pids and evidence under repository-style paths (.beagle/, docs/testing/evidence/) and does not attempt to change other skills or global agent settings.

What to consider before installing

This skill is a legitimate test-runner but can execute any shell command embedded in the YAML plan and will import environment variables from the running environment. Before using it: (1) review the test plan file yourself for any dangerous commands or network calls; (2) run it in an isolated environment or CI runner that has no sensitive credentials or secrets mounted; (3) remove or sanitize any environment variables you don't want exposed (DATABASE_URL, AWS_*, SSH keys, etc.); (4) prefer running only plans from trusted sources; and (5) if you must run untrusted plans, restrict network access and filesystem permissions for the runner process.

Like a lobster shell, security has layers — review code before you run it.

latest vk974284cez4xq61c3em1w0ngzd85bsya

90 downloads

0 stars

2 versions

Updated 6d ago

MIT-0

Run Test Plan

Execute a YAML test plan: run setup commands and health checks, then run each test sequentially. Stop on the first failure with rich debug output.

Prerequisites

agent-browser skill: Browser tests require the agent-browser:agent-browser skill to be available

Arguments

--plan <path>: Path to test plan (default: docs/testing/test-plan.yaml)
--skip-setup: Skip setup commands and health checks (for re-running after failure)
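
One way the two flags could be handled in Bash is sketched below. The function name `parse_args` and the `SKIP_SETUP` variable are hypothetical; only `--plan`, `--skip-setup`, and the default path come from the skill itself.

```bash
# Hypothetical argument parser for the two documented flags.
PLAN_PATH="docs/testing/test-plan.yaml"   # documented default
SKIP_SETUP=0                              # assumed name; 1 means --skip-setup was passed

parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --plan)
        # --plan must be followed by a path
        [ "$#" -ge 2 ] || { echo "Error: --plan needs a path" >&2; return 2; }
        PLAN_PATH=$2
        shift 2
        ;;
      --skip-setup)
        SKIP_SETUP=1
        shift
        ;;
      *)
        echo "Error: unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
}
```

A runner script would call `parse_args "$@"` once at startup and then branch on `SKIP_SETUP` before Step 2.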

Step 1: Parse Test Plan

Read and validate the test plan:

# Resolve plan path from --plan (default shown)
PLAN_PATH="${PLAN_PATH:-docs/testing/test-plan.yaml}"

# Check file exists
[ -f "$PLAN_PATH" ] || { echo "Error: Test plan not found: $PLAN_PATH" >&2; exit 1; }

# Validate YAML (requires PyYAML); the path is passed as an argument so quotes in it cannot break the Python code
python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$PLAN_PATH" || { echo "Error: Invalid YAML: $PLAN_PATH" >&2; exit 1; }

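Extract from the YAML:

setup.prerequisites, setup.env, setup.build, setup.services: Setup sections used in Step 2 (each is optional)
setup.commands: List of setup commands (legacy format)
setup.health_checks: List of URLs to poll (legacy format)
tests: Array of test cases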

Step 2: Run Setup (unless --skip-setup)

2a. Check Prerequisites

If setup.prerequisites exists, verify each one:

# For each prerequisite in setup.prerequisites
<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }

2b. Set Environment Variables

If setup.env exists, export each variable. Variables using ${VAR} syntax should be resolved from the current environment:

# For each key/value in setup.env
export <key>="<value>"
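
The `${VAR}` resolution described above could be done without `eval`, so that a plan value is never executed as shell code. The following is a minimal sketch under that assumption; `resolve_env_refs` and `export_setup_var` are hypothetical helper names, and unset variables are assumed to expand to an empty string.

```bash
# Hypothetical helper: expand ${NAME} references in a value from the current environment.
# Only the ${NAME} form is expanded; a bare $NAME or a lone $ is left untouched.
resolve_env_refs() {
  local rest=$1 out='' match name
  local pattern='[$][{]([A-Za-z_][A-Za-z0-9_]*)[}]'
  while [[ $rest =~ $pattern ]]; do
    match=${BASH_REMATCH[0]}   # e.g. ${DB_HOST}
    name=${BASH_REMATCH[1]}    # e.g. DB_HOST
    # Append the text before the reference, then the variable's value (empty if unset).
    out+=${rest%%"$match"*}${!name:-}
    rest=${rest#*"$match"}
  done
  printf '%s\n' "$out$rest"
}

# Hypothetical helper: export one setup.env entry after resolving its references.
export_setup_var() {
  local key=$1 raw=$2
  export "$key=$(resolve_env_refs "$raw")"
}
```

Because the already-resolved text is never rescanned, a variable whose value itself contains `${...}` is inserted literally rather than expanded again.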

2c. Build

If setup.build exists, execute build commands sequentially:

# For each command in setup.build
<command> || { echo "Build failed: <command>"; exit 1; }

2d. Start Services

If setup.services exists, start long-running processes and wait for health checks:

# For each service in setup.services
mkdir -p .beagle  # log and PID files are written here
nohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid

For each service with a health_check, poll until ready:

timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0

while [ "$elapsed" -lt "$timeout" ]; do
  # --max-time caps each probe so a hung server cannot stall the loop (5 seconds is an assumed value)
  if curl -s -o /dev/null --max-time 5 -w "%{http_code}" "$url" | grep -qE "^(200|301|302)$"; then
    echo "✓ Health check passed: $url"
    break
  fi
  sleep 2
  elapsed=$((elapsed + 2))
done

if [ "$elapsed" -ge "$timeout" ]; then
  echo "✗ Health check timeout: $url"
  exit 1
fi
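
When a plan defines several services, the polling loop above can be factored into a reusable function. This is a sketch only: `wait_until` and `http_ready` are hypothetical names, and the 5-second per-request cap is an assumption rather than something the skill specifies.

```bash
# Hypothetical helper: retry a probe command until it succeeds or the timeout elapses.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout=$1 interval=$2 elapsed=0
  shift 2
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Hypothetical probe: succeeds only on HTTP 200, 301, or 302, like the loop above.
http_ready() {
  local code
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1") || return 1
  case "$code" in
    200|301|302) return 0 ;;
    *) return 1 ;;
  esac
}
```

A service check then reads `wait_until 30 2 http_ready "$url" || exit 1`, where `$url` holds the service's health_check URL.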

2e. Legacy Setup Format

If the plan uses the older flat format (setup.commands + setup.health_checks instead of prerequisites/build/services), fall back to executing setup.commands sequentially and polling setup.health_checks as before.

Step 3: Gate — setup ready before tests

Do not start Step 4 until each condition you can check is true:

Plan load: The file from --plan (default docs/testing/test-plan.yaml) exists and parses as YAML (same checks as Step 1).
Setup branch:
- If not using --skip-setup: Every setup.prerequisites check that exists exited 0; every setup.build command succeeded; and every service health_check returned HTTP 200, 301, or 302 within its timeout. For a legacy plan, setup.health_checks passed after setup.commands ran.
- If using --skip-setup: Before TC-01, confirm that anything the plan still needs is alive. At minimum, make one successful curl (or equivalent) request to each URL in setup.health_checks or each setup.services[].health_check.url that the tests depend on.
Evidence path: mkdir -p docs/testing/evidence succeeds and the directory exists.

If any gate fails, stop, fix setup or flags, and do not execute tests.

Step 4: Execute Tests Sequentially

For each test in the plan:

4a. Log Test Start

## Running: TC-XX - <test.name>

Context: <test.context>

4b. Execute Steps

For each step in test.steps, determine the step type and execute accordingly:

Shell commands (run: steps):

The most common step type. Execute the command via Bash and capture stdout, stderr, and exit code:

# Execute the command, capture output and exit code
<command> 2>&1
echo "EXIT_CODE: $?"

Capture all output for evaluation in step 4c. Shell steps cover:

CLI binary invocations (e.g., ./target/debug/myapp status --all)
Database queries (e.g., psql "${DATABASE_URL}" -c "SELECT ...")
File inspection (e.g., ls -la /path/to/expected/output)
Process lifecycle checks (e.g., timeout 5 ./myapp 2>&1 || true)
Any other command a human would type in a terminal
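
If the captured output needs to survive as files for Step 4c, a small wrapper can store it alongside the other evidence. The sketch below is an assumption about how that might look: `run_step`, the `EVIDENCE_DIR` variable, and the `.stdout`, `.stderr`, and `.exit_code` file names are all hypothetical, and unlike the `2>&1` form above it keeps the two streams separate.

```bash
# Hypothetical helper: run one shell step and keep stdout, stderr, and the exit code as files.
# Usage: run_step <test_id> <command string>
run_step() {
  local test_id=$1 cmd=$2
  local dir=${EVIDENCE_DIR:-docs/testing/evidence}
  local code=0
  mkdir -p "$dir" || return 1
  # The command string comes from the plan, so it runs with the runner's full privileges.
  bash -c "$cmd" >"$dir/$test_id.stdout" 2>"$dir/$test_id.stderr" || code=$?
  printf '%s\n' "$code" >"$dir/$test_id.exit_code"
  echo "EXIT_CODE: $code"
}
```

The function itself returns 0 even when the step fails, which leaves the PASS or FAIL decision to the evaluation in Step 4c.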

curl actions (action: curl steps):

curl -X <method> \
  -H "Content-Type: application/json" \
  <additional headers> \
  -d '<body>' \
  "<url>" \
  -o response.json \
  -w "%{http_code}" > status_code.txt

# Capture response for evaluation
cat response.json
cat status_code.txt

agent-browser CLI actions:

Steps starting with agent-browser are browser automation commands:

# Navigate
agent-browser open <url>

# Snapshot interactive elements (always do before interacting)
agent-browser snapshot -i

# Interact using refs from snapshot output (@e1, @e2, etc.)
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>

# Wait for conditions
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle

# Capture evidence
agent-browser screenshot docs/testing/evidence/<test.id>.png

Important: Always run agent-browser snapshot -i before interacting with elements to get valid refs, and re-snapshot after navigation or significant DOM changes.

Save screenshots to docs/testing/evidence/<test.id>.png

4c. Evaluate Result

Gate — artifacts before PASS/FAIL:

run: steps: Stdout and stderr captured; exit code recorded (e.g. EXIT_CODE: line or equivalent).
action: curl steps: Response body and HTTP status captured to known paths (e.g. response.json, status_code.txt or paths the plan specifies).
agent-browser steps: After any navigation or DOM change, a fresh agent-browser snapshot -i exists before asserting; if the test records evidence, the screenshot file path is created or failure is explicit.
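
The artifact gate above can be checked mechanically before any reasoning about PASS or FAIL. A minimal sketch, assuming the artifacts are plain files; `artifacts_ready` is a hypothetical name, and it only checks that each file exists because an empty stdout or response body can be legitimate.

```bash
# Hypothetical gate: succeed only if every listed artifact file exists.
# Usage: artifacts_ready <path> [<path>...]
artifacts_ready() {
  local path missing=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "Missing artifact: $path" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For a curl step this might be called as `artifacts_ready response.json status_code.txt`, stopping the evaluation if either file was never written.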

Then, using agent reasoning, compare actual outcome against test.expected:

Read the expected behavior description
Compare with actual response/screenshot
Determine PASS or FAIL

4d. On PASS

✓ TC-XX PASSED: <test.name>

Continue to next test.

4e. On FAIL

Stop immediately. Go to Step 6.

Step 5: On All Tests Pass

## Test Results: ALL PASSED

| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✓ PASS |
| ... | ... | ... |

**Total:** N/N tests passed

### Evidence

Screenshots saved to `docs/testing/evidence/`

### Cleanup

Stopping background services...

Clean up:

# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
  fi
done

Step 6: On Failure - Generate Debug Prompt

When a test fails, generate rich debug output:

6a. Gather Context

# Get changed files relevant to the failure
git diff --name-only $(git merge-base HEAD origin/main)..HEAD

# Get recent changes in files mentioned in test.context
git diff $(git merge-base HEAD origin/main)..HEAD -- <relevant_files>

6b. Output Debug Report

## Test Failure: TC-XX - <test.name>

### What Failed

**Test:** <test.name>
**Expected:**
<test.expected>

**Actual:**
<Describe what actually happened - response code, error message, screenshot description>

### Relevant Changes in This PR

<For each file mentioned in test.context or related to the failure:>
- `<file>` (lines X-Y) - <brief description of changes>

### Evidence

<If screenshot exists:>
- Screenshot: `docs/testing/evidence/<test.id>.png`

<If API response:>
- Status code: <code>
- Response body:
```json
<response>
```

### Error Details

<If error message in response or logs:>
```
<error message>
```

### Suggested Investigation

1. <First thing to check based on error type>
2. <Second thing related to changed files>
3. <Third thing about environment/setup>

### Debug Session Prompt

Copy this to start a new Claude session:

I'm debugging a test failure in branch <branch>.

Test: <test.name>
Error: <brief error description>

Relevant files:
<List changed files related to this test>

Help me investigate why <specific failure reason>.

6c. Preserve Evidence

# Ensure evidence directory exists
mkdir -p docs/testing/evidence

# Save failure context
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
# Failure Report: <test.id>

<Full debug report content>
EOF

6d. Cleanup and Exit

# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
  fi
done
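
Since cleanup has to run on the failure path as well as on success, one option is to wrap the loop in a function and register it with `trap`, so that every exit path triggers it. This is a sketch, not the skill's prescribed mechanism: `cleanup_services` and the `BEAGLE_DIR` variable are hypothetical names.

```bash
# Hypothetical wrapper around the cleanup loop so it can run from a trap.
# BEAGLE_DIR is an assumed override; .beagle is the directory the skill uses.
cleanup_services() {
  local dir=${BEAGLE_DIR:-.beagle} pidfile pid
  for pidfile in "$dir"/service-*.pid "$dir"/dev-server.pid; do
    [ -f "$pidfile" ] || continue
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process can be signaled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
      kill "$pid" 2>/dev/null || true
    fi
    rm -f "$pidfile"
  done
}

# Run cleanup on every exit path, including a failed test that calls `exit 1`.
trap cleanup_services EXIT
```

With the trap in place, the explicit cleanup calls in Step 5 and Step 6d become a safety net rather than the only line of defense.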

Test Results Summary Table

Always output a summary table showing progress:

## Test Results

| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✗ FAIL |
| TC-03 | <name> | - SKIP |

**Passed:** 1/3
**Failed:** TC-02

Tests after a failure are marked as SKIP (not executed).
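
The SKIP rule can be applied mechanically when rendering the table. In the sketch below, `print_summary` is a hypothetical helper that takes ordered `ID|Name|Result` rows (an assumed input format) and reports everything after the first FAIL as SKIP, whatever result was passed in.

```bash
# Hypothetical helper: print the summary table from ordered "ID|Name|Result" rows.
# Result is PASS or FAIL; rows after the first FAIL are always shown as SKIP.
print_summary() {
  local row id name result failed='' passed=0 total=0
  echo "## Test Results"
  echo
  echo "| ID | Name | Result |"
  echo "|----|------|--------|"
  for row in "$@"; do
    IFS='|' read -r id name result <<<"$row"
    total=$((total + 1))
    if [ -n "$failed" ]; then
      echo "| $id | $name | - SKIP |"
    elif [ "$result" = "PASS" ]; then
      passed=$((passed + 1))
      echo "| $id | $name | ✓ PASS |"
    else
      failed=$id
      echo "| $id | $name | ✗ FAIL |"
    fi
  done
  echo
  echo "**Passed:** $passed/$total"
  if [ -n "$failed" ]; then
    echo "**Failed:** $failed"
  fi
}
```

For example, `print_summary "TC-01|Login|PASS" "TC-02|Create user|FAIL" "TC-03|Delete user|"` (test names invented for illustration) renders one PASS row, one FAIL row, and one SKIP row.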

Verification

Before completing:

# Verify evidence directory exists
ls -la docs/testing/evidence/

# List captured evidence
ls docs/testing/evidence/*.png docs/testing/evidence/*.md 2>/dev/null

Pass conditions (all must be true to call the run complete):

ls -la docs/testing/evidence/ exits 0 (directory exists).
Every test that ran has an explicit PASS or FAIL (not only “felt right”).
If any test failed: a failure artifact exists (docs/testing/evidence/<test.id>-failure.md or equivalent) or the debug report was emitted with expected vs actual.
Cleanup ran: no stale .beagle/service-*.pid pointing at live processes you started, unless the plan says to leave them up.
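
The last pass condition can be checked directly by looking for PID files whose process is still alive. A minimal sketch, assuming the `.beagle/service-*.pid` layout from Step 2d; `find_live_services` is a hypothetical name.

```bash
# Hypothetical check: list PID files that still point at a live process.
# Prints one "pidfile pid" line per live process and returns 1 if any were found.
find_live_services() {
  local dir=${1:-.beagle} pidfile pid found=0
  for pidfile in "$dir"/service-*.pid; do
    [ -f "$pidfile" ] || continue
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only reports whether the process can be signaled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
      echo "$pidfile $pid"
      found=1
    fi
  done
  [ "$found" -eq 0 ]
}
```

An empty result with exit status 0 means cleanup released everything this run started; any printed line names a service that is still running.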

Verification Checklist:

Setup commands executed successfully (or --skip-setup with live dependencies confirmed)
Health checks passed before test execution (Step 3 gate)
Each executed test has a recorded result
Evidence captured in docs/testing/evidence/
On failure: debug prompt includes expected vs actual
On failure: relevant PR changes listed
Background processes cleaned up

Gates (ordered)

Plan valid: YAML parses; required keys for your branch (setup / tests) are present.
Setup healthy: Step 3 pass conditions met before TC-01.
Per-test artifacts: Step 4c gate satisfied before marking PASS.
Stop on fail: First failure → Step 6; later tests = SKIP in the summary table.
Cleanup: Step 5 or Step 6d executed so background PIDs from this run are released.

Rules

Stop on first test failure (do not continue to other tests)
Always capture evidence (screenshots, responses)
Include file:line references in debug prompts when possible
Use --skip-setup flag to re-run after fixing issues
Never hardcode secrets - use environment variables
Clean up background processes even on failure
Preserve failure evidence for debugging
Make debug prompts copy-paste ready for new sessions
