Evalpal

Other

Run AI agent evaluations via EvalPal — trigger eval runs, check results, and list available evaluations

Install

openclaw skills install evalpal

EvalPal Skill

Run AI agent evaluations inline. Trigger eval runs, poll for results, and list available evaluation definitions — all from chat.

Prerequisites

Set the following environment variables in your OpenClaw skill configuration:

VariableRequiredDescription
EVALPAL_API_KEYYesYour EvalPal API key (starts with sk_)
EVALPAL_API_URLNoBase URL (defaults to https://evalpal.dev)

Get your API key from Settings → API Keys at evalpal.dev.

Commands

/evalpal run --eval-id <ID>

Trigger an evaluation run and wait for results.

Usage:

bash scripts/run-eval.sh --eval-id <EVAL_DEFINITION_ID>

What it does:

  1. Triggers a new eval run via the EvalPal API
  2. Polls for completion with exponential backoff (up to 5 minutes)
  3. Fetches and formats results as readable markdown

Example output:

✅ Episode Quality — PASSED (15/16)
├── Test Case tc_001: ✓ PASS
├── Test Case tc_002: ✓ PASS
├── Test Case tc_003: ✗ FAIL
└── 12 more passed...

Run ID: run_abc123 · 16 test cases · 47s

Exit codes: 0 = all passed, 1 = failures or error.

/evalpal status --run-id <ID>

Check the current status of a running evaluation.

Usage:

bash scripts/check-status.sh --run-id <RUN_ID>

Example output:

📊 Run Status: run_abc123
Status: running
Started: 2026-03-26T20:00:00Z

/evalpal list

List available evaluation definitions across your projects.

Usage:

bash scripts/list-evals.sh [--project-id <PROJECT_ID>]

If --project-id is omitted, lists evals for all projects.

Example output:

📋 Evaluation Definitions

Project: AI Workforce Lab
  abc123  Episode Quality Check
  def456  Factual Accuracy Eval

Project: Customer Support Bot
  ghi789  Response Quality

Error Handling

All scripts handle common error cases:

ScenarioOutputExit Code
No API key setError: EVALPAL_API_KEY is not set1
Invalid API keyError: Authentication failed (401)1
Eval not foundError: Eval definition not found (404)1
Rate limitedError: Rate limited — retry after Xs (429)1
Timeout (5 min)Error: Evaluation timed out after 300s1
Network errorError: Could not reach EvalPal API1

Security

  • The API key is read from EVALPAL_API_KEY environment variable only
  • Scripts never echo or log the API key
  • All API calls use HTTPS