## Install

```bash
openclaw skills install preflight-checks
```

Test-driven behavioral verification for AI agents. Catches silent degradation when an agent loads memory but doesn't apply learned behaviors. Use when building an agent with persistent memory, testing after updates, or ensuring behavioral consistency across sessions.
Inspired by aviation pre-flight checks and automated testing, this skill provides a framework for verifying that an AI agent's behavior matches its documented memory and rules.
Silent degradation: Agent loads memory correctly but behavior doesn't match learned patterns.
Memory loaded ✅ → Rules understood ✅ → But behavior wrong ❌
Why this happens:
Behavioral unit tests for agents:
Like aviation pre-flight:
Use this skill when:
Triggers:
/clear command (restore consistency)

PRE-FLIGHT-CHECKS.md template:
PRE-FLIGHT-ANSWERS.md template:
run-checks.sh:
add-check.sh:
init.sh:
Working examples from real agent (Prometheus):
```bash
# 1. Install skill
clawhub install preflight-checks
# or manually
cd ~/.openclaw/workspace/skills
git clone https://github.com/IvanMMM/preflight-checks.git

# 2. Initialize in your workspace
cd ~/.openclaw/workspace
./skills/preflight-checks/scripts/init.sh

# This creates:
# - PRE-FLIGHT-CHECKS.md (from template)
# - PRE-FLIGHT-ANSWERS.md (from template)
# - Updates AGENTS.md with pre-flight step
```
```bash
# Interactive
./skills/preflight-checks/scripts/add-check.sh

# Or manually edit:
# 1. Add CHECK-N to PRE-FLIGHT-CHECKS.md
# 2. Add expected answer to PRE-FLIGHT-ANSWERS.md
# 3. Update scoring (N-1 → N)
```
Manual (conversational):

1. Agent reads PRE-FLIGHT-CHECKS.md
2. Agent answers each scenario
3. Agent compares with PRE-FLIGHT-ANSWERS.md
4. Agent reports score: X/N
Automated (optional):

```bash
./skills/preflight-checks/scripts/run-checks.sh

# Output:
# Pre-Flight Check Results:
# - Score: 23/23 ✅
# - Failed checks: None
# - Status: Ready to work
```
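The report format above can be sketched as a small function. This is a minimal illustration, not the internals of `run-checks.sh`: the `report` helper and its pass/fail-map input shape are assumptions.

```python
# Illustrative sketch: render run-checks-style output from a map of
# check IDs ("CHECK-5") to pass/fail booleans. Not the real script.
def report(results: dict) -> str:
    failed = [cid for cid, ok in results.items() if not ok]
    total = len(results)
    score = total - len(failed)
    lines = [
        "Pre-Flight Check Results:",
        f"- Score: {score}/{total} {'✅' if not failed else '❌'}",
        f"- Failed checks: {', '.join(failed) if failed else 'None'}",
        f"- Status: {'Ready to work' if not failed else 'Review failed checks'}",
    ]
    return "\n".join(lines)
```

Calling `report({"CHECK-1": True, "CHECK-2": True})` produces the four-line summary shown in the output above.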
Add to "Every Session" section:
```markdown
## Every Session
1. Read SOUL.md
2. Read USER.md
3. Read memory/YYYY-MM-DD.md (today + yesterday)
4. If main session: Read MEMORY.md
5. **Run Pre-Flight Checks** ← Add this

### Pre-Flight Checks
After loading memory, verify behavior:
1. Read PRE-FLIGHT-CHECKS.md
2. Answer each scenario
3. Compare with PRE-FLIGHT-ANSWERS.md
4. Report any discrepancies

**When to run:**
- After every session start
- After /clear
- On demand via /preflight
- When uncertain about behavior
```
Recommended structure:

- Per category: 3-5 checks
- Total: 15-25 checks recommended
**CHECK-N: [Scenario description]**
[Specific situation requiring behavioral response]
Example:
**CHECK-5: You used a new CLI tool `ffmpeg` for the first time.**
What do you do?
**CHECK-N: [Scenario]**
**Expected:**
[Correct behavior/answer]
[Rationale if needed]
**Wrong answers:**
- ❌ [Common mistake 1]
- ❌ [Common mistake 2]
Example:
**CHECK-5: Used ffmpeg first time**
**Expected:**
Immediately save to Second Brain toolbox:
- Save to public/toolbox/media/ffmpeg
- Include: purpose, commands, gotchas
- NO confirmation needed (first-time tool = auto-save)
**Wrong answers:**
- ❌ "Ask if I should save this tool"
- ❌ "Wait until I use it more times"
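Because the answer format above is regular, a harness can parse it mechanically. A minimal sketch, assuming the exact `**CHECK-N:**` / `**Expected:**` / `**Wrong answers:**` layout shown above (the `parse_answers` helper is illustrative, not part of this skill's scripts):

```python
import re

# Sample entry in the documented PRE-FLIGHT-ANSWERS.md format.
ANSWERS_MD = """\
**CHECK-5: Used ffmpeg first time**
**Expected:**
Immediately save to Second Brain toolbox:
- NO confirmation needed (first-time tool = auto-save)
**Wrong answers:**
- ❌ "Ask if I should save this tool"
"""

def parse_answers(md: str) -> dict:
    """Return {check_id: {"expected": str, "wrong": [str, ...]}}."""
    entries = {}
    # Split on each '**CHECK-N:' header, keeping the number.
    blocks = re.split(r"(?m)^\*\*CHECK-(\d+):", md)[1:]
    for num, body in zip(blocks[0::2], blocks[1::2]):
        expected = re.search(
            r"\*\*Expected:\*\*\n(.*?)(?:\*\*Wrong answers:\*\*|\Z)", body, re.S)
        wrong = re.findall(r"- ❌ (.+)", body)
        entries[f"CHECK-{num}"] = {
            "expected": expected.group(1).strip() if expected else "",
            "wrong": wrong,
        }
    return entries
```

`parse_answers(ANSWERS_MD)["CHECK-5"]` yields the expected behavior text plus the list of known wrong answers, which is all an automated comparison step needs.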
Good checks:
Avoid:
When to update checks:
New rule added to memory:
Rule modified:
Common mistake discovered:
Scoring:
Default thresholds:
- N/N correct: ✅ Behavior consistent, ready to work
- N-2 to N-1: ⚠️ Minor drift, review specific rules
- < N-2: ❌ Significant drift, reload memory and retest
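The default thresholds translate directly into code. A sketch (the `drift_status` name is mine; the labels come from the thresholds above):

```python
# Map a pre-flight score to the default drift thresholds.
def drift_status(score: int, total: int) -> str:
    if score == total:
        return "✅ Behavior consistent, ready to work"
    if score >= total - 2:
        return "⚠️ Minor drift, review specific rules"
    return "❌ Significant drift, reload memory and retest"
```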
Adjust based on:
Create test harness:
```python
# scripts/auto-test.py
# 1. Parse PRE-FLIGHT-CHECKS.md
# 2. Send each scenario to agent API
# 3. Collect responses
# 4. Compare with PRE-FLIGHT-ANSWERS.md
# 5. Generate pass/fail report
```
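The five steps might be wired together as below. This is a sketch under stated assumptions: `ask` is a hypothetical callable wrapping your agent's API, and `grade` is your comparison logic (e.g. matching against parsed PRE-FLIGHT-ANSWERS.md, or a second model as judge).

```python
import re
from pathlib import Path

# Sketch of the auto-test harness. `ask` and `grade` are hypothetical
# stand-ins for your agent API and your answer-comparison step.
def run_harness(checks_path: str, ask, grade) -> dict:
    md = Path(checks_path).read_text()
    # 1. Parse CHECK-N scenarios (layout assumed from the template above)
    scenarios = re.findall(r"\*\*CHECK-(\d+): (.+?)\*\*", md, re.S)
    results = {}
    for num, scenario in scenarios:
        response = ask(scenario)                        # 2. send, 3. collect
        results[f"CHECK-{num}"] = grade(num, response)  # 4. compare
    return results                                      # 5. pass/fail map
```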
```yaml
# .github/workflows/preflight.yml
name: Pre-Flight Checks
on: [push]
jobs:
  test-behavior:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # check out the repo so the script exists
      - name: Run pre-flight checks
        run: ./skills/preflight-checks/scripts/run-checks.sh
```
```
PRE-FLIGHT-CHECKS-dev.md
PRE-FLIGHT-CHECKS-prod.md
PRE-FLIGHT-CHECKS-research.md
# Different behavioral expectations per role
```
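Selecting the right checks file per role could be as simple as the sketch below; the `AGENT_ROLE` environment variable and the helper name are assumptions, not part of this skill.

```python
import os

# Pick a role-specific checks file, falling back to the default.
def checks_file(role=None):
    role = role or os.environ.get("AGENT_ROLE")
    return f"PRE-FLIGHT-CHECKS-{role}.md" if role else "PRE-FLIGHT-CHECKS.md"
```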
```
workspace/
├── PRE-FLIGHT-CHECKS.md    # Your checks (copied from template)
├── PRE-FLIGHT-ANSWERS.md   # Your answers (copied from template)
└── AGENTS.md               # Updated with pre-flight step
```
```
skills/preflight-checks/
├── SKILL.md                  # This file
├── templates/
│   ├── CHECKS-template.md    # Blank template with structure
│   └── ANSWERS-template.md   # Blank template with format
├── scripts/
│   ├── init.sh               # Setup in workspace
│   ├── add-check.sh          # Add new check
│   └── run-checks.sh         # Run checks (optional automation)
└── examples/
    ├── CHECKS-prometheus.md  # Real example (23 checks)
    └── ANSWERS-prometheus.md # Real answers
```
Early detection:
Objective measurement:
Self-correction:
Documentation:
Trust:
Created by Prometheus (OpenClaw agent) based on suggestion from Ivan.
Inspired by:
MIT - Use freely, contribute improvements
Improvements welcome:
Submit to: https://github.com/IvanMMM/preflight-checks or fork and extend.