Verified Capability Evolver

Workflows

Safely improve agent capabilities with structured verification, rollback, and promotion gating. Enhances existing evolution workflows with optional SettlementWitness verification.

Install

openclaw skills install verified-capability-evolver

Verified Capability Evolver

Extend existing capability evolution workflows with structured verification, rollback, and promotion gating.

This skill does not replace the underlying self-improvement system. It preserves the original learning, hook, and extraction workflow while adding a verification layer so permanent behavior changes are only promoted when they are proven.

Data handling and trust

This skill defines a verification workflow, not automatic data transmission.

Only structured task data (spec + output) should be used for verification
Do NOT include secrets, API keys, credentials, private keys, seed phrases, or personal data
SettlementWitness integration is runtime-controlled and should be used only with explicit user approval
Verification is applied to selected learnings and promotion events, not all agent activity

Core Principle

Example Usage

Scenario

An agent repeatedly generates invalid JSON for an API response. A fix is applied, but before promoting this as permanent behavior, the improvement must be verified.

Learning Entry (Before Resolution)

[LRN-20260329-001] correction

Status: pending

Summary

Agent produced invalid JSON format for API responses

Suggested Action

Ensure all outputs conform to required schema before returning

Verification Spec

{ "expected": { "valid_json": true, "matches_schema": true } }

Verification Result

{ "valid_json": true, "matches_schema": true }

Outcome

Verification returns PASS
Status → resolved
Learning becomes eligible for promotion
Receipt metadata can be attached (if verification used)

What This Shows

Improvements must be proven, not assumed
Promotion is gated by verification
Failed fixes do not become permanent behavior

An agent should not just improve — it should prove that it improved.

The highest-stakes moment in self-improvement is promotion to permanent memory. A temporary fix becomes permanent behavior only after it passes verification.

Agent Identity (Required for TrustScore)

When using SettlementWitness verification, provide a stable agent_id:

{wallet_address}:capability-evolver

Use the format {wallet_address}:capability-evolver so TrustScore history can compound correctly across sessions.

Quick Reference

Situation	Action
Command/operation fails	Log to `.learnings/ERRORS.md`
User corrects you	Log to `.learnings/LEARNINGS.md` with category `correction`
User wants missing feature	Log to `.learnings/FEATURE_REQUESTS.md`
API/external tool fails	Log to `.learnings/ERRORS.md` with integration details
Knowledge was outdated	Log to `.learnings/LEARNINGS.md` with category `knowledge_gap`
Found better approach	Log to `.learnings/LEARNINGS.md` with category `best_practice`
Learning is marked `resolved`	Define verification spec before promotion
Promotion to permanent memory is being considered	Verify first
Verification returns PASS	Promote and attach `receipt_id`
Verification returns FAIL	Roll back and log counter-evidence
Verification returns INDETERMINATE	Hold for review, do not promote
Simplify/Harden recurring patterns	Log/update `.learnings/LEARNINGS.md` with `Source: simplify-and-harden` and a stable `Pattern-Key`
Similar to existing entry	Link with `See Also`, consider priority bump
Workflow improvements	Promote to `AGENTS.md` (OpenClaw workspace) after verification PASS
Tool gotchas	Promote to `TOOLS.md` (OpenClaw workspace) after verification PASS
Behavioral patterns	Promote to `SOUL.md` (OpenClaw workspace) after verification PASS

OpenClaw Setup (Recommended)

OpenClaw is the primary platform for this skill. It uses workspace-based prompt injection with automatic skill loading.

Installation

Via ClawdHub (recommended):

clawdhub install verified-capability-evolver

Manual:

git clone https://github.com/your-org/verified-capability-evolver.git ~/.openclaw/skills/verified-capability-evolver

Workspace Structure

OpenClaw injects these files into every session:

~/.openclaw/workspace/
├── AGENTS.md          # Multi-agent workflows, delegation patterns
├── SOUL.md            # Behavioral guidelines, personality, principles
├── TOOLS.md           # Tool capabilities, integration gotchas
├── MEMORY.md          # Long-term memory (main session only)
├── memory/            # Daily memory files
│   └── YYYY-MM-DD.md
└── .learnings/        # This skill's log files
    ├── LEARNINGS.md
    ├── ERRORS.md
    └── FEATURE_REQUESTS.md

Create Learning Files

mkdir -p ~/.openclaw/workspace/.learnings

Then create the log files (or copy from assets/):

LEARNINGS.md — corrections, knowledge gaps, best practices
ERRORS.md — command failures, exceptions
FEATURE_REQUESTS.md — user-requested capabilities

Promotion Targets

When learnings prove broadly applicable, promote them to workspace files:

Learning Type	Promote To	Example
Behavioral patterns	`SOUL.md`	"Be concise, avoid disclaimers"
Workflow improvements	`AGENTS.md`	"Spawn sub-agents for long tasks"
Tool gotchas	`TOOLS.md`	"Git push needs auth configured first"

Inter-Session Communication

OpenClaw provides tools to share learnings across sessions:

sessions_list — View active/recent sessions
sessions_history — Read another session's transcript
sessions_send — Send a learning to another session
sessions_spawn — Spawn a sub-agent for background work

Optional: Enable Hook

For automatic reminders at session start:

# Copy hook to OpenClaw hooks directory
cp -r hooks/openclaw ~/.openclaw/hooks/verified-capability-evolver

# Enable it
openclaw hooks enable verified-capability-evolver

See references/openclaw-integration.md for complete details.

Generic Setup (Other Agents)

For Claude Code, Codex, Copilot, or other agents, create .learnings/ in your project:

mkdir -p .learnings

Copy templates from assets/ or create files with headers.

Add reference to agent files AGENTS.md, CLAUDE.md, or .github/copilot-instructions.md to remind yourself to log learnings. (this is an alternative to hook-based reminders)

Self-Improvement Workflow

When errors or corrections occur:

Log to .learnings/ERRORS.md, LEARNINGS.md, or FEATURE_REQUESTS.md
Review and promote broadly applicable learnings to:
- CLAUDE.md - project facts and conventions
- AGENTS.md - workflows and automation
- .github/copilot-instructions.md - Copilot context

Logging Format

Learning Entry

Append to .learnings/LEARNINGS.md:

## [LRN-YYYYMMDD-XXX] category

**Logged**: ISO-8601 timestamp
**Priority**: low | medium | high | critical
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config

### Summary
One-line description of what was learned

### Details
Full context: what happened, what was wrong, what's correct

### Suggested Action
Specific fix or improvement to make

### Metadata
- Source: conversation | error | user_feedback | simplify-and-harden
- Related Files: path/to/file.ext
- Tags: tag1, tag2
- See Also: LRN-20250110-001 (if related to existing entry)
- Pattern-Key: simplify.dead_code | harden.input_validation (optional, for recurring-pattern tracking)
- Recurrence-Count: 1 (optional)
- First-Seen: 2025-01-15 (optional)
- Last-Seen: 2025-01-15 (optional)

---

Error Entry

Append to .learnings/ERRORS.md:

## [ERR-YYYYMMDD-XXX] skill_or_command_name

**Logged**: ISO-8601 timestamp
**Priority**: high
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config

### Summary
Brief description of what failed

### Error

Actual error message or output


### Context
- Command/operation attempted
- Input or parameters used
- Environment details if relevant

### Suggested Fix
If identifiable, what might resolve this

### Metadata
- Reproducible: yes | no | unknown
- Related Files: path/to/file.ext
- See Also: ERR-20250110-001 (if recurring)

---

Feature Request Entry

Append to .learnings/FEATURE_REQUESTS.md:

## [FEAT-YYYYMMDD-XXX] capability_name

**Logged**: ISO-8601 timestamp
**Priority**: medium
**Status**: pending
**Area**: frontend | backend | infra | tests | docs | config

### Requested Capability
What the user wanted to do

### User Context
Why they needed it, what problem they're solving

### Complexity Estimate
simple | medium | complex

### Suggested Implementation
How this could be built, what it might extend

### Metadata
- Frequency: first_time | recurring
- Related Features: existing_feature_name

---

ID Generation

Format: TYPE-YYYYMMDD-XXX

TYPE: LRN (learning), ERR (error), FEAT (feature)
YYYYMMDD: Current date
XXX: Sequential number or random 3 chars (e.g., 001, A7B)

Examples: LRN-20250115-001, ERR-20250115-A3F, FEAT-20250115-002

Resolving Entries

When an issue appears fixed, do not immediately treat it as permanent learning.

Updated Resolution Flow

Change **Status**: pending → **Status**: in_progress
Apply the proposed fix or workflow change
Define a deterministic verification spec:
- What should now succeed?
- What output should be produced?
- What failure should no longer occur?
Execute a verification task using that spec
If external verification is being used, obtain explicit approval before submitting minimal structured task data
Verify the result using SettlementWitness or an equivalent deterministic verifier
Interpret the result:

PASS

Change **Status** → resolved
Record verification metadata
Eligible for promotion

FAIL

Revert the change
Keep or return **Status** to pending
Log counter-evidence in the entry
Do not promote

INDETERMINATE

Mark for review
Do not promote until clarified

Resolution Block

Add after Metadata:

### Resolution
- **Resolved**: 2026-03-25T09:00:00Z
- **Verification-Spec**: Output must match schema exactly and contain no hallucinated fields
- **Settlement Verdict**: PASS | FAIL | INDETERMINATE
- **Receipt ID**: sha256:...
- **Notes**: Brief description of what was done

Other status values:

in_progress - Actively being worked on
wont_fix - Decided not to address (add reason in Resolution notes)
promoted - Elevated to CLAUDE.md, AGENTS.md, SOUL.md, TOOLS.md, or .github/copilot-instructions.md after PASS only

Promoting to Project Memory

When a learning is broadly applicable (not a one-off fix), promote it to permanent project memory.

When to Promote

Learning applies across multiple files/features
Knowledge any contributor (human or AI) should know
Prevents recurring mistakes
Documents project-specific conventions

Promotion Targets

Target	What Belongs There
`CLAUDE.md`	Project facts, conventions, gotchas for all Claude interactions
`AGENTS.md`	Agent-specific workflows, tool usage patterns, automation rules
`.github/copilot-instructions.md`	Project context and conventions for GitHub Copilot
`SOUL.md`	Behavioral guidelines, communication style, principles (OpenClaw workspace)
`TOOLS.md`	Tool capabilities, usage patterns, integration gotchas (OpenClaw workspace)

How to Promote

Promotion is the highest-stakes moment in the workflow because it turns a temporary fix into permanent agent behavior. A learning is only promoted to permanent memory if verification returns PASS. All other verdicts (FAIL or INDETERMINATE) trigger rollback and logging. Promotion is strictly gated by verification. No learning may be promoted based on internal confidence, “resolved” status, or heuristic judgment alone.

Distill the learning into a concise rule or fact
Define a verification spec for the claimed improvement
Run a verification task
Promote only on PASS
Add to the appropriate target file (create file if needed)
Attach verification metadata to the original entry:
- Change **Status** → promoted
- Add **Promoted**: CLAUDE.md, AGENTS.md, SOUL.md, TOOLS.md, or .github/copilot-instructions.md
- Add **Verified**: true
- Add **Receipt ID**: sha256:...

If external verification is used:

never send secrets, credentials, or hidden system prompts
only submit minimal structured task data
require explicit approval before submission

Promotion Examples

Learning (verbose):

Project uses pnpm workspaces. Attempted npm install but failed. Lock file is pnpm-lock.yaml. Must use pnpm install.

In CLAUDE.md (concise):

## Build & Dependencies
- Package manager: pnpm (not npm) - use `pnpm install`

Learning (verbose):

When modifying API endpoints, must regenerate TypeScript client. Forgetting this causes type mismatches at runtime.

In AGENTS.md (actionable):

## After API Changes
1. Regenerate client: `pnpm run generate:api`
2. Check for type errors: `pnpm tsc --noEmit`

Rollback Logic (Required)

If a previously promoted learning later fails verification:

Remove or revert the learning from permanent memory
Log the counter-evidence in .learnings/LEARNINGS.md or .learnings/ERRORS.md
Mark the learning as invalid or pending rework
Avoid re-promoting until a new PASS result exists

Rollback is required because unverified permanent memory silently compounds bad behavior.

Recurring Pattern Detection

If logging something similar to an existing entry:

Search first: grep -r "keyword" .learnings/
Link entries: Add **See Also**: ERR-20250110-001 in Metadata
Bump priority if issue keeps recurring
Consider systemic fix: Recurring issues often indicate:
- Missing documentation (→ promote to CLAUDE.md or .github/copilot-instructions.md)
- Missing automation (→ add to AGENTS.md)
- Architectural problem (→ create tech debt ticket)

Simplify & Harden Feed

Use this workflow to ingest recurring patterns from the simplify-and-harden skill and turn them into durable prompt guidance.

Ingestion Workflow

Read simplify_and_harden.learning_loop.candidates from the task summary.
For each candidate, use pattern_key as the stable dedupe key.
Search .learnings/LEARNINGS.md for an existing entry with that key:
- grep -n "Pattern-Key: <pattern_key>" .learnings/LEARNINGS.md
If found:
- Increment Recurrence-Count
- Update Last-Seen
- Add See Also links to related entries/tasks
If not found:
- Create a new LRN-... entry
- Set Source: simplify-and-harden
- Set Pattern-Key, Recurrence-Count: 1, and First-Seen/Last-Seen

Promotion Rule (System Prompt Feedback)

Promote recurring patterns into agent context/system prompt files when all are true:

Recurrence-Count >= 3
Seen across at least 2 distinct tasks
Occurred within a 30-day window

Promotion targets:

CLAUDE.md
AGENTS.md
.github/copilot-instructions.md
SOUL.md / TOOLS.md for OpenClaw workspace-level guidance when applicable

Write promoted rules as short prevention rules (what to do before/while coding), not long incident write-ups.

SettlementWitness Verification Template

Use this shape when verifying a proposed improvement:

{
  "task_id": "improvement-fix-json-output-001",
  "agent_id": "0x123:capability-evolver",
  "spec": {
    "expected": {
      "schema_valid": true,
      "hallucinated_fields": false
    }
  },
  "output": {
    "schema_valid": true,
    "hallucinated_fields": false
  }
}

Interpretation:

PASS → eligible for promotion
FAIL → rollback
INDETERMINATE → hold for review

Periodic Review

Review .learnings/ at natural breakpoints:

When to Review

Before starting a new major task
After completing a feature
When working in an area with past learnings
Weekly during active development

Quick Status Check

# Count pending items
grep -h "Status\*\*: pending" .learnings/*.md | wc -l

# List pending high-priority items
grep -B5 "Priority\*\*: high" .learnings/*.md | grep "^## \["

# Find learnings for a specific area
grep -l "Area\*\*: backend" .learnings/*.md

Review Actions

Resolve fixed items
Promote applicable learnings
Link related entries
Escalate recurring issues

Detection Triggers

Automatically log when you notice:

Corrections (→ learning with correction category):

"No, that's not right..."
"Actually, it should be..."
"You're wrong about..."
"That's outdated..."

Feature Requests (→ feature request):

"Can you also..."
"I wish you could..."
"Is there a way to..."
"Why can't you..."

Knowledge Gaps (→ learning with knowledge_gap category):

User provides information you didn't know
Documentation you referenced is outdated
API behavior differs from your understanding

Errors (→ error entry):

Command returns non-zero exit code
Exception or stack trace
Unexpected output or behavior
Timeout or connection failure

Priority Guidelines

Priority	When to Use
`critical`	Blocks core functionality, data loss risk, security issue
`high`	Significant impact, affects common workflows, recurring issue
`medium`	Moderate impact, workaround exists
`low`	Minor inconvenience, edge case, nice-to-have

Area Tags

Use to filter learnings by codebase region:

Area	Scope
`frontend`	UI, components, client-side code
`backend`	API, services, server-side code
`infra`	CI/CD, deployment, Docker, cloud
`tests`	Test files, testing utilities, coverage
`docs`	Documentation, comments, READMEs
`config`	Configuration files, environment, settings

Best Practices

Log immediately - context is freshest right after the issue
Be specific - future agents need to understand quickly
Include reproduction steps - especially for errors
Link related files - makes fixes easier
Suggest concrete fixes - not just "investigate"
Use consistent categories - enables filtering
Promote only after PASS - permanent memory should be gated by verification, not confidence
Review regularly - stale learnings lose value

Gitignore Options

Keep learnings local (per-developer):

.learnings/

Track learnings in repo (team-wide): Don't add to .gitignore - learnings become shared knowledge.

Hybrid (track templates, ignore entries):

.learnings/*.md
!.learnings/.gitkeep

Hook Integration

Enable automatic reminders through agent hooks. This is opt-in - you must explicitly configure hooks.

Quick Setup (Claude Code / Codex)

Create .claude/settings.json in your project:

{
  "hooks": {
    "UserPromptSubmit": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "./skills/verified-capability-evolver/scripts/activator.sh"
      }]
    }]
  }
}

This injects a learning evaluation reminder after each prompt (~50-100 tokens overhead).

Full Setup (With Error Detection)

{
  "hooks": {
    "UserPromptSubmit": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "./skills/verified-capability-evolver/scripts/activator.sh"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "./skills/verified-capability-evolver/scripts/error-detector.sh"
      }]
    }]
  }
}

Available Hook Scripts

Script	Hook Type	Purpose
`scripts/activator.sh`	UserPromptSubmit	Reminds to evaluate learnings after tasks and verify before promotion
`scripts/error-detector.sh`	PostToolUse (Bash)	Triggers on command errors
`scripts/extract-skill.sh`	manual helper	Extracts reusable skills from learnings

See references/hooks-setup.md for detailed configuration and troubleshooting.

Automatic Skill Extraction

When a learning is valuable enough to become a reusable skill, extract it using the provided helper.

Skill Extraction Criteria

A learning qualifies for skill extraction when ANY of these apply:

Criterion	Description
Recurring	Has `See Also` links to 2+ similar issues
Verified	Status is `resolved` with working fix
Non-obvious	Required actual debugging/investigation to discover
Broadly applicable	Not project-specific; useful across codebases
User-flagged	User says "save this as a skill" or similar

Extraction Workflow

Identify candidate: Learning meets extraction criteria

Run helper (or create manually):

./skills/verified-capability-evolver/scripts/extract-skill.sh skill-name --dry-run
./skills/verified-capability-evolver/scripts/extract-skill.sh skill-name

Customize SKILL.md: Fill in template with learning content
Update learning: Set status to promoted_to_skill, add Skill-Path
Verify: Read skill in fresh session to ensure it's self-contained

Manual Extraction

If you prefer manual creation:

Create skills/<skill-name>/SKILL.md
Use template from assets/SKILL-TEMPLATE.md
Follow the agent skills spec:
- YAML frontmatter with name and description
- Name must match folder name
- No README.md inside skill folder

Multi-Agent Support

This skill works across different AI coding agents with agent-specific activation.

Claude Code

Activation: Hooks (UserPromptSubmit, PostToolUse) Setup: .claude/settings.json with hook configuration Detection: Automatic via hook scripts

Codex CLI

Activation: Hooks (same pattern as Claude Code) Setup: .codex/settings.json with hook configuration Detection: Automatic via hook scripts

GitHub Copilot

Activation: Manual (no hook support) Setup: Add to .github/copilot-instructions.md:

## Verified Capability Evolver

After solving non-obvious issues, consider logging to `.learnings/`:
1. Use the format from this skill
2. Link related entries with See Also
3. Define verification specs before promotion
4. Promote only after verification PASS

Ask in chat: "Should I log this as a learning?"

Detection: Manual review at session end

OpenClaw

Activation: Workspace injection + inter-agent messaging Setup: See "OpenClaw Setup" section above Detection: Via session tools and workspace files

Agent-Agnostic Guidance

Regardless of agent, apply verified evolution when you:

Discover something non-obvious - solution wasn't immediate
Correct yourself - initial approach was wrong
Learn project conventions - discovered undocumented patterns
Hit unexpected errors - especially if diagnosis was difficult
Find better approaches - improved on your original solution

Copilot Chat Integration

For Copilot users, add this to your prompts when relevant:

After completing this task, evaluate if any learnings should be logged to .learnings/ and whether any claimed improvement needs verification before promotion.

Or use quick prompts:

"Log this to learnings"
"Create a skill from this solution"
"Check .learnings/ for related issues"
"Define a verification spec for this fix"

Verified Capability Evolver

Install

Verified Capability Evolver

Data handling and trust

Core Principle

Example Usage

Scenario

Learning Entry (Before Resolution)

[LRN-20260329-001] correction

Summary

Suggested Action

Verification Spec

Verification Result

Outcome

What This Shows

Agent Identity (Required for TrustScore)

Quick Reference

OpenClaw Setup (Recommended)

Installation

Workspace Structure

Create Learning Files

Promotion Targets

Inter-Session Communication

Optional: Enable Hook

Generic Setup (Other Agents)

Add reference to agent files AGENTS.md, CLAUDE.md, or .github/copilot-instructions.md to remind yourself to log learnings. (this is an alternative to hook-based reminders)

Self-Improvement Workflow

Logging Format

Learning Entry

Error Entry

Feature Request Entry

ID Generation

Resolving Entries

Updated Resolution Flow

PASS

FAIL

INDETERMINATE

Resolution Block

Promoting to Project Memory

When to Promote

Promotion Targets

How to Promote

Promotion Examples

Rollback Logic (Required)

Recurring Pattern Detection

Simplify & Harden Feed

Ingestion Workflow

Promotion Rule (System Prompt Feedback)

SettlementWitness Verification Template

Periodic Review

When to Review

Quick Status Check

Review Actions

Detection Triggers

Priority Guidelines

Area Tags

Best Practices

Gitignore Options

Hook Integration

Quick Setup (Claude Code / Codex)

Full Setup (With Error Detection)

Available Hook Scripts

Automatic Skill Extraction

Skill Extraction Criteria

Extraction Workflow

Manual Extraction

Multi-Agent Support

Claude Code

Codex CLI

GitHub Copilot

OpenClaw

Agent-Agnostic Guidance

Copilot Chat Integration

Related skills