Install
openclaw skills install skills-coach
Explore capability boundaries of a target Skill, analyze optimization potential, generate an optimized version using Training-Free GRPO, and compile results into a structured report.
Systematically analyze and optimize OpenClaw skills through automated task generation, Training-Free GRPO optimization, real command execution, comprehensive failure analysis, and detailed evaluation reporting.
Previous updates (v2.3.0):
Previous updates (v2.0.0):
| Feature | Training-Free GRPO (v2.0) | Vanilla GRPO (v1.x) |
|---|---|---|
| Parameter Updates | ❌ None | ✅ Gradient-based |
| Advantage Type | Semantic (natural language) | Numerical (scores) |
| Knowledge Storage | External experience library | Model weights |
| Generalization | Excellent (frozen model) | Limited (overfitting risk) |
| Data Requirements | Minimal (dozens of samples) | Large (thousands) |
| Cost | Very low (~$20) | High ($10,000+) |
| Speed | Fast (inference only) | Slow (training required) |
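As a rough illustration of the Training-Free GRPO column, the sketch below runs one group of rollouts against a frozen model and stores the lesson as text in an external experience library instead of updating weights. The function name, the `rollout` and `score` callables, and the wording of the lesson string are all illustrative assumptions, not the skills-coach implementation; in practice an LLM would presumably write the lesson.

```python
from typing import Callable, List

def training_free_grpo_step(
    task: str,
    experiences: List[str],
    rollout: Callable[[str, List[str]], str],  # hypothetical: one call to the frozen model
    score: Callable[[str], float],             # hypothetical: task-specific scorer
    group_size: int = 5,
    max_experiences: int = 10,
) -> List[str]:
    """Run one group of rollouts and return the updated experience library."""
    outputs = [rollout(task, experiences) for _ in range(group_size)]
    ranked = sorted(outputs, key=score)
    worst, best = ranked[0], ranked[-1]
    if score(best) == score(worst):
        # No contrast inside the group, so there is nothing to learn from it.
        return experiences
    # The "advantage" is a natural-language lesson, not a numeric gradient signal.
    lesson = f"For tasks like {task!r}: prefer {best!r} over {worst!r}"
    # Keep only the most recent entries so the library stays bounded.
    return (experiences + [lesson])[-max_experiences:]
```

The default values mirror `group_size` and `max_experiences` from config.yaml; the model itself is never modified, only the list of lessons passed back into later rollouts.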
Key settings in config.yaml:
# Optimization Method Selection (NEW v2.0.0)
optimization:
  method: "training_free_grpo"  # training_free_grpo | vanilla_grpo
# Training-Free GRPO Parameters
training_free_grpo:
  group_size: 5  # Number of rollouts per group
  num_epochs: 3  # Number of optimization epochs
  temperature_learning: 0.7  # Temperature during learning
  temperature_eval: 0.3  # Temperature during evaluation
  # Experience Library Management
  max_experiences: 10  # Max experiences per domain
# Domain-Specific Optimization
markdown_optimization:
  enabled: true
  focus_areas: [clarity, structure, examples, completeness]
code_optimization:
  enabled: true
  focus_areas: [bug_fixes, error_handling, performance, code_quality]
# LLM Configuration
llm_model: "claude-sonnet-4-6"
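Before a run, the values above could be checked with something like the sketch below. The function is hypothetical and takes the settings as plain arguments, so it makes no claim about how config.yaml is nested or loaded; the specific rules (for example that a group needs at least two rollouts to be compared) are assumptions chosen for illustration.

```python
ALLOWED_METHODS = {"training_free_grpo", "vanilla_grpo"}

def check_settings(method: str, group_size: int, num_epochs: int,
                   temperature_learning: float, temperature_eval: float) -> list:
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if method not in ALLOWED_METHODS:
        problems.append(f"unknown method: {method}")
    if group_size < 2:
        # Assumption: comparing rollouts within a group needs at least two of them.
        problems.append("group_size must be at least 2 to compare rollouts")
    if num_epochs < 1:
        problems.append("num_epochs must be at least 1")
    for name, value in (("temperature_learning", temperature_learning),
                        ("temperature_eval", temperature_eval)):
        if value < 0:
            problems.append(f"{name} must be non-negative")
    return problems
```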
python orchestrator.py <target-skill-path>
Or via Claude:
Use skills-coach on <target-skill-path>
target-skill-path (required): Path to the directory containing the Skill to analyze and optimize. Must contain a valid SKILL.md.
This skill orchestrates 6 steps that execute sequentially:
immutability → code-capability → sample-agent → optimize-agent → exec-agent → failure-analyzer → evaluate-agent
CRITICAL IMMUTABILITY RULE:
Do not proceed to the next step until the current one has fully completed and its outputs are verified.
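The rule above (finish a step, verify its outputs, only then continue) can be sketched as a small gated runner. This is a minimal illustration under assumed names: `Step`, `run_pipeline`, and the idea of passing each step as a callable are hypothetical and not part of skills-coach.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# (step name, action to run, output paths expected relative to the work dir)
Step = Tuple[str, Callable[[], object], Iterable[str]]

def run_pipeline(steps: List[Step], work_dir: Path) -> List[str]:
    """Run steps in order; stop as soon as a step's expected outputs are missing."""
    completed = []
    for name, action, expected in steps:
        action()
        missing = [p for p in expected if not (work_dir / p).exists()]
        if missing:
            raise RuntimeError(f"{name}: missing outputs {missing}")
        completed.append(name)
    return completed
```

Raising on the first missing output keeps later steps from running against incomplete data, which is the point of the verification lines that close each step.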
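target-skill-path exists and contains a SKILL.md file
import sys
# "run-manager" contains a hyphen, so it cannot appear in a dotted import path
sys.path.insert(0, "subskills/run-manager")
from run_manager import RunManager
manager = RunManager()
run_dir = manager.create_run(target_skill_path, config)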
# If versioned runs enabled:
skills-coach-runs/run_YYYY-MM-DD_HH-MM-SS/
├── tasks/{train,test}
├── exec_results/{original,optimized}
├── optimization/
├── code_capabilities.json
├── failure_analysis_{original,optimized}.json
└── {target-skill}-optimized/
cp -r {target-skill} {work-dir}/{target-skill}-optimized
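The same copy can be made from Python, together with a check that the original skill was left untouched. This is a sketch only: `tree_digest` and `make_optimized_copy` are hypothetical helper names, and hashing the tree is one possible way to confirm immutability, not something the source prescribes.

```python
import hashlib
import shutil
from pathlib import Path

def tree_digest(root: Path) -> str:
    """Hash every file's relative path and bytes so any change to the tree is detectable."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()

def make_optimized_copy(target_skill: Path, work_dir: Path) -> Path:
    """Copy the skill to <work_dir>/<name>-optimized and return the copy's path."""
    copy_path = work_dir / f"{target_skill.name}-optimized"
    shutil.copytree(target_skill, copy_path)
    return copy_path
```

Recording `tree_digest(target_skill)` before the run and comparing it afterwards would show whether any step wrote to the original by mistake.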
All subsequent modifications will ONLY affect the optimized copy.
Analyze scripts to detect their actual capabilities:
cd subskills/code-capability-detector
python code_capability_detector.py <target-skill-path> <work-dir>
This analyzes:
Expected outputs:
code_capabilities.json - Machine-readable capability data
code_capabilities.md - Human-readable report
Purpose: Ensures generated test tasks only use features the scripts actually support.
Verification: Confirm capability files exist before proceeding.
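To give a feel for what static capability detection can look like, the sketch below walks a Python script's syntax tree and lists the functions it defines and the modules it imports. It is an assumption-laden illustration using the standard `ast` module; `detect_capabilities` and its output keys are hypothetical and do not describe what code_capability_detector.py actually reports.

```python
import ast

def detect_capabilities(source: str) -> dict:
    """Collect function names and imported modules from Python source text."""
    tree = ast.parse(source)
    functions, imports = [], set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
        elif isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
    return {"functions": sorted(functions), "imports": sorted(imports)}
```

A task generator could consult such a summary to avoid writing tasks that need a feature no script provides.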
Execute the task generator:
cd subskills/sample-agent
python task_generator.py <target-skill-path> ../..
The script generates:
Expected outputs:
tasks/train/task_001/ through tasks/train/task_012/ (or task_016 with boundary tasks)
tasks/test/task_001/ through tasks/test/task_008/ (or task_010 with boundary tasks)
task.md, speccheck.md, and workspace/
Verification: Confirm all task directories exist before proceeding.
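The verification could be scripted along these lines. The helper name `missing_task_entries` is hypothetical; the directory naming (task_001, task_002, ...) and the three required entries come from the expected outputs listed above.

```python
from pathlib import Path

REQUIRED_ENTRIES = ("task.md", "speccheck.md", "workspace")

def missing_task_entries(split_dir: Path, count: int) -> list:
    """List required paths that are absent under task_001 .. task_<count>."""
    missing = []
    for i in range(1, count + 1):
        task_dir = split_dir / f"task_{i:03d}"
        for entry in REQUIRED_ENTRIES:
            if not (task_dir / entry).exists():
                missing.append((task_dir / entry).relative_to(split_dir).as_posix())
    return missing
```

For example, `missing_task_entries(Path("tasks/train"), 12)` would return an empty list when all twelve training tasks are complete.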
IMPORTANT: This step works on {target-skill}-optimized, NOT the original.
Execute the GRPO optimizer:
cd subskills/optimize-agent
python grpo_optimizer.py <work-dir>/{target-skill}-optimized ../..
The script runs GRPO optimization with:
Expected outputs:
{target-skill-name}-optimized/ directory containing the optimized SKILL.md
optimization_log.md documenting the GRPO optimization process
Verification: Confirm the optimized skill directory and log file exist before proceeding.
Part A: Generate Task Manifest
Execute the executor to generate task manifest:
cd subskills/exec-agent
python executor.py <target-skill-path> ../..
Expected outputs:
task_manifest.json containing all tasks to execute
Part B: Execute Tasks via Skill Tool
Claude reads the manifest and executes each task using the Skill tool:
import json
with open('task_manifest.json') as f:
    manifest = json.load(f)
for task in manifest['tasks']:
    # The steps below are carried out by Claude through the Skill tool, not by Python.
    # Execute original skill:
    #   Use skill at manifest['target_skill_path'] with task['task_content']
    #   Save output to task['original_result_dir']/output/
    # Execute optimized skill:
    #   Use skill at manifest['optimized_skill_path'] with task['task_content']
    #   Save output to task['optimized_result_dir']/output/
    pass
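The loop above reads `target_skill_path`, `optimized_skill_path`, and, per task, `task_content`, `original_result_dir`, and `optimized_result_dir`. A manifest with that shape might be assembled as in the sketch below; `build_manifest` is a hypothetical helper, and any fields executor.py writes beyond those five keys are unknown here.

```python
def build_manifest(target_skill_path: str, optimized_skill_path: str,
                   work_dir: str, tasks: dict) -> dict:
    """Build a manifest dict; `tasks` maps an id such as 'task_001' to its task text."""
    return {
        "target_skill_path": target_skill_path,
        "optimized_skill_path": optimized_skill_path,
        "tasks": [
            {
                "task_content": content,
                # Result directories follow the exec_results/{original,optimized} layout.
                "original_result_dir": f"{work_dir}/exec_results/original/{task_id}",
                "optimized_result_dir": f"{work_dir}/exec_results/optimized/{task_id}",
            }
            for task_id, content in sorted(tasks.items())
        ],
    }
```

Serializing the result with `json.dump` would give a file the loop can consume as written.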
Expected outputs:
exec_results/original/task_001/ through exec_results/original/task_010/
exec_results/optimized/task_001/ through exec_results/optimized/task_010/
output/ with real skill execution results and run_log.md
Verification: Confirm all result directories exist with real outputs before proceeding.
Analyze failed tasks to identify root causes and suggest fixes:
cd subskills/failure-analyzer
python failure_analyzer.py <work-dir>/exec_results/original <work-dir>
python failure_analyzer.py <work-dir>/exec_results/optimized <work-dir>
This analyzes:
Expected outputs:
failure_analysis_original.json - Machine-readable failure data
failure_analysis_original.md - Human-readable report
failure_analysis_optimized.json - Optimized version failures
failure_analysis_optimized.md - Optimized version report
Verification: Confirm failure analysis files exist before proceeding.
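Once both analyses exist, one useful comparison is which tasks the optimization fixed and which it broke. The sketch below works on plain lists of failed task ids; how those ids are extracted from the failure analysis JSON files is deliberately left out, since their schema is not given here, and `compare_failures` is a hypothetical name.

```python
def compare_failures(original_failed, optimized_failed) -> dict:
    """Classify task ids by how their failure status changed after optimization."""
    original, optimized = set(original_failed), set(optimized_failed)
    return {
        "fixed": sorted(original - optimized),          # failed before, pass now
        "regressed": sorted(optimized - original),      # passed before, fail now
        "still_failing": sorted(original & optimized),  # fail in both versions
    }
```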
Execute the evaluator to analyze results:
cd subskills/evaluate-agent
python evaluator.py <target-skill-path> <work-dir>
This script:
Expected outputs:
results_report.md containing comprehensive evaluation metrics and analysis
{target-skill-name}-optimized/
Verification: Confirm results_report.md exists.
Read and present the contents of results_report.md to the user, highlighting:
Versioned Runs (Default):
skills-coach-runs/
├── run_2026-04-13_14-30-00/
│ ├── config.yaml # Config used for this run
│ ├── metadata.json # Run metadata (duration, scores, decision)
│ ├── tasks/
│ │ ├── train/ # 12-16 training tasks (depends on boundary probing)
│ │ └── test/ # 8-10 test tasks (depends on boundary probing)
│ ├── optimization/
│ │ ├── iteration_001/
│ │ │ ├── variant_a/
│ │ │ ├── variant_b/
│ │ │ ├── variant_c/
│ │ │ └── variant_d/
│ │ └── iteration_002/
│ ├── exec_results/
│ │ ├── original/ # 10 tasks
│ │ └── optimized/ # 10 tasks
│ ├── optimization_log.md
│ ├── results_report.md
│ └── {target-skill}-optimized/ # If retained
│
├── run_2026-04-13_15-45-00/
│ └── ... (same structure)
│
└── latest -> run_2026-04-13_15-45-00/ # Symlink to latest run
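The run directory names above follow the pattern run_YYYY-MM-DD_HH-MM-SS, with a `latest` symlink pointing at the newest run. A minimal sketch of producing both is shown below; the helper names are hypothetical and this is not taken from run_manager.py.

```python
from datetime import datetime
from pathlib import Path

def run_dir_name(now: datetime) -> str:
    """Format a timestamp as a run directory name, e.g. run_2026-04-13_14-30-00."""
    return now.strftime("run_%Y-%m-%d_%H-%M-%S")

def point_latest_at(runs_root: Path, run_name: str) -> None:
    """Replace the 'latest' symlink so it targets the given run directory."""
    link = runs_root / "latest"
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target keeps the link valid if the runs directory is moved.
    link.symlink_to(run_name, target_is_directory=True)
```

Because every field is zero-padded, sorting the directory names alphabetically also sorts them chronologically.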
Legacy Flat Structure (if versioned runs disabled):
./
├── tasks/
│ ├── train/ # 12-16 training tasks (depends on boundary probing)
│ └── test/ # 8-10 test tasks (depends on boundary probing)
├── exec_results/
│ ├── original/ # 8-10 tasks
│ └── optimized/ # 8-10 tasks
├── {target-skill}-optimized/ # If retained
├── optimization_log.md
└── results_report.md
Features can be controlled via config.yaml:
# Task generation
task_generation:
  num_training_tasks: 16  # 12 for legacy mode
  num_test_tasks: 10  # 8 for legacy mode
  probe_boundaries: true  # Set to false for legacy 20-task mode
  boundary_types:
    - input_minimal
    - input_maximal
    - input_invalid
    - resource_limits
    - failure_modes
    - combinations
# GRPO optimization
grpo:
  optimization_levels:
    - skill_md  # Always enabled
    - code  # Remove to disable code optimization
    - config  # Remove to disable config optimization
  code_mutations:
    - add_caching
    - add_validation
    - add_error_handling
    - optimize_algorithm
# Output structure
output:
  use_versioned_runs: true  # Set to false for legacy flat structure
  runs_directory: "skills-coach-runs"
  keep_latest_symlink: true
  max_runs_to_keep: 10  # Auto-cleanup old runs
  save_intermediate_variants: true
  save_execution_logs: true
  save_metadata: true
# Run comparison
comparison:
  enable_comparison_tool: true
  auto_compare_with_previous: true
  comparison_metrics:
    - baseline_score
    - final_score
    - improvement
    - duration
    - iterations
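The comments on the task generation settings imply two task budgets: 16 training plus 10 test tasks with boundary probing, and 12 plus 8 (the "20-task mode") without it. The hypothetical helper below spells out that arithmetic; whether the real generator derives the counts from `probe_boundaries` or reads them directly from the config is an assumption.

```python
def task_counts(probe_boundaries: bool) -> dict:
    """Return the train/test task counts implied by the probe_boundaries setting."""
    if probe_boundaries:
        return {"train": 16, "test": 10}
    # Legacy mode: 12 + 8 = 20 tasks in total.
    return {"train": 12, "test": 8}
```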
Use run-manager CLI for analysis:
# List all runs
python subskills/run-manager/run_manager.py list
# Compare two runs
python subskills/run-manager/run_manager.py compare run_2026-04-13_14-30-00 run_2026-04-13_15-45-00
# Cleanup old runs (keep latest 10)
python subskills/run-manager/run_manager.py cleanup 10
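The cleanup command keeps the newest runs and removes the rest. Selecting which runs to remove might look like the sketch below, which relies on the timestamped names sorting chronologically; `runs_to_delete` is a hypothetical function and only picks names, leaving actual deletion to the caller.

```python
def runs_to_delete(run_names, keep: int = 10) -> list:
    """Return run directory names that fall outside the newest `keep` runs."""
    # Ignore entries such as the 'latest' symlink; zero-padded timestamps sort by age.
    runs = sorted(name for name in run_names if name.startswith("run_"))
    return runs[:-keep] if keep > 0 else runs
```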
If sample-agent cannot parse the target SKILL.md, abort before task generation.
If optimize-agent fails to improve scores after 10 iterations, proceed with the best variant found.
If exec-agent encounters runtime errors, log them in run_log.md and continue with remaining tasks.
If evaluate-agent determines the optimized skill performs worse, delete the optimized directory.
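The last rule, dropping the optimized copy when it performs worse, can be sketched as follows. The function name and the use of a single numeric score per version are assumptions, as is the choice to keep the copy when the scores are equal.

```python
import shutil
from pathlib import Path

def finalize_optimized(optimized_dir: Path, original_score: float,
                       optimized_score: float) -> bool:
    """Delete the optimized copy when it scores worse; return True if it was kept."""
    if optimized_score < original_score:
        shutil.rmtree(optimized_dir, ignore_errors=True)
        return False
    # Assumption: a tie counts as "not worse", so the copy is retained.
    return True
```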