Install
openclaw skills install @lanyasheng/auto-improvement-orchestrator

Coordinates the full improvement pipeline: Generator → Discriminator → Evaluator → Executor → Gate.

Use it when you need to run the whole "generate → score → evaluate → execute → gate" flow in one command, retry automatically after a failure, or improve several skills in a batch. Do not use it to evaluate skill quality on its own (use improvement-learner) or to score candidates by hand (use improvement-discriminator).
Related skills: improvement-learner, improvement-discriminator, improvement-executor, benchmark-store

propose → discriminate → evaluate* → execute → gate (7-layer)
↻ Ralph Wiggum: fail → inject trace → retry (max N)
* evaluate skipped if: no --task-suite, OR low-risk docs/reference/guardrail (adaptive complexity)
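
The retry loop in the diagram can be pictured roughly as below. This is a minimal sketch, not the orchestrator's actual code: `run_attempt` is a hypothetical callable standing in for one propose-to-gate pass, and the sketch assumes that only a `revert` decision triggers a retry and that `max_retries` bounds the total number of attempts.

```python
from typing import Callable, List, Tuple

# Hypothetical shape of one pipeline pass: takes the current source list and
# returns (decision, failure_trace_path). Not the real orchestrate.py API.
Attempt = Callable[[List[str]], Tuple[str, str]]


def run_with_retries(
    run_attempt: Attempt, sources: List[str], max_retries: int = 3
) -> Tuple[str, int]:
    """Run the pipeline, feeding each failure trace into the next attempt."""
    sources = list(sources)  # copy so the caller's list is not mutated
    decision = "no_candidates"
    for attempt in range(1, max_retries + 1):
        decision, trace = run_attempt(sources)
        if decision != "revert":  # assumption: only a revert is retried
            return decision, attempt
        sources.append(trace)  # inject the failure trace as an extra source
    return decision, max_retries
```

Each retry therefore sees everything the previous attempt saw plus the trace explaining why it failed.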
Adaptive Complexity Skip: candidates with risk_level=low AND category in (docs, reference, guardrail) skip the evaluator stage entirely. All other candidates run the evaluator whenever --task-suite is provided.
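
The skip rule reduces to a small predicate. The sketch below is illustrative only: the function name and argument layout are assumptions, and `"prompt"` in the usage lines is a made-up category used purely as a non-skipped example.

```python
from typing import Optional

# Categories that may bypass the evaluator when the candidate is low risk.
LOW_RISK_SKIP_CATEGORIES = {"docs", "reference", "guardrail"}


def should_run_evaluator(
    risk_level: str, category: str, task_suite: Optional[str]
) -> bool:
    """Return True when the evaluator stage should run for a candidate."""
    if not task_suite:
        return False  # no --task-suite: evaluator is never run
    if risk_level == "low" and category in LOW_RISK_SKIP_CATEGORIES:
        return False  # adaptive complexity skip
    return True


print(should_run_evaluator("low", "docs", "tasks.yaml"))    # skipped
print(should_run_evaluator("low", "prompt", "tasks.yaml"))  # hypothetical category, evaluated
```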
Evaluator→Gate Forwarding: if evaluator produces an artifact, its path is forwarded to gate via --evaluation, enabling RegressionGate to check evaluator verdict.
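
One way to picture the forwarding step is as conditional argument building. This is a sketch under assumptions: `build_gate_args` and the base arguments are hypothetical, and only the `--evaluation` flag comes from the description above.

```python
from typing import List, Optional


def build_gate_args(base_args: List[str], evaluation_path: Optional[str]) -> List[str]:
    """Append --evaluation only when the evaluator actually wrote an artifact."""
    args = list(base_args)  # copy so the caller's list is left untouched
    if evaluation_path:
        args += ["--evaluation", evaluation_path]
    return args
```

When the evaluator was skipped, the gate is invoked without `--evaluation` and has no evaluator verdict to check.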
Baseline Evaluation: when --task-suite is given, orchestrator first runs evaluator in --standalone mode on the current SKILL.md to discover which tasks fail, then injects those failures as --source feedback to the generator.

    # --target      REQUIRED: skill directory or file to improve
    # --state-root  REQUIRED: where artifacts are written
    # --source      repeatable: memory/feedback/trace files
    # --max-retries default 3: Ralph Wiggum retry attempts
    # --task-suite  enables evaluator stage (real LLM eval)
    # --eval-mock   evaluator uses mock execution, no claude CLI
    python3 scripts/orchestrate.py \
      --target /path/to/skill \
      --state-root /path/to/state \
      --source feedback.json \
      --max-retries 3 \
      --task-suite tasks.yaml \
      --eval-mock
| Param | Default | When to change |
|---|---|---|
| --target | (required) | Always set: path to the skill dir to improve |
| --state-root | (required) | Always set: persistent state/artifact directory |
| --source | [] | Add feedback.json, memory files, or prior failure traces |
| --max-retries | 3 | Raise to 5 for hard-to-improve skills; lower to 1 for fast iteration |
| --task-suite | None | Provide to enable LLM-based evaluator; omit for docs-only changes |
| --eval-mock | false | Use in CI/testing to skip real claude -p calls |
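
For readers wiring this into their own tooling, the parameter table maps naturally onto an `argparse` definition. The following is a sketch that mirrors the documented flags and defaults; it is not taken from `scripts/orchestrate.py`, and the help strings are paraphrased.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="orchestrate.py")
    parser.add_argument("--target", required=True, help="skill dir or file to improve")
    parser.add_argument("--state-root", required=True, help="state/artifact directory")
    # action="append" makes the flag repeatable; each use adds one entry.
    parser.add_argument("--source", action="append", default=[], help="feedback or trace file")
    parser.add_argument("--max-retries", type=int, default=3, help="retry attempts")
    parser.add_argument("--task-suite", default=None, help="task suite enabling the evaluator")
    parser.add_argument("--eval-mock", action="store_true", help="mock evaluator execution")
    return parser


args = build_parser().parse_args(["--target", "/path/to/skill", "--state-root", "/path/to/state"])
print(args.max_retries, args.source, args.task_suite, args.eval_mock)
```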
On revert, extract_failure_trace() is called automatically and writes traces/trace-{run_id}.json. The final output is {state-root}/pipeline-summary.json:
{"target": "/path/to/skill", "attempts": 2, "max_retries": 3,
"final_decision": "keep", "final_candidate_id": "cand-01-docs",
"final_artifact_path": "/state/receipts/gate-run001-cand-01.json"}
final_decision takes one of: keep | revert | reject | pending_promote | no_candidates | no_accepted_candidates
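
A caller that consumes the summary may want to validate `final_decision` before acting on it. The helper below is a minimal sketch: `load_summary` is a hypothetical name, and it assumes only the file location and the decision values listed above.

```python
import json
from pathlib import Path

# Decision values documented for pipeline-summary.json.
FINAL_DECISIONS = {
    "keep", "revert", "reject",
    "pending_promote", "no_candidates", "no_accepted_candidates",
}


def load_summary(state_root: str) -> dict:
    """Read {state-root}/pipeline-summary.json and check final_decision."""
    path = Path(state_root) / "pipeline-summary.json"
    summary = json.loads(path.read_text(encoding="utf-8"))
    decision = summary.get("final_decision")
    if decision not in FINAL_DECISIONS:
        raise ValueError(f"unexpected final_decision: {decision!r}")
    return summary
```

Failing loudly on an unknown value keeps downstream automation from silently treating a new or misspelled decision as success.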
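| Stage | Script | Notes |
|---|---|---|
| Generator | propose.py | |
| Discriminator | score.py | |
| Evaluator | | --task-suite provided; baseline failures injected as --source |
| Executor | execute.py | |
| Gate | | --evaluation artifact when evaluator ran |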