Improvement Orchestrator

v1.0.0

Use when you need to run the full generate → score → evaluate → execute → gate pipeline in one shot, retry automatically after failures, or batch-improve multiple skills. Not for evaluating skill quality on its own (use improvement-learner) or for manual scoring (use improvement-discriminator).

by @lanyasheng

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lanyasheng/auto-improvement-orchestrator.

Prompt Preview: Install & Setup
Install the skill "Improvement Orchestrator" (lanyasheng/auto-improvement-orchestrator) from ClawHub.
Skill page: https://clawhub.ai/lanyasheng/auto-improvement-orchestrator
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install auto-improvement-orchestrator

ClawHub CLI


npx clawhub@latest install auto-improvement-orchestrator
Security Scan

  • VirusTotal: Benign
  • OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (orchestrating a 5‑stage improvement pipeline) matches the actual behavior: the script dispatches Generator→Discriminator→Evaluator→Executor→Gate and writes state/artifacts and backups. All declared requirements (none) are appropriate for a local orchestrator that runs other local scripts.
Instruction Scope
SKILL.md and scripts explicitly instruct the agent to run local subprocesses, read feedback sources, and apply changes to the target skill (append markdown sections, create files) with backups/rollback. This is expected for an orchestrator, but it means the skill will read/write arbitrary files under the provided --target and --state-root and can forward failure traces into subsequent runs. The orchestrator itself does not call external endpoints, but it invokes other scripts (e.g., evaluator) which may call LLM CLIs or network services — review those scripts before use.
Install Mechanism
Instruction-only (no install spec). The bundle includes orchestration code only; nothing is downloaded or extracted from external URLs. Lowest install risk.
Credentials
The skill declares no required env vars or credentials, which is coherent. Caveat: the orchestrator spawns other local scripts (generator/discriminator/evaluator/executor/gate) that are expected to live in the repo; those sub-scripts may themselves require API keys or credentials (e.g., for LLMs) even though the orchestrator doesn't declare them. Confirm what the invoked scripts expect before running with real task suites.
Persistence & Privilege
always=false and no special platform privileges. The orchestrator writes persistent artifacts and backups to the user-supplied --state-root (normal for its purpose). It does not modify other skills' configuration beyond running the standard executor workflow for the provided --target, but it will apply changes to the target path (intended behavior).
Assessment
This skill is an on‑repo pipeline orchestrator: it will run local scripts (propose/score/evaluate/execute/gate), create state artifacts and backups, and may apply changes to files under the --target you provide. Before running: 1) Inspect the scripts it invokes in your repository (improvement-generator/discriminator/evaluator/executor/gate) to ensure they do only what you expect; 2) Run first with a disposable --state-root and use --eval-mock (avoid real LLM CLI calls) to observe behavior; 3) Backup your target skill or point --target at a copy if you are not ready for automatic modifications; 4) Be aware that the evaluator or other invoked scripts may require separate API keys or env vars — the orchestrator itself does not request credentials. If you want to be extra cautious, run the included tests and review the executor's logic to confirm it only performs allowed actions (append_markdown_section, create_file) for low‑risk categories.

Like a lobster shell, security has layers — review code before you run it.

94 downloads · 0 stars · 1 version
Updated 3w ago
v1.0.0
MIT-0

Improvement Orchestrator

Coordinates the full improvement pipeline: Generator → Discriminator → Evaluator → Executor → Gate.

When to Use

  • Run a full improvement cycle on one or more skills
  • Coordinate the 5-stage pipeline end-to-end (with optional evaluator)
  • Retry failed improvements with trace-aware feedback (Ralph Wiggum loop)

When NOT to Use

  • Only checking a skill's quality score → use improvement-learner
  • Only scoring candidates manually → use improvement-discriminator
  • Only changing a single file → use improvement-executor
  • Only querying benchmark data → use benchmark-store

Pipeline

propose → discriminate → evaluate* → execute → gate (7-layer)
         ↻ Ralph Wiggum: fail → inject trace → retry (max N)
         * evaluate skipped if: no --task-suite, OR low-risk docs/reference/guardrail (adaptive complexity)
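
The retry loop in the diagram above can be sketched in Python; `run_once` is a hypothetical callable standing in for one full propose→gate attempt, and the `decision`/`failure_trace` keys are assumed names:

```python
def run_pipeline_with_retries(run_once, max_retries=3):
    """Ralph Wiggum loop (sketch): on a failed attempt, feed the failure
    trace into the next attempt as extra feedback; stop on success or
    when max_retries is exhausted."""
    trace = None
    result = {"decision": "no_candidates"}
    for _ in range(max_retries):
        result = run_once(extra_source=trace)  # one propose -> ... -> gate pass
        if result["decision"] == "keep":
            break
        trace = result.get("failure_trace")  # injected into the next attempt
    return result
```

The real orchestrator passes the trace file path via `--source`; here it is just a value threaded between attempts.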

Adaptive Complexity Skip: candidates with risk_level=low AND category in (docs, reference, guardrail) skip the evaluator stage entirely. Other categories always run evaluator when --task-suite is provided.
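
A minimal sketch of that skip rule, assuming candidates expose `risk_level` and `category` fields as the description implies:

```python
LOW_RISK_SKIP_CATEGORIES = {"docs", "reference", "guardrail"}

def should_skip_evaluator(candidate, task_suite):
    """True when the evaluator stage is skipped for this candidate (sketch)."""
    if task_suite is None:          # no --task-suite: evaluator never runs
        return True
    return (candidate.get("risk_level") == "low"
            and candidate.get("category") in LOW_RISK_SKIP_CATEGORIES)
```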

Evaluator→Gate Forwarding: if evaluator produces an artifact, its path is forwarded to gate via --evaluation, enabling RegressionGate to check evaluator verdict.
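
One way this forwarding might look when the orchestrator assembles the gate command; everything except the `--evaluation` flag is a hypothetical argument layout:

```python
def build_gate_cmd(candidate_path, evaluation_path=None):
    """Build the gate invocation; --evaluation is appended only when the
    evaluator produced an artifact (sketch, other arguments assumed)."""
    cmd = ["python3", "scripts/gate.py", "--candidate", candidate_path]
    if evaluation_path is not None:
        cmd += ["--evaluation", evaluation_path]
    return cmd
```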

Baseline Evaluation: when --task-suite is given, orchestrator first runs evaluator in --standalone mode on the current SKILL.md to discover which tasks fail, then injects those failures as --source feedback to the generator.

CLI

python3 scripts/orchestrate.py \
  --target /path/to/skill \        # REQUIRED: skill directory or file to improve
  --state-root /path/to/state \    # REQUIRED: where artifacts are written
  --source feedback.json \         # repeatable: memory/feedback/trace files
  --max-retries 3 \                # default 3: Ralph Wiggum retry attempts
  --task-suite tasks.yaml \        # enables evaluator stage (real LLM eval)
  --eval-mock                      # evaluator uses mock execution, no claude CLI
| Param | Default | When to change |
| --- | --- | --- |
| `--target` | (required) | Always set — path to the skill dir to improve |
| `--state-root` | (required) | Always set — persistent state/artifact directory |
| `--source` | `[]` | Add feedback.json, memory files, or prior failure traces |
| `--max-retries` | `3` | Raise to 5 for hard-to-improve skills; lower to 1 for fast iteration |
| `--task-suite` | `None` | Provide to enable LLM-based evaluator; omit for docs-only changes |
| `--eval-mock` | `false` | Use in CI/testing to skip real `claude -p` calls |
<example>
Correct usage: run the full improvement cycle on one skill (with evaluator)
$ python3 scripts/orchestrate.py --target /path/to/skill --state-root ./state --task-suite tasks.yaml
→ 0. Baseline evaluation: 2 failing tasks found, injected into the generator
→ 1. Generate candidates → 2. Multi-reviewer blind scoring → 3. Task evaluation → 4. Execute changes → 5. 7-layer gate
→ On failure, the trace is injected automatically and the run retries (up to 3 times)
→ stdout: /path/to/state/pipeline-summary.json
</example>
<anti-example>
Incorrect usage: running the orchestrator when you only want a quality score
$ python3 scripts/orchestrate.py --target /path/to/skill --state-root ./state
→ This actually applies changes! Use self_improve.py from improvement-learner instead.
</anti-example>

Error Handling

  • Each subprocess has a 1200 s timeout; on timeout a RuntimeError is raised
  • An evaluator failure does not abort the pipeline (a warning is printed and the run continues), but the evaluation_failure_trace is injected into the next attempt
  • When the gate returns revert, extract_failure_trace() is called automatically and writes traces/trace-{run_id}.json
  • pipeline-summary.json is written as the final output to {state-root}/pipeline-summary.json
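
A sketch of the per-stage subprocess wrapper these notes imply; the function name and error messages are hypothetical, only the 1200 s timeout and the RuntimeError come from the notes above:

```python
import subprocess

STAGE_TIMEOUT = 1200  # seconds per subprocess, as noted above

def run_stage(cmd, timeout=STAGE_TIMEOUT):
    """Run one pipeline stage; raise RuntimeError on timeout or
    non-zero exit (sketch of the described behavior)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"stage timed out after {timeout}s: {cmd[0]}") from exc
    if proc.returncode != 0:
        raise RuntimeError(f"stage failed ({proc.returncode}): {proc.stderr.strip()}")
    return proc.stdout
```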

Output

The final output is pipeline-summary.json:

{"target": "/path/to/skill", "attempts": 2, "max_retries": 3,
 "final_decision": "keep", "final_candidate_id": "cand-01-docs",
 "final_artifact_path": "/state/receipts/gate-run001-cand-01.json"}

Possible final_decision values: keep | revert | reject | pending_promote | no_candidates | no_accepted_candidates
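
A consumer of pipeline-summary.json might check the outcome like this; treating both `keep` and `pending_promote` as accepted states is an assumption about their semantics:

```python
import json

ACCEPTED = {"keep", "pending_promote"}

def run_accepted(summary_text):
    """True when pipeline-summary.json records an accepted final decision."""
    return json.loads(summary_text)["final_decision"] in ACCEPTED
```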

Related Skills

  • improvement-generator: Produces candidate proposals (stage 1) — orchestrator calls propose.py
  • improvement-discriminator: Multi-reviewer panel scoring (stage 2) — orchestrator calls score.py
  • improvement-evaluator: Task suite execution validation (stage 3) — called only when --task-suite provided; baseline failures injected as --source
  • improvement-executor: Applies changes with backup/rollback (stage 4) — orchestrator calls execute.py
  • improvement-gate: 7-layer quality gate (stage 5) — receives --evaluation artifact when evaluator ran
