Install
openclaw skills install hle-benchmark-evolver
Runs HLE-oriented benchmark reward ingestion and curriculum generation for capability-evolver. Use when the user asks to optimize Humanity's Last Exam score, ingest question-level benchmark results, prioritize easy-first queues, or request an immediate benchmark progress result.
This skill operationalizes HLE score-driven evolution for OpenClaw.
Inputs:
- HLE report JSON (--report=/abs/path/report.json)
- HLE dataset (cais/hle default)

Outputs:
- Updates capability-evolver benchmark reward state.
- Memory keys: benchmark_*, curriculum_stage:*, focus_subject:*, focus_modality:*, question_focus:*

Single report ingestion:
node skills/hle-benchmark-evolver/run_result.js --report=/absolute/path/hle_report.json
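The report format is whatever capability-evolver expects (a template ships at the path noted further below). Purely as a non-authoritative sketch, a question-level report could look like the following, where every field name is an illustrative assumption rather than the documented schema:

{
  "accuracy": 0.12,
  "questions": [
    { "id": "q_001", "subject": "math", "modality": "text", "correct": false },
    { "id": "q_002", "subject": "biology", "modality": "image", "correct": true }
  ]
}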
Full automatic loop (starts an evolution cycle):
node skills/hle-benchmark-evolver/run_pipeline.js --report=/absolute/path/hle_report.json --cycles=1
If your evaluator can be called from the shell, let the pipeline generate the report each cycle:
node skills/hle-benchmark-evolver/run_pipeline.js \
--report=/absolute/path/hle_report.json \
--eval_cmd="python /path/to/eval_hle.py --out {{report}}" \
--cycles=3 --interval_ms=2000
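Here {{report}} is presumably substituted with the --report path before the command runs, so each cycle would effectively execute something like:

python /path/to/eval_hle.py --out /absolute/path/hle_report.json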
If no --report is provided, it defaults to:
skills/capability-evolver/assets/gep/hle_report.template.json
Always print JSON with these fields:
benchmark_id, run_id, accuracy, reward, trend, curriculum_stage, queue_size, focus_subjects, focus_modalities, next_questions

run_pipeline.js links ingestion, evolve, and solidify into one executable loop.
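Purely for illustration (every value below is made up, not a real result), a JSON result with those fields might look like:

{
  "benchmark_id": "hle",
  "run_id": "run-001",
  "accuracy": 0.12,
  "reward": 0.12,
  "trend": "up",
  "curriculum_stage": "easy",
  "queue_size": 25,
  "focus_subjects": ["math", "physics"],
  "focus_modalities": ["text"],
  "next_questions": ["q_014", "q_231"]
}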