Install
openclaw skills install ml-model-eval-benchmark
Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
Produce consistent model ranking outputs from metric-weighted evaluation inputs.
scripts/benchmark_models.py: generates benchmark outputs.
references/benchmarking-guide.md: weighting and tie-break guidance.
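
As a rough illustration of what metric-weighted ranking with a deterministic tie-break can look like, here is a minimal sketch. It is not the code in scripts/benchmark_models.py. The function name `rank_models`, the input layout, the assumption that every metric is already normalized so that higher is better, and the choice of model name as the tie-breaker are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple


def rank_models(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank models by a normalized weighted sum of their metrics.

    Hypothetical helper: assumes higher-is-better metrics on a shared scale.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    ranked = []
    for name, metrics in scores.items():
        missing = set(weights) - set(metrics)
        if missing:
            raise KeyError(f"{name} is missing metrics: {sorted(missing)}")
        weighted = sum(weights[m] * metrics[m] for m in weights) / total
        # Round so that float noise cannot separate scores that should tie.
        ranked.append((name, round(weighted, 6)))

    # Score descending, then name ascending: equal scores always come out
    # in the same order, whatever order the candidates were supplied in.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


candidates = {
    "model-b": {"accuracy": 0.8, "f1": 0.6},
    "model-a": {"accuracy": 0.6, "f1": 0.8},
    "model-c": {"accuracy": 0.9, "f1": 0.9},
}
for name, score in rank_models(candidates, {"accuracy": 1.0, "f1": 1.0}):
    print(name, score)
```

With equal weights, model-a and model-b reach the same score, so the name-based tie-break decides their order. Any stable secondary key would work equally well, provided it does not depend on input order.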