Install
openclaw skills install aa-benchmarking-framework

Composite scoring and efficiency frontier analysis for LLM evaluation: combines multiple quality dimensions (accuracy, latency, cost, consistency) into a weighted composite ranking and identifies the Pareto-optimal candidates. Use when comparing models or agent configurations across competing objectives, building evaluation dashboards, or identifying the efficiency frontier for model selection. Implements weighted composite scores, Pareto frontier detection, and radar chart visualisation for multi-dimensional LLM benchmarking.
Last used: 2026-03-24
Memory references: 1
Status: Active
STATUS: DRAFT — This skill is planned but not yet fully implemented.
Provides a systematic framework for multi-dimensional LLM evaluation using composite scoring, efficiency frontier analysis, and Pareto optimality. Rather than ranking models on a single metric, it helps identify which models are non-dominated, i.e., no other model is at least as good on every dimension and strictly better on at least one. Designed for teams that need principled model selection beyond simple leaderboard rankings.
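
Because the skill is marked as a draft, the following is only a minimal Python sketch of the two ideas described above, not the skill's actual implementation. Everything in it is an assumption made for illustration: the `ModelResult` type, the function names, the min-max normalisation used for the composite score, and the made-up numbers for the three example models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical record of one model's benchmark results."""
    name: str
    accuracy: float     # higher is better
    latency_s: float    # lower is better
    cost_usd: float     # lower is better
    consistency: float  # higher is better


# (attribute name, higher_is_better) for each dimension being compared.
DIMENSIONS = [
    ("accuracy", True),
    ("latency_s", False),
    ("cost_usd", False),
    ("consistency", True),
]


def oriented(model):
    """Return the metrics as a tuple in which larger is always better."""
    return tuple(
        getattr(model, attr) if higher_is_better else -getattr(model, attr)
        for attr, higher_is_better in DIMENSIONS
    )


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    va, vb = oriented(a), oriented(b)
    return all(x >= y for x, y in zip(va, vb)) and any(x > y for x, y in zip(va, vb))


def pareto_frontier(models):
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models if o is not m)]


def composite_scores(models, weights):
    """Weighted sum of min-max normalised metrics (one possible composite score)."""
    vectors = [oriented(m) for m in models]
    total_weight = sum(weights.values())
    scores = {m.name: 0.0 for m in models}
    for i, (attr, _) in enumerate(DIMENSIONS):
        column = [v[i] for v in vectors]
        lo, hi = min(column), max(column)
        span = hi - lo
        for m, v in zip(models, vectors):
            # If every model ties on this dimension, give all of them full credit.
            norm = (v[i] - lo) / span if span else 1.0
            scores[m.name] += weights[attr] / total_weight * norm
    return scores


# Made-up example data, not real benchmark results.
models = [
    ModelResult("model-a", accuracy=0.90, latency_s=2.0, cost_usd=0.010, consistency=0.95),
    ModelResult("model-b", accuracy=0.85, latency_s=1.0, cost_usd=0.004, consistency=0.90),
    ModelResult("model-c", accuracy=0.80, latency_s=2.5, cost_usd=0.012, consistency=0.85),
]
equal_weights = {attr: 1.0 for attr, _ in DIMENSIONS}

frontier = pareto_frontier(models)
scores = composite_scores(models, equal_weights)

print("Frontier:", [m.name for m in frontier])
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

In this example model-c is dominated by model-a on every dimension, so only model-a and model-b remain on the frontier. The two views answer different questions: the frontier needs no weights and shows which candidates are worth considering at all, while the composite score ranks them once the team has decided how much each dimension matters.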