benchmarking

Evaluate and compare models or providers on real-work tasks by creating, running, and expanding benchmarks that assess tool choice, failure recovery, and pro...
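
To make the description above concrete, here is a minimal sketch of a benchmark that compares two models on tool choice and failure recovery. It is an illustration only and does not show how this skill works internally: every name in it (Task, Attempt, run_benchmark, the stub models, the tool names) is hypothetical, and the "models" are plain Python functions standing in for real providers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """One benchmark task (hypothetical structure, not the skill's format)."""
    name: str
    expected_tool: str             # tool a good run is expected to pick
    fail_first_call: bool = False  # True if the task injects a tool failure


@dataclass
class Attempt:
    """What a model did on a task."""
    tool: str        # tool the model chose
    recovered: bool  # whether it recovered from an injected failure


# A "model" is any callable mapping a Task to an Attempt (assumption).
Model = Callable[[Task], Attempt]


def run_benchmark(models: Dict[str, Model],
                  tasks: List[Task]) -> Dict[str, Dict[str, float]]:
    """Run every model on every task and return per-model rates."""
    results: Dict[str, Dict[str, float]] = {}
    for name, model in models.items():
        attempts = [(task, model(task)) for task in tasks]
        # Tool choice: fraction of tasks where the expected tool was picked.
        tool_hits = sum(a.tool == t.expected_tool for t, a in attempts)
        # Failure recovery: measured only on tasks that inject a failure.
        failing = [(t, a) for t, a in attempts if t.fail_first_call]
        recovered = sum(a.recovered for _, a in failing)
        results[name] = {
            "tool_choice": tool_hits / len(tasks),
            "failure_recovery": recovered / len(failing) if failing else 0.0,
        }
    return results


# Stub models standing in for real providers (hypothetical).
def careful_model(task: Task) -> Attempt:
    return Attempt(tool=task.expected_tool, recovered=True)


def hasty_model(task: Task) -> Attempt:
    return Attempt(tool="search", recovered=False)


TASKS = [
    Task("summarize_log", expected_tool="read_file"),
    Task("find_docs", expected_tool="search"),
    Task("fetch_with_retry", expected_tool="http_get", fail_first_call=True),
]

if __name__ == "__main__":
    scores = run_benchmark(
        {"careful": careful_model, "hasty": hasty_model}, TASKS
    )
    for model_name, metrics in scores.items():
        print(model_name, metrics)
```

Expanding a benchmark in this sketch means appending more Task entries to TASKS. Reporting the two rates separately keeps a model that picks the right tool but cannot recover from failures distinguishable from one with the opposite profile.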

Install

openclaw skills install @h-mascot/benchmarking