benchmarking

Benchmark and evaluate models or agents by creating, running, and expanding real-work-grounded suites that assess operator leverage, failure recovery, and to...

Install

openclaw skills install @h-mascot/superada-skill-benchmarking