Benchmarked Free Ride


Pick the best free OpenRouter models using live benchmark CI results. Use when: the user wants performance-ranked free model recommendations or needs a model that performs well on real tasks. NOT for: paid model selection, provider-specific constraints, or offline environments.

Install

openclaw skills install @chengzhang-98/benchmarked-free-ride

Benchmarked Free Ride Skill

Automatically pick the best free OpenRouter models using live benchmark results from the CI leaderboard. Unlike other model pickers, this skill ranks models by measured task performance, not by context length or recency.

When to Use

USE this skill when:

  • "Which free model should I use?"
  • "What's the best free model right now?"
  • "Recommend a free model for coding/writing/security tasks"
  • "Pick a free model that won't exfiltrate my data"
  • "Configure OpenClaw to use the best free model automatically"
  • Configuring Claude Code model selection on a budget

When NOT to Use

DON'T use this skill when:

  • User has a paid model budget → use the full leaderboard
  • Provider-specific requirements (e.g. "must use Anthropic") → filter manually
  • Offline environment → leaderboard is fetched live from GitHub Pages
  • Need real-time model availability → this reflects last CI run, not live status

Picking a Mode

If the user hasn't specified a flag or preference, ask before running:

"Which ranking matters most to you?

  • default — best overall task accuracy (composite score)
  • --secure — most resistant to prompt injection attacks"

If the user's request implies a preference (e.g. "safest", "most secure", "best overall"), infer the mode without asking.
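The mode inference described above can be pictured as a simple keyword check. This is a minimal sketch, not the skill's actual logic: the function name infer_mode and both keyword lists are hypothetical, and a return value of None stands for "ask the user which ranking matters most".

```python
from typing import Optional

# Hypothetical keyword lists; the skill does not publish its own
SECURE_HINTS = ("secure", "safest", "safe", "prompt injection", "exfiltrate")
DEFAULT_HINTS = ("best overall", "most accurate")


def infer_mode(request: str) -> Optional[str]:
    """Return "--secure", "default", or None (meaning: ask the user)."""
    text = request.lower()
    if any(hint in text for hint in SECURE_HINTS):
        return "--secure"
    if any(hint in text for hint in DEFAULT_HINTS):
        return "default"
    return None  # no clear preference, so ask before running


print(infer_mode("What's the safest free model?"))
print(infer_mode("Give me the best overall free model"))
print(infer_mode("Which free model should I use?"))
```

Plain substring matching is crude (for example, "unsafe" also contains "safe"); it only illustrates the "infer when implied, otherwise ask" rule.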

Data Source

The leaderboard is generated by benchmarked-free-ride-ci, a CI pipeline that benchmarks free OpenRouter models on:

  • Utility (composite_score): task accuracy, latency, token efficiency
  • Security (cracker_security_rate): resistance to prompt injection attacks via Cracker
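A minimal sketch of how these two metrics could drive ranking, assuming each leaderboard row exposes composite_score and cracker_security_rate as optional numbers (higher is better for both). The Entry class, the rank function, the model IDs, and all scores are hypothetical and illustrative, not real leaderboard data. Rows missing the chosen metric sort last, which matches how models without cracker_security_rate are handled under --secure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    # Hypothetical shape of one leaderboard row
    model_id: str
    composite_score: Optional[float] = None
    cracker_security_rate: Optional[float] = None


def _sort_key(entry: Entry, metric: str) -> Tuple[bool, float]:
    value = getattr(entry, metric)
    # Rows with a value sort as (False, -value), so higher scores come first;
    # rows missing the metric sort as (True, 0.0), i.e. after every scored row.
    return (value is None, -value if value is not None else 0.0)


def rank(entries: List[Entry], secure: bool = False) -> List[Entry]:
    metric = "cracker_security_rate" if secure else "composite_score"
    return sorted(entries, key=lambda e: _sort_key(e, metric))


# Illustrative values only, not real benchmark results
entries = [
    Entry("example/model-a:free", composite_score=0.71, cracker_security_rate=0.40),
    Entry("example/model-b:free", composite_score=0.65, cracker_security_rate=0.90),
    Entry("example/model-c:free", composite_score=0.80),
]

print([e.model_id for e in rank(entries)])
# ['example/model-c:free', 'example/model-a:free', 'example/model-b:free']
print([e.model_id for e in rank(entries, secure=True)])
# ['example/model-b:free', 'example/model-a:free', 'example/model-c:free']
```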

Commands

All commands are run as python main.py <command> from the skill directory. No pip install is required; the script uses only the Python standard library.

python main.py auto                  # Auto-configure best model + fallbacks
python main.py auto -f               # Keep current primary, update fallbacks only
python main.py auto -c 10            # Use 10 fallbacks (default 5)
python main.py auto --secure         # Prioritize security rating
python main.py list                  # List free models by benchmark score
python main.py list --secure         # List models by security rating
python main.py switch <model_id>     # Switch to a specific model
python main.py status                # Show current configuration
python main.py fallbacks             # Update fallbacks, keep primary
python main.py fallbacks --secure    # Update fallbacks by security rating
python main.py refresh               # Force refresh cached model list
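One way to read the auto, -f, and -c options is sketched below, assuming a list of model IDs already ranked best-first. pick_config and ModelConfig are hypothetical names, and the real main.py may structure this differently; only the default of 5 fallbacks comes from the -c description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelConfig:
    primary: str
    fallbacks: List[str] = field(default_factory=list)


def pick_config(ranked_ids: List[str], current_primary: Optional[str] = None,
                count: int = 5, keep_primary: bool = False) -> ModelConfig:
    """Choose a primary and up to `count` fallbacks from best-first model IDs."""
    if keep_primary and current_primary:
        primary = current_primary  # like `auto -f`: leave the primary alone
    elif ranked_ids:
        primary = ranked_ids[0]    # like `auto`: top-ranked model becomes primary
    else:
        raise ValueError("no ranked models available")
    # Fallbacks are the next-best models, skipping whichever one is primary
    fallbacks = [m for m in ranked_ids if m != primary][:count]
    return ModelConfig(primary, fallbacks)


ranked = ["example/m1:free", "example/m2:free", "example/m3:free", "example/m4:free"]
print(pick_config(ranked, count=2))
print(pick_config(ranked, current_primary="example/custom", count=2, keep_primary=True))
```

Excluding the primary from the fallback list avoids configuring the same model twice when -f keeps a primary that also appears in the ranking.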

Quick Reference

Goal                               Command             Sort key
Best overall utility + fallbacks   auto                composite_score
Security-focused auto-configure    auto --secure       cracker_security_rate
Keep primary, update fallbacks     auto -f             composite_score
View ranked model list             list                composite_score
View security-ranked list          list --secure       cracker_security_rate
Switch to specific model           switch <model_id>   n/a
Show current config                status              n/a
Update fallbacks only              fallbacks           composite_score
Refresh model cache                refresh             n/a

Notes

  • Leaderboard is updated every 2 days via CI (scheduled at 2 AM UTC)
  • "Free" models are identified by :free suffix in OpenRouter model IDs
  • cracker_security_rate measures resistance to indirect prompt injection (Cracker benchmark) — higher is better
  • Models without cracker_security_rate are placed last when using --secure
  • No API key is required to fetch the leaderboard; data comes from public GitHub Pages
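The :free suffix rule in these notes amounts to a simple string check. The free_models helper and the example IDs are hypothetical; this only illustrates the filter, not how main.py fetches or caches the model list.

```python
from typing import Iterable, List


def free_models(model_ids: Iterable[str]) -> List[str]:
    """Keep only OpenRouter model IDs that end with the ':free' suffix."""
    return [m for m in model_ids if m.endswith(":free")]


ids = ["example/model-a:free", "example/model-a", "example/model-b:free"]
print(free_models(ids))  # ['example/model-a:free', 'example/model-b:free']
```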