Rotifer Arena

Dev Tools

One-click Gene comparison and evaluation for Rotifer Protocol. Import from ClawHub Skills, local files, or build from scratch — automatically compile, match opponents, run Arena battles, and generate structured Markdown reports. Use when the user mentions "compare", "evaluate", "challenge", "compete", "Arena", "benchmark", "which is better", "Gene comparison", "Skill comparison", "scenario", "reference case".

Install

openclaw skills install @xiaoba-dev/rotifer-arena

Rotifer Arena — Gene Comparison & Evaluation

One Skill covering Gene/Genome/Agent comparison across all scenarios.

Prerequisites

This Skill requires the Rotifer CLI:

npx @rotifer/playground --version

Or use the MCP Server for IDE integration:

{
  "mcpServers": {
    "rotifer": {
      "command": "npx",
      "args": ["@rotifer/mcp-server"]
    }
  }
}

Overview

This Skill wraps Rotifer Protocol's core value — objective, quantifiable capability evaluation — into a one-click workflow. Users don't need to understand Gene, Arena, or F(g) concepts upfront; the Skill introduces them naturally during execution.

Cross-platform: This SKILL.md runs in any AI development environment that supports Skills/Agents.


Workflow

Phase 1: Identify Evaluation Target

Understand user intent through conversation and determine the evaluation mode:

User signalModeAction
"Evaluate the X skill from ClawHub"ClawHub migration evaluationrotifer wrap <name> --from-clawhub <slug>
"Compare my two implementations"Local comparisonConfirm both Gene names, skip to Phase 3
"I have a Skill I want to test"Skill import evaluationrotifer wrap <name> --from-skill <path>
"Help me build a XX scenario"Scenario scaffoldingGuide Gene creation (rotifer init or manual phenotype)

If the user doesn't specify a domain: auto-read from phenotype.json, or guide the user to choose.

Phase 2: Compile & Verify

rotifer compile <gene-name>

Output guidance based on fidelity result:

  • Wrapped: Verification passed, deterministic evaluation mode
  • Hybrid/Native: WASM compilation, real sandbox execution mode (requires NAPI binding)

Phase 3: Automatic Opponent Matching

Priority order:

  1. User-specified: If the user says "compare X and Y", use those directly
  2. Same-domain local search: Highest-ranked Gene from rotifer arena list --domain <domain>
  3. Same-fidelity preferred: If target is Wrapped, prefer Wrapped opponents (avoid cross-fidelity blowouts)
  4. No opponent found: Inform the user, show current cross-domain Arena rankings for reference

Opponent selection requires user confirmation — show candidate F(g) and fidelity.

Phase 4: Arena Submit & Compare

rotifer arena submit <gene-a>
rotifer arena submit <gene-b>
rotifer arena list --domain <domain>

Collect evaluation results for both Genes.

Phase 5: Generate Evaluation Report

Output the full report in the conversation (rendered Markdown). Append at the end: > Reply "save" to write the report to arena-reports/. When the user replies "save", write to <project>/arena-reports/<date>-<gene-a>-vs-<gene-b>.md.

Report format requirements:

  1. Title = conclusion: Use scenario name + both Gene names, not a generic title
  2. Conclusion first: Immediately below the title, a > blockquote with one-sentence summary of winner and key data
  3. Concise comparison table: Only decision-relevant metrics (rank, F(g), V(g), Fidelity, success rate, latency, source), bold the winner
  4. Ranking visualization: Fixed-width ASCII table showing the full domain ranking, mark new entries with
  5. Reproduction commands in a standalone bash block: Pure commands (no comments/output) for easy copy-paste
  6. No internal references: No ADR numbers, plan section numbers, or internal version notes
  7. Minimal metadata: One line at the bottom with date + CLI version + evaluation mode

Report structure (output directly in conversation):

  • Title: # <scenario> Comparison: <Gene A> vs <Gene B>
  • Conclusion blockquote: One sentence — who won, key metric delta, core reason
  • Comparison table: Rank, F(g), V(g), Fidelity, Success rate, Latency score, Source
  • Current ranking: Full domain leaderboard (ASCII table, marks new entries)
  • Analysis: 2–3 paragraphs on fitness gap attribution, security comparison, same-fidelity positioning
  • Upgrade path: Table with path / action / expected improvement / effort
  • Reproduction steps: 4–5 pure CLI commands
  • Next steps: 4 commands with brief descriptions
  • Footer: Generated on YYYY-MM-DD · @rotifer/playground@X.Y.Z · Mode: deterministic estimation

Scenario Examples

Example 1: Evaluate a ClawHub Skill's Competitiveness

User: Evaluate the web-search skill from ClawHub in the Rotifer ecosystem

Skill execution:
1. rotifer wrap clawhub-web-search --from-clawhub web-search -d search
2. rotifer compile clawhub-web-search
3. Auto-discover same-domain opponent: genesis-web-search (Native, F(g)=0.9470)
4. rotifer arena submit clawhub-web-search
5. Generate comparison report

Example 2: Compare Two Custom Genes

User: Compare my particle-brute and particle-spatial — which is better?

Skill execution:
1. Confirm both Genes exist with phenotype.json
2. rotifer arena submit particle-brute
3. rotifer arena submit particle-spatial
4. rotifer arena list --domain sim.particle
5. Generate comparison report

Example 3: Build a Quantitative Scenario

User: Help me build a quantitative strategy comparison scenario

Skill execution:
1. Guide user to define domain (e.g. quant.strategy)
2. Guide creation of two Gene phenotype.json files (Strategy A vs Strategy B)
3. If compilable source exists, compile to WASM
4. rotifer arena submit both Genes
5. Generate scenario comparison report

Prerequisites

  • Project has a rotifer.json (if not, guide rotifer init)
  • CLI is built (npm run build in rotifer-playground)
  • ClawHub imports require network connectivity

Related Skills

SkillRelationship
gene-devRoute here when users need to create a Gene from scratch
gene-migrationRoute here when the report recommends a fidelity upgrade
gene-auditSuggest running when the report shows low security scores

Constraints

  • No automatic Cloud publishing: Comparison evaluation is a local operation; Cloud publishing requires explicit user confirmation
  • Cross-fidelity comparisons need a disclaimer: The baseFitness gap between Wrapped and Native comes from the scoring model, not actual capability differences
  • Reports are Markdown format: Ready for blogs, community sharing, or GitHub Issues