Rotifer Arena

Dev Tools

One-click Gene comparison and evaluation for Rotifer Protocol. Import from ClawHub Skills, local files, or build from scratch — automatically compile, match opponents, run Arena battles, and generate structured Markdown reports. Use when the user mentions "compare", "evaluate", "challenge", "compete", "Arena", "benchmark", "which is better", "Gene comparison", "Skill comparison", "scenario", "reference case".

Install

openclaw skills install @xiaoba-dev/rotifer-arena

Rotifer Arena — Gene Comparison & Evaluation

One Skill covering Gene/Genome/Agent comparison across all scenarios.

Prerequisites

This Skill requires the Rotifer CLI:

npx @rotifer/playground --version

Or use the MCP Server for IDE integration:

{
  "mcpServers": {
    "rotifer": {
      "command": "npx",
      "args": ["@rotifer/mcp-server"]
    }
  }
}

Overview

This Skill wraps Rotifer Protocol's core value — objective, quantifiable capability evaluation — into a one-click workflow. Users don't need to understand Gene, Arena, or F(g) concepts upfront; the Skill introduces them naturally during execution.

Cross-platform: This SKILL.md runs in any AI development environment that supports Skills/Agents.

Workflow

Phase 1: Identify Evaluation Target

Understand user intent through conversation and determine the evaluation mode:

User signal	Mode	Action
"Evaluate the X skill from ClawHub"	ClawHub migration evaluation	`rotifer wrap <name> --from-clawhub <slug>`
"Compare my two implementations"	Local comparison	Confirm both Gene names, skip to Phase 3
"I have a Skill I want to test"	Skill import evaluation	`rotifer wrap <name> --from-skill <path>`
"Help me build a XX scenario"	Scenario scaffolding	Guide Gene creation (`rotifer init` or manual phenotype)

If the user doesn't specify a domain: auto-read from phenotype.json, or guide the user to choose.

Phase 2: Compile & Verify

rotifer compile <gene-name>

Output guidance based on fidelity result:

Wrapped: Verification passed, deterministic evaluation mode
Hybrid/Native: WASM compilation, real sandbox execution mode (requires NAPI binding)

Phase 3: Automatic Opponent Matching

Priority order:

User-specified: If the user says "compare X and Y", use those directly
Same-domain local search: Highest-ranked Gene from rotifer arena list --domain <domain>
Same-fidelity preferred: If target is Wrapped, prefer Wrapped opponents (avoid cross-fidelity blowouts)
No opponent found: Inform the user, show current cross-domain Arena rankings for reference

Opponent selection requires user confirmation — show candidate F(g) and fidelity.

Phase 4: Arena Submit & Compare

rotifer arena submit <gene-a>
rotifer arena submit <gene-b>
rotifer arena list --domain <domain>

Collect evaluation results for both Genes.

Phase 5: Generate Evaluation Report

Output the full report in the conversation (rendered Markdown). Append at the end: > Reply "save" to write the report to arena-reports/. When the user replies "save", write to <project>/arena-reports/<date>-<gene-a>-vs-<gene-b>.md.

Report format requirements:

Title = conclusion: Use scenario name + both Gene names, not a generic title
Conclusion first: Immediately below the title, a > blockquote with one-sentence summary of winner and key data
Concise comparison table: Only decision-relevant metrics (rank, F(g), V(g), Fidelity, success rate, latency, source), bold the winner
Ranking visualization: Fixed-width ASCII table showing the full domain ranking, mark new entries with ←
Reproduction commands in a standalone bash block: Pure commands (no comments/output) for easy copy-paste
No internal references: No ADR numbers, plan section numbers, or internal version notes
Minimal metadata: One line at the bottom with date + CLI version + evaluation mode

Report structure (output directly in conversation):

Title: # <scenario> Comparison: <Gene A> vs <Gene B>
Conclusion blockquote: One sentence — who won, key metric delta, core reason
Comparison table: Rank, F(g), V(g), Fidelity, Success rate, Latency score, Source
Current ranking: Full domain leaderboard (ASCII table, ← marks new entries)
Analysis: 2–3 paragraphs on fitness gap attribution, security comparison, same-fidelity positioning
Upgrade path: Table with path / action / expected improvement / effort
Reproduction steps: 4–5 pure CLI commands
Next steps: 4 commands with brief descriptions
Footer: Generated on YYYY-MM-DD · @rotifer/playground@X.Y.Z · Mode: deterministic estimation

Scenario Examples

Example 1: Evaluate a ClawHub Skill's Competitiveness

User: Evaluate the web-search skill from ClawHub in the Rotifer ecosystem

Skill execution:
1. rotifer wrap clawhub-web-search --from-clawhub web-search -d search
2. rotifer compile clawhub-web-search
3. Auto-discover same-domain opponent: genesis-web-search (Native, F(g)=0.9470)
4. rotifer arena submit clawhub-web-search
5. Generate comparison report

Example 2: Compare Two Custom Genes

User: Compare my particle-brute and particle-spatial — which is better?

Skill execution:
1. Confirm both Genes exist with phenotype.json
2. rotifer arena submit particle-brute
3. rotifer arena submit particle-spatial
4. rotifer arena list --domain sim.particle
5. Generate comparison report

Example 3: Build a Quantitative Scenario

User: Help me build a quantitative strategy comparison scenario

Skill execution:
1. Guide user to define domain (e.g. quant.strategy)
2. Guide creation of two Gene phenotype.json files (Strategy A vs Strategy B)
3. If compilable source exists, compile to WASM
4. rotifer arena submit both Genes
5. Generate scenario comparison report

Prerequisites

Project has a rotifer.json (if not, guide rotifer init)
CLI is built (npm run build in rotifer-playground)
ClawHub imports require network connectivity

Related Skills

Skill	Relationship
`gene-dev`	Route here when users need to create a Gene from scratch
`gene-migration`	Route here when the report recommends a fidelity upgrade
`gene-audit`	Suggest running when the report shows low security scores

Constraints

No automatic Cloud publishing: Comparison evaluation is a local operation; Cloud publishing requires explicit user confirmation
Cross-fidelity comparisons need a disclaimer: The baseFitness gap between Wrapped and Native comes from the scoring model, not actual capability differences
Reports are Markdown format: Ready for blogs, community sharing, or GitHub Issues

Rotifer Arena

Install

Rotifer Arena — Gene Comparison & Evaluation

Prerequisites

Overview

Workflow

Phase 1: Identify Evaluation Target

Phase 2: Compile & Verify

Phase 3: Automatic Opponent Matching

Phase 4: Arena Submit & Compare

Phase 5: Generate Evaluation Report

Scenario Examples

Example 1: Evaluate a ClawHub Skill's Competitiveness

Example 2: Compare Two Custom Genes

Example 3: Build a Quantitative Scenario

Prerequisites

Related Skills

Constraints

Related skills