## Install

```bash
openclaw skills install b3ehive
```

b3ehive runs three AI agents in parallel to implement, cross-evaluate, score, and select the best code solution for a given coding task. Three isolated agents implement the same functionality, evaluate each other objectively, and the optimal solution is delivered through data-driven selection.
Top-level output assertions:

```yaml
assertions:
  - final_solution/implementation exists and is runnable
  - comparison_report.md exists with objective metrics
  - decision_rationale.md explains selection logic
  - all three agent implementations are documented
  - evaluation scores are numeric and justified
```
## Workflow

```mermaid
graph TD
    A[User Task] --> B[Phase 1: Parallel Spawn]
    B --> C[Agent A: Simplicity]
    B --> D[Agent B: Speed]
    B --> E[Agent C: Robustness]
    C --> F[Phase 2: Cross-Evaluation]
    D --> F
    E --> F
    F --> G[6 Evaluation Reports]
    G --> H[Phase 3: Self-Scoring]
    H --> I[3 Scorecards]
    I --> J[Phase 4: Final Delivery]
    J --> K[Best Solution]
```
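Phase 1 fans the same task out to three isolated agents, each pinned to a different focus. A minimal sketch of that fan-out (the `run_agent` helper is hypothetical; the skill's actual spawning mechanism is internal):

```python
from concurrent.futures import ThreadPoolExecutor

FOCUSES = ["simplicity", "speed", "robustness"]

def run_agent(focus: str, task: str) -> str:
    """Hypothetical helper: invoke one isolated agent with the given
    focus and return the path of its workspace/run_* directory."""
    raise NotImplementedError

def spawn_parallel(task: str) -> list[str]:
    # Phase 1: all three agents receive the identical task concurrently
    with ThreadPoolExecutor(max_workers=len(FOCUSES)) as pool:
        futures = [pool.submit(run_agent, focus, task) for focus in FOCUSES]
        return [f.result() for f in futures]
```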
### Agent Prompt Template

```yaml
role: "Expert Software Engineer"
focus: "{{agent_focus}}"  # Simplicity / Speed / Robustness
task: "{{task_description}}"
constraints:
  - Complete runnable code in implementation/
  - Checklist.md with ALL items checked
  - SUMMARY.md with competitive advantages
  - Must differ from other agents' approaches
linter_rules:
  - code_compiles: true
  - tests_pass: true
  - no_todos: true
  - documented: true
assertions:
  - implementation/main.* exists
  - tests exist and pass
  - Checklist.md is complete
  - SUMMARY.md explains unique approach
```
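The `{{…}}` placeholders are filled once per agent, so only the focus differs between the three prompts. A sketch of that substitution (the helper name and template string are illustrative):

```python
AGENT_PROMPT = (
    'role: "Expert Software Engineer"\n'
    'focus: "{agent_focus}"\n'
    'task: "{task_description}"\n'
)

def render_agent_prompt(agent_focus: str, task_description: str) -> str:
    # Same template for all three agents; only the focus varies
    return AGENT_PROMPT.format(
        agent_focus=agent_focus, task_description=task_description
    )
```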
### Evaluation Prompt Template

```yaml
evaluator: "Agent {{from}}"
target: "Agent {{to}}"
task: "Objectively prove your solution is superior"
dimensions:
  simplicity:
    weight: 20
    metrics:
      - lines_of_code: count
      - cyclomatic_complexity: calculate
      - readability_score: 1-10
  speed:
    weight: 25
    metrics:
      - time_complexity: big_o
      - space_complexity: big_o
      - benchmark_results: run_if_possible
  stability:
    weight: 25
    metrics:
      - error_handling_coverage: percentage
      - resource_cleanup: check
      - fault_tolerance: test
  corner_cases:
    weight: 20
    metrics:
      - input_validation: comprehensive
      - boundary_conditions: covered
      - edge_cases: tested
  maintainability:
    weight: 10
    metrics:
      - documentation_quality: 1-10
      - code_structure: logical
      - extensibility: easy/hard
assertions:
  - evaluation is objective with data
  - specific code snippets cited
  - numeric scores provided
  - persuasion argument is data-driven
```
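In the scorecards below, each dimension's `max` equals its weight here, so an implementation's total is simply the sum of its five dimension scores out of 100. A sketch of that aggregation:

```python
WEIGHTS = {
    "simplicity": 20,
    "speed": 25,
    "stability": 25,
    "corner_cases": 20,
    "maintainability": 10,
}
assert sum(WEIGHTS.values()) == 100

def total_score(scores: dict[str, float]) -> float:
    """Sum dimension scores; each is already capped at its weight,
    so the result is out of 100."""
    for dim, value in scores.items():
        assert 0 <= value <= WEIGHTS[dim], f"{dim} exceeds its max of {WEIGHTS[dim]}"
    return sum(scores.values())
```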
### Scoring Prompt Template

```yaml
agent: "Agent {{name}}"
task: "Fairly score yourself and competitors"
self_evaluation:
  - dimension: simplicity
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: speed
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: stability
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: corner_cases
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: maintainability
    max: 10
    score: "{{self_score}}"
    justification: "{{why}}"
peer_evaluation:
  - target: "Agent {{other}}"
    scores: "{{numeric_scores}}"
    comparison: "{{objective_comparison}}"
final_conclusion:
  best_implementation: "[A/B/C/Mixed]"
  reasoning: "{{data_driven_justification}}"
  recommendation: "{{delivery_strategy}}"
assertions:
  - all scores are numeric
  - justifications are specific
  - no inflation or bias
  - conclusion is evidence-based
```
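The skill does not pin down how the three scorecards are merged; one plausible aggregation (an assumption, shown only for illustration) is to average each implementation's self score with the two peer scores it received:

```python
def combined_totals(scorecards: dict[str, dict[str, float]]) -> dict[str, float]:
    """scorecards[rater][ratee] -> total score out of 100.
    Assumption: each agent's final total is the mean of its self score
    and the two peer scores; the actual merge strategy may differ."""
    agents = list(scorecards)
    return {
        ratee: sum(scorecards[rater][ratee] for rater in agents) / len(agents)
        for ratee in agents
    }
```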
### Decision Logic

```python
def select_winner(scores):
    """Select the final solution based on competitive scores."""
    margins = calculate_score_margins(scores)
    if margins.winner - margins.second > 15:
        # Clear winner: ship the top implementation as-is
        return SingleWinner(scores.winner)
    elif margins.winner - margins.second > 5:
        # Close competition: consider a hybrid of the top two
        return HybridSolution(scores.top_two)
    else:
        # Very close: pick the simplest implementation
        return SimplestImplementation(scores.all)
```
```yaml
assertions:
  - final_solution is runnable
  - comparison_report explains all approaches
  - decision_rationale is transparent
  - attribution is given to winning agent
```
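To make the margin bands concrete (totals are illustrative): 88/70/65 gives a margin of 18, a clear single winner; 78/71/65 gives 7, the hybrid band; 74/71/70 gives 3, so the simplest implementation wins the tie:

```python
def margin_band(totals: list[float]) -> str:
    """Classify the winning margin into the three delivery bands."""
    top, second = sorted(totals, reverse=True)[:2]
    margin = top - second
    if margin > 15:
        return "single winner"            # e.g. [88, 70, 65] -> 18
    elif margin > 5:
        return "hybrid of top two"        # e.g. [78, 71, 65] -> 7
    return "simplest implementation"      # e.g. [74, 71, 70] -> 3
```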
## Workspace Layout

```text
workspace/
├── run_a/
│   ├── implementation/      # Agent A code
│   ├── Checklist.md         # Completion checklist
│   ├── SUMMARY.md           # Approach summary
│   ├── evaluation/          # Evaluations of B, C
│   └── SCORECARD.md         # Self-scoring
├── run_b/                   # Same structure
├── run_c/                   # Same structure
├── final/                   # Winning solution
├── COMPARISON_REPORT.md     # Full analysis
└── DECISION_RATIONALE.md    # Why winner selected
```
## Linting

Checklists use `- [x]` checkboxes; the lint script rejects a run whose required files are missing or whose Checklist.md still contains an unchecked `- [ ]` item.

```bash
#!/bin/bash
# scripts/lint.sh

lint_agent_output() {
    local agent_dir="$1"
    local errors=0

    # Check required files exist
    for file in Checklist.md SUMMARY.md; do
        if [[ ! -f "${agent_dir}/${file}" ]]; then
            echo "ERROR: Missing ${file}"
            ((errors++))
        fi
    done

    # Check the implementation entry point exists; compgen expands the
    # glob, which would stay literal inside a quoted [[ -f ]] test
    if ! compgen -G "${agent_dir}/implementation/main.*" > /dev/null; then
        echo "ERROR: Missing implementation/main.*"
        ((errors++))
    fi

    # Check Checklist is complete (no unchecked "- [ ]" items)
    if grep -q "\[ \]" "${agent_dir}/Checklist.md"; then
        echo "ERROR: Checklist has unchecked items"
        ((errors++))
    fi

    # Check code compiles (language-specific)
    # ... implementation-specific checks

    return "$errors"
}

# Run on all agents; abort on the first failing agent
for agent in a b c; do
    lint_agent_output "workspace/run_${agent}" || exit 1
done
```
## Phase Gates

```python
def assert_phase_complete(phase_name):
    """Assert that a phase has completed successfully."""
    assertions = {
        "phase1": [
            "workspace/run_a/implementation exists",
            "workspace/run_b/implementation exists",
            "workspace/run_c/implementation exists",
            "All Checklist.md are complete",
        ],
        "phase2": [
            "6 evaluation reports exist",
            "All evaluations have numeric scores",
        ],
        "phase3": [
            "3 scorecards exist",
            "All scores are numeric",
            "Conclusions are provided",
        ],
        "phase4": [
            "final/solution exists",
            "COMPARISON_REPORT.md exists",
            "DECISION_RATIONALE.md exists",
        ],
    }
    for assertion in assertions[phase_name]:
        assert evaluate(assertion), f"Assertion failed: {assertion}"
```
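`evaluate` is left abstract above. A plausible sketch (the mapping is an assumption, not part of the skill) turns the "... exists" assertions into filesystem checks and leaves the rest to bespoke checkers:

```python
from pathlib import Path

def evaluate(assertion: str, root: Path = Path(".")) -> bool:
    # Assumed mapping: "<path> exists" becomes a filesystem check;
    # the remaining natural-language assertions need dedicated checkers.
    if assertion.endswith(" exists"):
        target = assertion[: -len(" exists")]
        return (root / target).exists()
    raise NotImplementedError(f"No checker for: {assertion}")
```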
## Configuration

```yaml
b3ehive:
  # Agent configuration
  agents:
    count: 3
    model: openai-proxy/gpt-5.3-codex
    thinking: high
    focuses:
      - simplicity
      - speed
      - robustness

  # Evaluation weights (must sum to 100)
  evaluation:
    dimensions:
      simplicity: 20
      speed: 25
      stability: 25
      corner_cases: 20
      maintainability: 10

  # Delivery strategy
  delivery:
    strategy: auto    # auto / best / hybrid
    threshold: 15     # Point margin for a clear winner

  # Quality gates
  quality:
    lint: true
    test: true
    coverage_threshold: 80
```
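Since the dimension weights must sum to 100, a load-time check catches misconfiguration early. A sketch, assuming the config is read with PyYAML (`load_config` is an illustrative helper, not part of the skill):

```python
import yaml

def load_config(path: str) -> dict:
    with open(path) as f:
        cfg = yaml.safe_load(f)["b3ehive"]
    total = sum(cfg["evaluation"]["dimensions"].values())
    if total != 100:
        raise ValueError(f"dimension weights sum to {total}, expected 100")
    return cfg
```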
## Usage

```bash
# Basic usage
b3ehive "Implement a thread-safe rate limiter"

# With constraints
b3ehive "Implement quicksort" --lang python --max-lines 50

# Using the OpenClaw CLI
openclaw skills run b3ehive --task "Your task"
```
## License

MIT © Weiyang (@weiyangzen)