Install
```shell
openclaw skills install agent-learner
```

Benchmark and compare agent prompts and evaluation results. Use when tuning strategies, evaluating outputs, or comparing configurations.

An AI toolkit for configuring, benchmarking, comparing, and optimizing agent prompts and evaluation results. Agent Learner provides persistent, file-based logging for each command category, with timestamped entries, summary statistics, multi-format export, and full-text search across all records.
| Command | Description |
|---|---|
| `configure` | Configure agent settings — log configuration entries or view recent ones |
| `benchmark` | Benchmark agent performance — log benchmark results or view history |
| `compare` | Compare agent outputs — log comparison data or view recent comparisons |
| `prompt` | Prompt management — log prompt variations or view recent prompts |
| `evaluate` | Evaluate agent outputs — log evaluation results or view history |
| `fine-tune` | Fine-tune parameters — log fine-tuning sessions or view recent ones |
| `analyze` | Analyze agent behavior — log analysis entries or view recent analyses |
| `cost` | Cost tracking — log cost data or view recent cost entries |
| `usage` | Usage monitoring — log usage metrics or view recent usage data |
| `optimize` | Optimize configurations — log optimization runs or view history |
| `test` | Test agent behavior — log test results or view recent tests |
| `report` | Report generation — log report entries or view recent reports |
| `stats` | Show summary statistics across all log categories (entry counts, data size, first entry date) |
| `export <fmt>` | Export all data in json, csv, or txt format to the data directory |
| `search <term>` | Full-text search across all log files (case-insensitive) |
| `recent` | Show the 20 most recent entries from the activity history log |
| `status` | Health check — show version, data directory, total entries, disk usage, and last activity |
| `help` | Show the full help message with all available commands |
| `version` | Print the current version string |
Each data command (`configure`, `benchmark`, `compare`, etc.) works in two modes:

- With an argument — appends the text as a new timestamped entry to that command's log
- Without an argument — shows the most recent entries for that command
All data is stored in plain text files under the data directory:

- `$DATA_DIR/<command>.log` — one file per command (e.g., `configure.log`, `benchmark.log`, `prompt.log`); each entry is `timestamp|value`
- `$DATA_DIR/history.log` — audit trail of every command executed, with timestamps
- `$DATA_DIR/export.<fmt>` — generated by the `export` command in json, csv, or txt format

Default data directory: `~/.local/share/agent-learner/`
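Because each log line is plain `timestamp|value`, entries can be split with shell parameter expansion alone. A minimal sketch (the ISO-8601 timestamp layout shown here is an assumption; only the `timestamp|value` shape is documented):

```shell
# A log line as described above: timestamp|value
# (the exact timestamp format is an assumption)
line='2025-03-01T12:00:00|GPT-4o on MMLU: 88.7% accuracy'

ts="${line%%|*}"   # everything before the first |
val="${line#*|}"   # everything after the first |

echo "logged at: $ts"
echo "entry:     $val"
```

Splitting on the *first* `|` matters: entry text may itself contain the delimiter, and `${line%%|*}` / `${line#*|}` handle that correctly.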
Requires a shell with `set -euo pipefail` support and the standard utilities `grep`, `cat`, `date`, `echo`, `wc`, `du`, `head`, `tail`, and `basename`.

```shell
# Initialize and check status
agent-learner status

# Log a benchmark result
agent-learner benchmark "GPT-4o on MMLU: 88.7% accuracy, 1.2s avg latency"

# Log a prompt variation
agent-learner prompt "System: You are a helpful coding assistant. Always explain your reasoning step by step."

# Compare two configurations
agent-learner compare "GPT-4o vs Claude-3.5: GPT-4o 12% faster, Claude 5% more accurate on code tasks"

# Track costs
agent-learner cost "March batch: 12,450 tokens input, 3,200 tokens output, $0.47 total"

# View all recent benchmarks
agent-learner benchmark

# Search across all logs for a specific term
agent-learner search "accuracy"

# Export all data as JSON
agent-learner export json

# View summary statistics
agent-learner stats

# Show recent activity
agent-learner recent
```
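Since the logs are one-entry-per-line text files, a per-category count like the one `stats` reports can be approximated with the same standard tools the skill depends on (`wc`, `basename`). A self-contained sketch against sample data in a temp directory (the file names and entry text below are made up; real logs live under the data directory):

```shell
# Build sample logs in a throwaway directory
dir=$(mktemp -d)
printf '2025-03-01T12:00:00|entry one\n2025-03-02T09:30:00|entry two\n' > "$dir/benchmark.log"
printf '2025-03-01T12:05:00|prompt v1\n' > "$dir/prompt.log"

# One line per entry, so wc -l gives the entry count per category
for f in "$dir"/*.log; do
  echo "$(basename "$f" .log): $(wc -l < "$f") entries"
done

rm -rf "$dir"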
All commands return output to stdout. Export files are written to the data directory:
```shell
agent-learner export json   # → ~/.local/share/agent-learner/export.json
agent-learner export csv    # → ~/.local/share/agent-learner/export.csv
agent-learner export txt    # → ~/.local/share/agent-learner/export.txt
```
Every command execution is logged to $DATA_DIR/history.log for auditing purposes.
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com