## Install

```shell
openclaw skills install agent-scorecard
```

Configurable quality evaluation for AI agent outputs. Define criteria, run evaluations, track quality over time. No LLM-as-judge, no API calls: pattern-based automated checks.
Agent Scorecard gives you a structured, repeatable way to measure whether your AI agent is producing good output — and whether it's getting better or worse over time. No LLM-as-judge, no API calls, no external dependencies. Everything runs locally with pattern-based automated checks and optional human scoring.
You changed your agent's system prompt. Is the output better now? You don't know. You added a new tool. Did response quality degrade? You have a feeling, but no data. Quality management for AI agents is mostly vibes.
Agent Scorecard replaces vibes with numbers.
## Quick start

```shell
# 1. Configure
cp config_example.json scorecard_config.json
# Edit dimensions, thresholds, and weights for your use case

# 2. Evaluate a response
python3 scorecard.py --config scorecard_config.json --input response.txt

# 3. Evaluate and save to history
python3 scorecard.py --config scorecard_config.json --input response.txt --save history.jsonl

# 4. Manual scoring mode
python3 scorecard.py --config scorecard_config.json --input response.txt --manual --save history.jsonl

# 5. View trends
python3 scorecard_track.py --history history.jsonl --summary

# 6. Compare before/after (last 10 vs previous 10)
python3 scorecard_track.py --history history.jsonl --compare 10

# 7. Generate a report
python3 scorecard_report.py --config scorecard_config.json --history history.jsonl
```
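Each `--save` run appends one JSON object per line to the history file. The exact fields come from the evaluation result's `to_dict()` output; the record below is a hypothetical sketch for orientation, not the guaranteed schema:

```json
{"agent": "my-agent", "task_type": "code-review", "overall": 3.85, "passed": true}
```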
The same engine can be driven from Python:

```python
import json

from scorecard import Scorecard, _load_config

cfg = _load_config("scorecard_config.json")
sc = Scorecard(cfg)

with open("agent_response.txt") as f:
    text = f.read()

result = sc.evaluate(text, agent="my-agent", task_type="code-review")
print(result.summary())
# Overall: 3.85/5 (PASS)
# ✓ Accuracy: 4.0/5 (threshold 3, weight 2.0) [auto]
# ✓ Completeness: 3.5/5 (threshold 3, weight 1.5) [auto]
# ...

# Save for tracking
with open("history.jsonl", "a") as f:
    f.write(json.dumps(result.to_dict()) + "\n")
```
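The before/after comparison that `scorecard_track.py --compare` performs can be sketched in plain Python. The `overall` field name, the sample scores, and the `sample_history.jsonl` filename below are assumptions for illustration, not the tool's guaranteed schema:

```python
import json
from statistics import mean

# Hypothetical history records: real ones come from `--save history.jsonl`,
# and the exact field names depend on the result's to_dict() (assumed here).
sample = [
    {"overall": 3.2, "passed": True},
    {"overall": 3.6, "passed": True},
    {"overall": 4.1, "passed": True},
]
with open("sample_history.jsonl", "w") as f:
    for rec in sample:
        f.write(json.dumps(rec) + "\n")

# Read every record back and compare the newer half against the older half,
# roughly what `scorecard_track.py --compare` reports.
with open("sample_history.jsonl") as f:
    scores = [json.loads(line)["overall"] for line in f]
older, newer = scores[: len(scores) // 2], scores[len(scores) // 2 :]
print(f"older avg: {mean(older):.2f} -> newer avg: {mean(newer):.2f}")
# older avg: 3.20 -> newer avg: 3.85
```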
| File | Purpose |
|---|---|
| `scorecard.py` | Main evaluation engine: define, evaluate, score |
| `scorecard_track.py` | Historical tracking and trend analysis |
| `scorecard_report.py` | Report generation (Markdown, JSON) |
| `config_example.json` | Full configuration template with all tunables |
| `LIMITATIONS.md` | What this tool doesn't do |
| `LICENSE` | MIT License |
## Configuration

See `config_example.json` for the complete reference. Key areas:

- `DIMENSIONS`: quality dimensions with rubrics, weights, thresholds, and auto-checks
- `AUTO_CHECKS`: tuning for each pattern-based check (markers, thresholds, penalties)
- `AGGREGATE_METHOD`: how dimension scores are combined (`"weighted_average"`, `"minimum"`, or `"geometric_mean"`)
- `HISTORY_FILE`: where evaluation history is stored
- `REPORT_OUTPUT_DIR`: where reports are saved

MIT license; see the LICENSE file.
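Pulling those keys together, a minimal config might look like the sketch below. The per-dimension field names and the `hedging_markers` check are illustrative assumptions; `config_example.json` remains the authoritative schema:

```json
{
  "DIMENSIONS": {
    "accuracy": {"weight": 2.0, "threshold": 3, "rubric": "Claims are correct and verifiable"}
  },
  "AUTO_CHECKS": {"hedging_markers": {"penalty": 0.5}},
  "AGGREGATE_METHOD": "weighted_average",
  "HISTORY_FILE": "history.jsonl",
  "REPORT_OUTPUT_DIR": "reports/"
}
```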
Configuration is loaded from a plain JSON file, so configs are safe to share: no code is ever executed. The loader accepts only a `.json` path and raises `ValueError` for anything else.

This software is provided "AS IS", without warranty of any kind, express or implied.
USE AT YOUR OWN RISK.
By downloading, installing, or using this software, you acknowledge that you have read this disclaimer and agree to use the software entirely at your own risk.
DATA DISCLAIMER: This software processes and stores data locally on your system. The author(s) are not responsible for data loss, corruption, or unauthorized access resulting from software bugs, system failures, or user error. Always maintain independent backups of important data. This software does not transmit data externally unless explicitly configured by the user.
| Contact | Link |
|---|---|
| 🐛 Bug Reports | TheShadowyRose@proton.me |
| ☕ Ko-fi | ko-fi.com/theshadowrose |
| 🛒 Gumroad | shadowyrose.gumroad.com |
| @TheShadowyRose | |
| 🐙 GitHub | github.com/TheShadowRose |
| 🧠 PromptBase | promptbase.com/profile/shadowrose |
Built with OpenClaw — thank you for making this possible.
🛠️ Need something custom? Custom OpenClaw agents & skills starting at $500. If you can describe it, I can build it. → Hire me on Fiverr