Agent Scorecard

v1.0.6

Configurable quality evaluation for AI agent outputs. Define criteria, run evaluations, track quality over time. No LLM-as-judge, no API calls, pattern-based...

by Shadow Rose (@theshadowrose)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the included Python modules (scorecard.py, scorecard_track.py, scorecard_report.py) and config examples; functionality (pattern-based checks, manual scoring, history tracking, reports) is implemented with no extraneous credentials or binaries.
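To make "pattern-based checks" concrete, here is a minimal sketch of how such a check could work using only the standard library. It is not the code from scorecard.py; the names Criterion, evaluate, score, and CRITERIA, as well as the three example patterns, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical criterion: a name, a regex, and whether a match is desirable."""
    name: str
    pattern: str
    should_match: bool = True


def evaluate(text: str, criteria: list[Criterion]) -> dict[str, bool]:
    """Return pass/fail per criterion using a plain regex search (no semantics)."""
    results = {}
    for c in criteria:
        found = re.search(c.pattern, text, flags=re.IGNORECASE | re.MULTILINE) is not None
        # A criterion passes when the presence of a match agrees with what is wanted.
        results[c.name] = found == c.should_match
    return results


def score(results: dict[str, bool]) -> float:
    """Fraction of criteria passed; 0.0 when no criteria are defined."""
    return sum(results.values()) / len(results) if results else 0.0


# Illustrative criteria only; a real configuration would define its own.
CRITERIA = [
    Criterion("cites_source", r"https?://\S+"),
    Criterion("no_apology", r"\b(sorry|apologi[sz]e)\b", should_match=False),
    Criterion("has_numbered_steps", r"^\s*\d+\.\s"),
]

if __name__ == "__main__":
    sample = "1. Install the tool.\n2. See https://example.com for details."
    outcome = evaluate(sample, CRITERIA)
    print(outcome, score(outcome))
```

A check of this kind only inspects surface features of the text, which is why it cannot verify whether a cited source actually supports a claim.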
Instruction Scope
SKILL.md and README instruct running the included Python scripts on local files, saving history to a JSONL file, and generating reports. The runtime instructions only reference local files and config; they do not instruct reading unrelated system paths or exfiltrating data.
Install Mechanism
No install spec or remote downloads are declared. The package is instruction + source files only and uses the Python standard library; nothing is fetched from external URLs at install time.
Credentials
No environment variables, credentials, or config paths are required. The tool writes/reads local JSON/JSONL files (history, config, reports) which is appropriate for a tracking/reporting utility.
Persistence & Privilege
The skill sets always:false and requests no special privileges. The tool persists its own history and report files locally (append-only JSONL), which is expected behavior and limited in scope.
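For readers unfamiliar with append-only JSONL, the sketch below shows the general technique: each record is written as one JSON object per line, and earlier lines are never rewritten. The function names append_record and load_history and the record fields are assumptions made for this example, not the interface of scorecard_track.py.

```python
import json
import tempfile
from pathlib import Path


def append_record(history_path: Path, record: dict) -> None:
    """Append one record as a single JSON line; existing lines are left untouched."""
    history_path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(record, ensure_ascii=False)
    with history_path.open("a", encoding="utf-8") as fh:  # "a" = append mode
        fh.write(line + "\n")


def load_history(history_path: Path) -> list[dict]:
    """Read every non-empty line back as a dict; a missing file yields an empty list."""
    if not history_path.exists():
        return []
    with history_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "history.jsonl"  # hypothetical file name
        append_record(path, {"agent": "demo", "score": 0.5})
        append_record(path, {"agent": "demo", "score": 1.0})
        print(load_history(path))
```

Because the file is plain text, anything placed in a record is readable by anyone with access to the directory, so it is worth deciding up front which fields are safe to store.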
Assessment
This skill appears coherent and implements a local, pattern-based scorecard for agent text. Before installing or running it:
(1) Review the included Python files yourself (they are present and readable) or run them in an isolated environment (virtualenv/container).
(2) Inspect and edit config_example.json to avoid recording sensitive content (the history file is plaintext JSONL and append-only by design).
(3) Be aware that automated checks are surface-level (no semantic fact-checking); rely on manual scoring for accuracy.
(4) Back up or secure the history/report directories if they might contain sensitive outputs.
(5) If you need real-time or large-scale usage, consider replacing the JSONL history with a proper datastore, because the tool has no concurrency safety.
Overall, the package is internally consistent with low risk, but follow standard caution because the source provenance is 'unknown' and the stored data is local and unencrypted.
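As one possible reading of "a proper datastore", the sketch below stores history in SQLite through Python's standard sqlite3 module, which serializes concurrent writers with its own file locking. The table layout and the helper names open_store, record, and average_score are assumptions for illustration; the package itself ships no such module.

```python
import json
import sqlite3
from typing import Optional


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the database; wait up to 10 s if another writer holds the lock."""
    conn = sqlite3.connect(path, timeout=10.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP, "
        "agent TEXT NOT NULL, "
        "score REAL NOT NULL, "
        "details TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record(conn: sqlite3.Connection, agent: str, score: float, details: dict) -> int:
    """Insert one evaluation inside a transaction and return its row id."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO history (agent, score, details) VALUES (?, ?, ?)",
            (agent, score, json.dumps(details)),
        )
    return cur.lastrowid


def average_score(conn: sqlite3.Connection, agent: str) -> Optional[float]:
    """Mean score for an agent, or None when it has no rows."""
    row = conn.execute(
        "SELECT AVG(score) FROM history WHERE agent = ?", (agent,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    store = open_store(":memory:")  # in-memory database for the demo
    record(store, "demo", 0.5, {"no_apology": False})
    record(store, "demo", 1.0, {"no_apology": True})
    print(average_score(store, "demo"))
```

The data is still local and unencrypted under this approach, so the advice about securing the storage location applies equally.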


Tags: agent, evaluation, latest, metrics, performance, quality
