Agent Regression Check

v0.3.2

Evaluate whether an agent change introduced regressions by comparing matched before vs after case results. Use when reviewing prompt updates, model switches,...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description match the runtime instructions: the SKILL.md describes comparing matched before vs after case outputs, scoring with a deterministic rubric, clustering failures, and returning a single verdict. No unexpected binaries, credentials, or unrelated platform access are requested.
Instruction Scope
The instructions stay within the stated domain: validate evidence, score cases, compute metrics, cluster failures, and emit a verdict. The skill does not instruct the agent to read system files, access secrets, or send data to unknown external endpoints. Note: the skill processes whatever case data the user supplies, so it may handle sensitive user-provided content. That is an expected, user-visible risk, not a coherence issue.
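To make the validate, score, cluster, and verdict steps concrete, here is a minimal Python sketch of a matched before/after comparison. It is an illustration only, not the skill's actual rubric: the CaseResult fields, the failure_tag label, the compare_runs function, and the max_regressions threshold are all hypothetical names chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """One scored case. All field names here are hypothetical."""
    case_id: str
    passed: bool
    failure_tag: str = ""  # assumed label used to cluster failures


def compare_runs(before, after, max_regressions=0):
    """Compare matched before/after results and return a single verdict."""
    before_by_id = {c.case_id: c for c in before}
    after_by_id = {c.case_id: c for c in after}

    # Validate evidence: IDs must be unique and must match across both runs.
    if len(before_by_id) != len(before) or len(after_by_id) != len(after):
        raise ValueError("duplicate case IDs in a run")
    if before_by_id.keys() != after_by_id.keys():
        unmatched = sorted(before_by_id.keys() ^ after_by_id.keys())
        raise ValueError(f"unmatched case IDs: {unmatched}")

    # A regression is a case that passed before and fails after; a fix is the reverse.
    regressions = sorted(
        cid for cid, b in before_by_id.items()
        if b.passed and not after_by_id[cid].passed
    )
    fixes = sorted(
        cid for cid, b in before_by_id.items()
        if not b.passed and after_by_id[cid].passed
    )

    # Cluster regressions by the failure tag recorded in the "after" run.
    clusters = Counter(after_by_id[cid].failure_tag or "untagged" for cid in regressions)

    # Deterministic gate: the same inputs always produce the same verdict.
    verdict = "pass" if len(regressions) <= max_regressions else "regression"
    return {
        "regressions": regressions,
        "fixes": fixes,
        "clusters": dict(clusters),
        "verdict": verdict,
    }
```

In this sketch, a stricter or looser gate would only change max_regressions; a real rubric would likely tie that threshold to the declared risk_level.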
Install Mechanism
No install spec and no code files; this is instruction-only, so nothing is written to disk or downloaded during install. This is the lowest-risk pattern for an OpenClaw skill.
Credentials
The skill declares no required environment variables, credentials, or config paths. The inputs it asks for (before/after cases, change_summary, risk_level) are proportional to its stated purpose.
Persistence & Privilege
The always flag is false, and the skill does not request persistent system presence or modification of other skills. Autonomous invocation is allowed (the platform default) but is not combined with any broad or unexplained privileges.
Assessment
This skill appears coherent and safe from a permissions standpoint, but standard caution applies. Feed it only representative, non-sensitive case data, or sanitize PII before running large comparisons. Ensure case IDs are stable and that suites are large and representative enough for a release decision. Treat this as an offline, deterministic gate, and complement it with production monitoring and A/B or online experiments for deployment safety.

Tags: agent, evaluation, latest, llm, qa, regression

Runtime requirements

🧪 Clawdis
OS: Linux · macOS · Windows