AB Test Eval

v2.0.3

Run A/B evaluation tests for any OpenClaw skill, script, hook, or cron job. Make sure to use this skill whenever the user mentions testing, benchmarking, com...

by Siyuan Huang (@cyrushuang1995-cmyk)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
Capability signals
Crypto, Requires OAuth token
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Pending
OpenClaw: Benign (high confidence)
Purpose & Capability
Name and description match the requested capabilities. Requiring shasum and mkdir is coherent with the described fingerprinting and workspace creation tasks; no unrelated credentials, binaries, or config paths are requested.
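The fingerprinting mentioned above can be sketched in Python. This is a minimal illustration, not the skill's own code: the function name `fingerprint` is hypothetical, and SHA-1 is assumed only because it is the default algorithm of `shasum`. The skill's actual invocation is not shown in this listing and may use a different algorithm.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, algorithm: str = "sha1") -> str:
    """Return the hex digest of a file's contents.

    Assumption: SHA-1 is used because it is the default of `shasum`;
    the skill itself may pass a different algorithm.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two inputs with the same fingerprint have identical contents for practical purposes, which is what would let a cached result be reused instead of rerunning an arm.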
Instruction Scope
SKILL.md confines actions to reading the target skill's SKILL.md, generating evals, creating a workspace, caching results, and spawning subagents for arms. It explicitly asks for user approval before full runs and provides dry-run and smoke-test modes. There are no instructions to read unrelated system files or to exfiltrate data to third-party endpoints.
Install Mechanism
Instruction-only skill with no install spec and no downloads. This minimizes disk-written code and supply-chain risk.
Credentials
No environment variables, credentials, or config paths are requested. The few required binaries (shasum, mkdir) are proportional to the claimed functionality.
Persistence & Privilege
The skill writes files under a workspace sibling to the target skill directory and maintains a local .cache. It does not request 'always' presence. Users should note it will create and write files in skill-related directories (iteration/, .cache/, history.jsonl).
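To make the file-writing behavior described above concrete, here is a rough Python sketch of what creating such a workspace might look like. The names `prepare_workspace` and `record_run`, the `<skill-name>-workspace` naming scheme, and the placement of `iteration/`, `.cache/`, and `history.jsonl` inside one directory are all assumptions for illustration; the listing only states that these paths are created, not how they are arranged.

```python
import json
from pathlib import Path


def prepare_workspace(skill_dir: Path) -> Path:
    """Create a workspace next to the skill directory and return its path."""
    # Hypothetical naming scheme: "<skill-name>-workspace" beside the skill.
    workspace = skill_dir.parent / f"{skill_dir.name}-workspace"
    # Equivalent of `mkdir -p` for the directories named in the scan.
    (workspace / "iteration").mkdir(parents=True, exist_ok=True)
    (workspace / ".cache").mkdir(parents=True, exist_ok=True)
    return workspace


def record_run(workspace: Path, entry: dict) -> None:
    """Append one JSON object per line to history.jsonl."""
    with (workspace / "history.jsonl").open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
```

Because every write lands under a single directory beside the target skill, reviewing or deleting what a run produced comes down to inspecting that one location.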
Assessment
This skill appears to do what it says: generate test cases, create a workspace, fingerprint inputs with shasum, and spawn subagents to run arms. Before running: (1) review the generated evals/evals.json and approve runs (the skill itself instructs to ask for approval), (2) prefer the --dry preview and --smoke test to limit initial token spend, (3) be aware it will create files under the target skill's workspace and .cache (so run only on skills/dirs you trust), and (4) ensure your runtime allows spawning subagents and running shasum/mkdir. Because it spawns subagents, expect potential token costs and concurrent activity — that is expected behavior rather than a red flag.

Like a lobster shell, security has layers — review code before you run it.

latestvk976abzda752qnwq6g9yed2ph984nycz

Runtime requirements

Bins: shasum, mkdir
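One way to confirm these requirements before running the skill is a small preflight check. The sketch below is not part of the skill; `missing_binaries` is a hypothetical helper that uses Python's standard `shutil.which` to look up each name on `PATH`.

```python
import shutil


def missing_binaries(required=("shasum", "mkdir")):
    """Return the required binaries that are not found on PATH."""
    return [name for name in required if shutil.which(name) is None]
```

An empty list from `missing_binaries()` means both tools are available; any names returned would need to be installed first.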
