Ab Test Agent Workflow
v1.1.0 · Multi-agent double-blind A/B testing workflow. Runs multi-round, double-blind controlled tests across multiple AI models/agents. Core roles: Coordinator, Contestants A/B, Judge. Trigger phrases: "A/B test", "double-blind test", "compare AI models", "model evaluation", "test workflow", "compare...
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Verdict: Benign (high confidence)

Purpose & Capability
The name and description (A/B double-blind testing) match the provided code and SKILL.md: runner.py coordinates rounds, judge_prompts.py builds and parses judge prompts, and anonymizer.py removes identity traces. Nothing requests unrelated services, binaries, or credentials.
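To illustrate what "removes identity traces" typically means in a double-blind setup, here is a minimal sketch of an anonymizer; the function name, patterns, and placeholder are assumptions for illustration, not the skill's actual anonymizer.py code:

```python
import re

# Hypothetical identity patterns a double-blind anonymizer might strip.
# These regexes are illustrative assumptions, not the skill's actual rules.
IDENTITY_PATTERNS = [
    r"(?i)\bas an? (AI|language model|assistant) (developed|trained|made) by \w+",
    r"(?i)\b(GPT-\d[\w.-]*|Claude[\w.-]*|Gemini[\w.-]*)\b",
]

def anonymize(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace model self-identifying phrases so the Judge cannot tell
    which contestant produced a given output."""
    for pattern in IDENTITY_PATTERNS:
        text = re.sub(pattern, placeholder, text)
    return text

print(anonymize("Claude-3 says hi"))  # → "[REDACTED] says hi"
```

As the Assessment below notes, pattern-based scrubbing like this catches common self-identification phrasing but cannot guarantee every possible leak is removed.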
Instruction Scope
SKILL.md clearly restricts actions to: spawning subagents (sessions_spawn), collecting outputs, anonymizing them, sending the anonymized content to a Judge, and recording results. It does not instruct the agent to read system files or environment variables, nor to send data to external endpoints beyond the subagent interactions described.
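The restricted action sequence described above can be sketched as a single round. The `spawn` parameter below is a stand-in for the platform's sessions_spawn call, whose real signature is platform-specific; all names here are illustrative assumptions:

```python
def run_round(prompt, spawn, anonymize, judge, record):
    """One double-blind round: spawn both contestants, anonymize their
    outputs, send only blinded content to the Judge, record the verdict.

    `spawn` stands in for the platform's sessions_spawn call; its real
    signature depends on your agent platform.
    """
    out_a = spawn("contestant_a", prompt)
    out_b = spawn("contestant_b", prompt)
    # The Judge sees only neutral labels, never contestant identities.
    blinded = {"Response 1": anonymize(out_a), "Response 2": anonymize(out_b)}
    verdict = judge(prompt, blinded)
    record(verdict)
    return verdict

# Usage with trivial stubs in place of real subagent sessions:
results = []
verdict = run_round(
    "Summarize the MIT-0 license in one line.",
    spawn=lambda agent, p: f"answer from {agent}",
    anonymize=lambda t: t.replace("contestant_a", "X").replace("contestant_b", "X"),
    judge=lambda p, blinded: min(blinded),  # stub: always picks "Response 1"
    record=results.append,
)
```

The point of the sketch is the data flow: contestant identities never reach the `judge` callable, and the only side effect is the `record` step.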
Install Mechanism
No install spec is provided (instruction‑only skill with included helper scripts). All code is local and pure Python; there are no downloads, package installs, or archive extraction steps.
Credentials
The skill declares no required environment variables, credentials, or config paths. The code does not attempt to access environment secrets or external tokens; its behavior is limited to prompt construction, parsing, anonymization, and local workflow logic.
Persistence & Privilege
The always flag is false (no forced presence). The skill uses subagent sessions as part of its normal operation; it does not modify other skills or system settings, nor does it request permanent privileges.
Assessment
This skill appears to implement exactly what it claims: a coordinator-driven double-blind A/B testing workflow with local helper scripts. Before installing or using it:

(1) Confirm your agent/platform supports the sessions_spawn subagent flow referenced in the docs; subagents may run on different models/endpoints and thus transmit prompts to those models.

(2) Avoid placing secrets or sensitive data in the test prompt, because prompts and outputs are passed to subagents and the Judge; the anonymizer removes some identity patterns but may not catch every possible leak.

(3) Review and run the included scripts in a safe environment (self-test mode) to verify the behavior meets your expectations.

If you need the skill to run against third-party hosted models, verify where those model calls are executed and which external services will see your data.
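A minimal self-check in the spirit of point (3) above: feed known identity-leaking strings through the anonymizer and flag any that survive unchanged. The scrub() stub and sample phrases here are assumptions standing in for the skill's actual anonymizer.py:

```python
import re

def scrub(text: str) -> str:
    """Assumed stand-in for the skill's anonymizer; replace with the
    real function when running this check against the installed skill."""
    return re.sub(r"(?i)\b(gpt-\d[\w.-]*|claude[\w.-]*)\b", "[REDACTED]", text)

# Strings that should never reach the Judge unmodified.
LEAKY_SAMPLES = [
    "GPT-4 would answer differently.",
    "I am Claude, an assistant.",
]

def unscrubbed(samples):
    """Return the samples whose identity markers survived scrubbing."""
    return [s for s in samples if scrub(s) == s]

assert unscrubbed(LEAKY_SAMPLES) == []  # all known leak patterns were caught
```

Extending LEAKY_SAMPLES with phrasings specific to the models you test is a cheap way to probe the "may not catch every possible leak" caveat before trusting a run.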
