Ab Test Runner

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This is a coherent A/B testing instruction skill. The two things to notice are that it can spawn subagents and that it saves experiment results to persistent memory files.

This skill appears safe to install for prompt and content experimentation. Before using it, be aware that it may run several subagent evaluations and save results under memory/experiments, so avoid confidential test content unless you are comfortable with it being stored and reused in future experiment summaries.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Experiment outputs, scores, and conclusions may persist and influence future A/B testing decisions.

Why it was flagged

The skill stores experiment results and updates a reusable template in persistent memory files, so later experiments may rely on prior stored conclusions.

Skill content
Aggregate all results into `memory/experiments/auto-ab-results.json` ... `memory/experiments/auto-ab-hypotheses.json` ... `memory/experiments/AB-test-design-template.md`

Recommendation

Review stored experiment files periodically, avoid placing secrets or private content in test prompts, and confirm template updates before relying on them.
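
One way to make the periodic review concrete is a small script that walks the stored experiment files and flags lines that look like credentials. The sketch below is only an illustration: the function names `find_suspect_lines` and `scan_experiments` and the regular expressions are assumptions made for this example, not part of the skill, and the patterns will miss many kinds of private content.

```python
import re
from pathlib import Path

# Hypothetical patterns for this sketch; tune them for your own environment.
SECRET_PATTERNS = [
    # key/value pairs such as api_key = ... or "password": "..."
    re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\b[\"']?\s*[:=]\s*\S+"),
    # long opaque strings with an "sk-" prefix
    re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
]


def find_suspect_lines(text):
    """Return (line_number, line) pairs that match any secret-like pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def scan_experiments(root="memory/experiments"):
    """Scan every file under root and map file path -> suspect lines."""
    report = {}
    base = Path(root)
    if not base.is_dir():
        return report  # nothing stored yet
    for path in sorted(base.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = find_suspect_lines(text)
            if hits:
                report[str(path)] = hits
    return report


if __name__ == "__main__":
    for file_name, hits in scan_experiments().items():
        for number, line in hits:
            print(f"{file_name}:{number}: {line}")
```

A clean run prints nothing. Any hit is a prompt to open the file and decide whether the content should be removed before later experiments reuse it.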

What this means

Content submitted for an experiment may be processed by multiple agent instances, which matters if the test includes sensitive prompts or confidential outputs.

Why it was flagged

The workflow shares experiment tasks, rubrics, generated outputs, and anonymized outputs with subagents for generation and blind scoring.

Skill content
Spawn N subagents (one per group, or one per task) ... then spawn 1 more subagent for blind scoring

Recommendation

Use non-sensitive test data where possible, keep the stated concurrency limit, and confirm what content will be shared with subagents before running an experiment.
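
To see what a blind scorer would actually receive, it can help to picture the anonymization step. The sketch below is a generic illustration of blind labeling, not the skill's own implementation: `anonymize_for_blind_scoring`, `unblind`, and the `candidate-N` labels are hypothetical names chosen for this example.

```python
import random


def anonymize_for_blind_scoring(outputs_by_group, seed=None):
    """Shuffle outputs and relabel them so the scorer cannot see group names.

    Returns (blinded, key):
      blinded maps a neutral label -> output text (this is what gets shared),
      key maps the neutral label -> original group name (keep it from the scorer).
    """
    rng = random.Random(seed)
    groups = list(outputs_by_group)
    rng.shuffle(groups)  # break any ordering that hints at the group
    blinded, key = {}, {}
    for index, group in enumerate(groups, start=1):
        label = f"candidate-{index}"
        blinded[label] = outputs_by_group[group]
        key[label] = group
    return blinded, key


def unblind(scores_by_label, key):
    """Map scores back to the original group names once scoring is done."""
    return {key[label]: score for label, score in scores_by_label.items()}


if __name__ == "__main__":
    outputs = {"A": "Draft written with prompt A", "B": "Draft written with prompt B"}
    blinded, key = anonymize_for_blind_scoring(outputs)
    print(blinded)  # inspect exactly what a scorer would see
```

Anonymizing the labels hides which variant produced an output, but the output text itself is still shared in full, which is why the recommendation to use non-sensitive test data still applies.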