Install
openclaw skills install @vincentjiang06/attackerAttack a product's observable behavior, or red-team an idea/argument/plan; a FRESH, TDD-independent subagent records ONLY proven, reproducible breakages. Use-when: "attack/break this feature", "red-team this product / idea", "$attacker". Do-NOT: write tests or fix bugs — it only finds breakages, never edits the target.
openclaw skills install @vincentjiang06/attackerAttack a product's observable behavior with everything you've got, then record
the attacks that succeeded — proven, reproducible breakages — so a later
round can fix them. The deliverable is a set of attack records (a handoff
document set), each a runnable repro plus observed != expected — never a fix,
never edits to the target, never a passing test suite.
The defect this skill targets is the false-positive test suite: a green TDD suite on top of a broken product. The cause is correlated error — a test, a mock, an "expected" fixture, even the author's framing, produced from the same mental model as the impl, inherits that model's misread. The only fix is engineered independence (Knight–Leveson: the specification is the dominant common-mode channel). So the attacker runs from a context that never saw the impl, the tests, or the author's framing. Independence is the entire value proposition — every other choice is downstream of protecting it.
Gate order (load references/context-intake.md for each step's full detail, the elicitation
prompts, and the self-research independence split):
summary.context_sources (≥1).--scope → summary.in_scope (≥1; every record tags attack_scope ∈ it);
--out-of-scope → top-level out_of_scope[] (kept, NOT counted). Resolve --target + --round N.target.type (product | idea). Same loop/verdict/budget; only oracle + proof shape differ.
2b. Attack MODE — summary.attack_mode (debug | structural | both). Orthogonal altitude;
structural DROPS withhold+seam, REQUIRES critique_basis + a structural oracle; both = structural FIRST.
2c. Scope stability + depth. summary.scope_change (incremental only) + summary.depth (int ≥ 1).--budget N (≈ 12) + --max-tokens T (≈ 60k) + soft
--max-context (≈ 30k). Stop at whichever hits first, exhaust-budget mode (ALL breaks); MAY early-exit..loop/ for records + the battery ledger (NOT under the skill dir).Each round emits a machine-readable round_verdict (broke | clean |
inconclusive) + stop_reason (plan_complete | budget_exhausted) on the
summary the LOOP branches on:
clean (no proven break, plan ran to completion) → the loop's done/converged
signal → STOP. Honest caveat: clean ≠ proven correct; it is "no proven break
within budget B."broke (≥1 proven break) → route to a fix round, then re-attack.inconclusive (a budget cap hit, nothing found — NOT proven correct) → a
qualified stop the loop owner decides on.Spawn the attack as a fresh, isolated subagent with a curated bundle of ONLY (a) the
requirement / intended behavior and (b) the target's observable behavior (invoke + observe +
baseline). Do NOT pass impl source, the TDD/unit suite, or author framing — withholding is a
property of what your spawn prompt includes, audited per record by
independence_attestation.withheld (validator REJECTS a confirmed record missing
implementation_source/tdd_suite). Same-model author ⇒ separate fresh checker (generator ≠ judge).
Inside the subagent run the five phases — load references/attack-process.md for the full
per-phase procedure:
expected independently; measure a baseline.assets/payload-library.json); attack cheapest-highest-impact first.both.references/oracle-menu.md); confirm
observed != expected; shrink to a 1-minimal repro; re-run it (repro.replayed_ok).records[]; unprovable → needs_judgment[]; out-of-scope → out_of_scope[].
Tag attack_scope + attack_kind; roll up ASR@n + verdict + the v0.3.1 fields + the per-kind/mode fields.Round > 1 follows a fixed sequence (full form in references/attack-process.md): Regression
FIRST (re-run prior repros by regression_key: fixed vs still-failing, the latter blocks) →
FILL context from that resolution (record in summary.context_sources) → DEEPEN (increment
summary.depth, attack within scope at greater depth / incremental new surface; NEVER restart from
scratch). Validator only checks depth ≥ 1 + a valid scope_change enum.
node scripts/validate_attack_records.mjs <project>/.loop/attack-records.jsonl
node evals/run_all.mjs # validator unit cases + the non-vacuity self-test
The round is done only when the validator is green AND the non-vacuity
self-test passes AND a fresh-context checker re-reads the records cold,
re-runs the repros, and signs assets/fresh-reader-checklist.md (maker ≠ checker).
The attack-records document set + roll-up (template assets/attack-record.template.md)
become the next round's fix list. Attacker NEVER edits the target.
Each control is enforced by a script, not a "be careful"; load rules/loop-and-metrics.md for
the full enforcement detail of each:
scripts/validate_attack_records.mjs (the §5 gate),
conditional on attack_kind then target.type; unprovable/vague → needs_judgment.in_scope/out_of_scope are validator-enforced.--budget cap, aborts.evals/run_all.mjs (planted-bug flagged + clean-control zero) +
scripts/check_battery_clean.mjs + scripts/check_release_gate.mjs.attacker is a sub-loop NODE, not the loop owner: loop-constructor emits the .loop/ runbook,
maker–checker is mandatory, and attacker is the loop's feedback_signal.check / STOP-CONDITION (the
verdict semantics above). Rounds alternate attacker → fixer → fresh attacker, carrying the prior
ledger forward. Load rules/loop-and-metrics.md for the round-alternation diagram, the dual
budget, the carry-forward ledger, and the metric definitions (ASR@n, severity histogram, the
must-be-zero false-negative / false-positive invariants).
| File | When to load |
|---|---|
references/context-intake.md | CONTEXT (Preflight step 0, a HARD GATE) — the context checklist (target+type, claim/thesis, constraints, success criteria, what-counts-as-a-break, in/out-of-scope, prior rounds), the mandatory-context gate + the SELF-RESEARCH-to-fill discipline (debug-vs-structural independence split), WHY more context = sharper attacks, and the elicitation prompts when context is thin. |
rules/loop-and-metrics.md | Running attacker inside a loop-constructor loop (round alternation, maker–checker, regression by regression_key), or computing the round roll-up / metrics. |
references/attack-process.md | The round — full READ→DESIGN→EXECUTE→PROVE→RECORD procedure + the fresh-context mechanism + target-adapter contract. |
references/oracle-menu.md | PROVE — the kind/mode-appropriate ranked oracle taxonomy: product (implicit→differential→metamorphic→control→specified), idea (counterexample / contradiction / unmet_assumption / scope_violation / infeasibility / missing_case), and the STRUCTURAL oracle set (§S — the idea oracles that fit design critique + specified). |
assets/payload-library.json | DESIGN — the §3 adversarial taxonomy as data (AFL values, unicode/CJK, business-logic catalog). |
assets/fresh-reader-checklist.md | Verify — the REQUIRED manual semantic gate (maker ≠ checker). |
assets/attack-record.template.md | Report — turn records into the next round's fix list. |
| File | Usage |
|---|---|
scripts/validate_attack_records.mjs | node … <records.json|.jsonl> — the deterministic §5 gate; exports validate() (imported by the harness). |
scripts/check_battery_clean.mjs | node … <ledger.json> [--need N] — N-consecutive-clean battery ledger (anti copy-paste, asymptotic). |
scripts/check_release_gate.mjs | node … [--battery <ledger>] — binds "industrial" to green run_all + non-vacuity (+ hardened). |
evals/run_all.mjs | node … — harness; imports validate(), runs one case per adversarial-checklist entry + the non-vacuity self-test. |
| File | Usage |
|---|---|
schemas/attack-record.schema.json | draft-07 contract; prove-or-flag split records[] (proven) vs needs_judgment[] + roll-up. |