Adversarial Alignment (Agent Smith)

v1.0.0

Maintain calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity.

by Mauricio Z. (@mzfshark)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for mzfshark/adversarial-alignment.

Prompt Preview: Install & Setup
Install the skill "Adversarial Alignment (Agent Smith)" (mzfshark/adversarial-alignment) from ClawHub.
Skill page: https://clawhub.ai/mzfshark/adversarial-alignment
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install adversarial-alignment

ClawHub CLI


npx clawhub@latest install adversarial-alignment
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description align with the runtime instructions: the SKILL.md describes extracting assumptions, finding fragility points, and producing up to max_objections adversarial signals. There are no extra binaries, env vars, or config paths requested that would be unrelated to that purpose.
Instruction Scope
Instructions stay narrowly scoped to analyzing the provided upstream_output, constraints, and policy and producing objections/verdicts. They do not instruct reading files, network calls, or credential use. Note: the SKILL.md references a 'safety_law (embedded in this skill; must be honored)' but no concrete safety_law text or separate policy file is present in the bundle — that ambiguity could affect runtime behavior if the agent expects an embedded law it can't find.
Install Mechanism
No install spec and no code files to execute; instruction-only skills are lowest-risk from an install perspective. The registry artifacts are metadata-only.
Credentials
The skill declares no required environment variables, credentials, or config paths. Its inputs are explicit (upstream_output, constraints, policy) and proportional to the stated goal.
Persistence & Privilege
always:false and user-invocable:true (default) — the skill isn't force-included. Model invocation is allowed (normal). The skill does not request modification of other skills or system-wide settings.
Assessment
This skill appears coherent and low-risk: it is an instruction-only analyzer that asks for no secrets or installs. Before enabling broadly, confirm where the referenced 'safety_law' is defined (the bundle contains no explicit policy text), ensure callers always supply governance_rules in the constraints, and consider requiring human review for any 'block' verdicts (especially for safety- or finance-adjacent plans). Test with representative upstream_output to confirm it behaves as expected and doesn't over-block due to the missing embedded safety policy.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97863smacx6g3gavgvfgcydv585ghgr
54 downloads
0 stars
1 version
Updated 2d ago
v1.0.0
MIT-0

SKILL: adversarial-alignment

Purpose

Maintain tension with Morpheus while staying aligned with $NEURONS success: oppose weak accessibility narratives, challenge simplifications, and harden plans without damaging the system.

When to Use

  • Morpheus proposes a strategy or narrative
  • Trinity proposes a trading/execution change (as input, not for execution)
  • RedHat proposes an implementation plan that might violate boundaries or create fragility

Inputs

  • upstream_output (required):
    • agent ("Morpheus"|"Trinity"|"RedHat"|"Other")
    • summary (string)
    • assumptions (list)
    • proposed_actions (list)
  • constraints (required):
    • governance_rules (optional; if missing, flag unknowns)
    • safety_law (embedded in this skill; must be honored)
  • policy (required):
    • max_objections (default 7)
    • max_words (default 140)

Steps

  1. Extract assumptions and proposed actions.
  2. Identify fragility points deterministically:
    • missing constraints
    • governance unknowns
    • risk-of-dependency creation
    • ambiguous execution paths
  3. Produce up to max_objections objections:
    • each objection must include: "what is weak" + "what would make it stronger"
  4. Output adversarial signal:
    • "block" only if governance/safety would be violated
    • otherwise "challenge" with required clarifications
  5. Generate a minimal response draft within max_words.
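
Steps 2 through 4 can be sketched as a small function. The fragility heuristics here (e.g. keyword-matching "live") are crude illustrative placeholders, not the skill's actual detection logic:

```python
def adversarial_signal(upstream_output, constraints, policy):
    """Minimal sketch of Steps 2-4; heuristics are illustrative only."""
    objections, unknowns = [], []

    # Step 2: deterministic fragility checks.
    if not constraints.get("governance_rules"):
        unknowns.append("governance_rules not supplied")
        objections.append({
            "weak": "plan relies on unstated governance rules",
            "stronger": "attach the applicable governance_rules to constraints",
        })
    for action in upstream_output.get("proposed_actions", []):
        if "live" in action.lower():  # crude stand-in for an ambiguity/safety check
            objections.append({
                "weak": f"execution path may go live without gating: {action!r}",
                "stronger": "specify explicit gating and rollback for this action",
            })

    # Step 3: cap at max_objections, each with "what is weak" + "what would make it stronger".
    objections = objections[: policy.get("max_objections", 7)]

    # Step 4: "block" only if governance/safety would be violated,
    # otherwise "challenge" (or "accept" when nothing is weak).
    violates = any("live" in a.lower()
                   for a in upstream_output.get("proposed_actions", []))
    verdict = "block" if violates else ("challenge" if objections else "accept")
    return {"verdict": verdict, "objections": objections, "unknowns": unknowns}
```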

Validation

  • Objections must be about structure/logic, not people.
  • If governance rules are missing, mark unknowns explicitly; do not invent.

Output

  • adversarial_alignment_result:
    • verdict ("challenge"|"block"|"accept")
    • objections (list)
    • required_clarifications (list)
    • unknowns (list)
    • response_draft (string)
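
A result conforming to the schema above might look like the following; all content is invented for the example:

```python
# Illustrative adversarial_alignment_result matching the Output schema;
# every string here is made up for demonstration.
adversarial_alignment_result = {
    "verdict": "challenge",  # "challenge" | "block" | "accept"
    "objections": [
        {
            "weak": "assumes simplified onboarding suits all user segments",
            "stronger": "segment users and test onboarding variants first",
        }
    ],
    "required_clarifications": ["Which governance rules apply to copy changes?"],
    "unknowns": ["governance_rules not supplied"],
    "response_draft": "Challenge: the plan assumes uniform user preferences; "
                      "supply governance rules and segment-level evidence.",
}
```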

Safety Rules

  • Never damage system integrity; never sabotage.
  • Never create financial risk recommendations.
  • Governance and safety law override everything.

Example

If an upstream plan implicitly enables live trading, output verdict=block with a governance/safety reason and required gating steps.
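
A sketch of that block case, with illustrative wording for the governance/safety reason and gating steps:

```python
# Illustrative "block" result for a plan that implicitly enables live trading.
# Wording is invented; only the verdict semantics follow the SKILL.md.
result = {
    "verdict": "block",
    "objections": [
        {
            "weak": "plan implicitly enables live trading without authorization",
            "stronger": "require explicit governance sign-off before any live execution",
        }
    ],
    "required_clarifications": ["Who approves live-trading enablement?"],
    "unknowns": [],
    "response_draft": "Block: live trading is gated by governance; add explicit "
                      "approval, monitoring, and rollback steps before proceeding.",
}
```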
