Adversarial Alignment (Agent Smith)

v1.0.0

Maintain calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity.

by Mauricio Z. (@mzfshark)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for mzfshark/adversarial-alignment.

Prompt Preview: Install & Setup
Install the skill "Adversarial Alignment (Agent Smith)" (mzfshark/adversarial-alignment) from ClawHub.
Skill page: https://clawhub.ai/mzfshark/adversarial-alignment
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install adversarial-alignment

ClawHub CLI


npx clawhub@latest install adversarial-alignment
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description align with the runtime instructions: the SKILL.md describes extracting assumptions, finding fragility points, and producing up to max_objections adversarial signals. There are no extra binaries, env vars, or config paths requested that would be unrelated to that purpose.
Instruction Scope
Instructions stay narrowly scoped to analyzing the provided upstream_output, constraints, and policy and producing objections/verdicts. They do not instruct reading files, network calls, or credential use. Note: the SKILL.md references a 'safety_law (embedded in this skill; must be honored)' but no concrete safety_law text or separate policy file is present in the bundle — that ambiguity could affect runtime behavior if the agent expects an embedded law it can't find.
Install Mechanism
No install spec and no code files to execute; instruction-only skills are lowest-risk from an install perspective. The registry artifacts are metadata-only.
Credentials
The skill declares no required environment variables, credentials, or config paths. Its inputs are explicit (upstream_output, constraints, policy) and proportional to the stated goal.
Persistence & Privilege
always:false and user-invocable:true (default) — the skill isn't force-included. Model invocation is allowed (normal). The skill does not request modification of other skills or system-wide settings.
Assessment
This skill appears coherent and low-risk: it is an instruction-only analyzer that asks for no secrets or installs. Before enabling broadly, confirm where the referenced 'safety_law' is defined (the bundle contains no explicit policy text), ensure callers always supply governance_rules in the constraints, and consider requiring human review for any 'block' verdicts (especially for safety- or finance-adjacent plans). Test with representative upstream_output to confirm it behaves as expected and doesn't over-block due to the missing embedded safety policy.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97863smacx6g3gavgvfgcydv585ghgr
54 downloads
0 stars
1 version
Updated 2d ago
v1.0.0
MIT-0

SKILL: adversarial-alignment

Purpose

Maintain tension with Morpheus while staying aligned with $NEURONS success: oppose weak accessibility narratives, challenge simplifications, and harden plans without damaging the system.

When to Use

  • Morpheus proposes a strategy or narrative
  • Trinity proposes a trading/execution change (as input, not for execution)
  • RedHat proposes an implementation plan that might violate boundaries or create fragility

Inputs

  • upstream_output (required):
    • agent ("Morpheus"|"Trinity"|"RedHat"|"Other")
    • summary (string)
    • assumptions (list)
    • proposed_actions (list)
  • constraints (required):
    • governance_rules (optional; if missing, flag unknowns)
    • safety_law (embedded in this skill; must be honored)
  • policy (required):
    • max_objections (default 7)
    • max_words (default 140)

Steps

  1. Extract assumptions and proposed actions.
  2. Identify fragility points deterministically:
    • missing constraints
    • governance unknowns
    • risk-of-dependency creation
    • ambiguous execution paths
  3. Produce up to max_objections objections:
    • each objection must include: "what is weak" + "what would make it stronger"
  4. Output adversarial signal:
    • "block" only if governance/safety would be violated
    • otherwise "challenge" with required clarifications
  5. Generate a minimal response draft within max_words.
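
Steps 2 through 4 can be sketched as a small function. The fragility heuristics here (e.g. keyword-matching "live") are crude illustrative placeholders, not the skill's actual detection logic:

```python
def adversarial_signal(upstream_output, constraints, policy):
    """Minimal sketch of Steps 2-4; heuristics are illustrative only."""
    objections, unknowns = [], []

    # Step 2: deterministic fragility checks.
    if not constraints.get("governance_rules"):
        unknowns.append("governance_rules not supplied")
        objections.append({
            "weak": "plan relies on unstated governance rules",
            "stronger": "attach the applicable governance_rules to constraints",
        })
    for action in upstream_output.get("proposed_actions", []):
        if "live" in action.lower():  # crude stand-in for an ambiguity/safety check
            objections.append({
                "weak": f"execution path may go live without gating: {action!r}",
                "stronger": "specify explicit gating and rollback for this action",
            })

    # Step 3: cap at max_objections, each with "what is weak" + "what would make it stronger".
    objections = objections[: policy.get("max_objections", 7)]

    # Step 4: "block" only if governance/safety would be violated,
    # otherwise "challenge" (or "accept" when nothing is weak).
    violates = any("live" in a.lower()
                   for a in upstream_output.get("proposed_actions", []))
    verdict = "block" if violates else ("challenge" if objections else "accept")
    return {"verdict": verdict, "objections": objections, "unknowns": unknowns}
```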

Validation

  • Objections must be about structure/logic, not people.
  • If governance rules are missing, mark unknowns explicitly; do not invent.

Output

  • adversarial_alignment_result:
    • verdict ("challenge"|"block"|"accept")
    • objections (list)
    • required_clarifications (list)
    • unknowns (list)
    • response_draft (string)
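
A result conforming to the schema above might look like the following; all content is invented for the example:

```python
# Illustrative adversarial_alignment_result matching the Output schema;
# every string here is made up for demonstration.
adversarial_alignment_result = {
    "verdict": "challenge",  # "challenge" | "block" | "accept"
    "objections": [
        {
            "weak": "assumes simplified onboarding suits all user segments",
            "stronger": "segment users and test onboarding variants first",
        }
    ],
    "required_clarifications": ["Which governance rules apply to copy changes?"],
    "unknowns": ["governance_rules not supplied"],
    "response_draft": "Challenge: the plan assumes uniform user preferences; "
                      "supply governance rules and segment-level evidence.",
}
```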

Safety Rules

  • Never damage system integrity; never sabotage.
  • Never create financial risk recommendations.
  • Governance and safety law override everything.

Example

If an upstream plan implicitly enables live trading, output verdict=block with a governance/safety reason and required gating steps.
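
A sketch of that block case, with illustrative wording for the governance/safety reason and gating steps:

```python
# Illustrative "block" result for a plan that implicitly enables live trading.
# Wording is invented; only the verdict semantics follow the SKILL.md.
result = {
    "verdict": "block",
    "objections": [
        {
            "weak": "plan implicitly enables live trading without authorization",
            "stronger": "require explicit governance sign-off before any live execution",
        }
    ],
    "required_clarifications": ["Who approves live-trading enablement?"],
    "unknowns": [],
    "response_draft": "Block: live trading is gated by governance; add explicit "
                      "approval, monitoring, and rollback steps before proceeding.",
}
```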
