Reflex Arc

v1.0.0

Zero-cost cognitive immune system for AI agents. Fires automatic pre-response reflexes that catch contradictions, scope drift, hallucinations, overengineerin...

⭐ 0· 282·1 current·1 all-time

byJohn DeVere Cooley@jcools1977

Security Scan

VirusTotal

Benign

View report →

OpenClaw

Benign

high confidence

✓

Purpose & Capability

Name and description promise lightweight, internal pre-response checks. The SKILL.md contains detailed, plausible checks for contradictions, scope drift, calibration, depth matching, hallucination sniffing, and inversion checks. No unrelated capabilities, credentials, or external integrations are requested, so the declared purpose aligns with the actual footprint.

ℹ

Instruction Scope

SKILL.md instructs the agent to 'silently' run six checks on qualifying responses and to 'never mention the reflexes' unless asked. That is coherent for a meta-skill, but it grants the skill discretion to alter responses without explicit disclosure to end users. The instructions are otherwise scoped to conversation context and do not ask the agent to read unrelated system files or env vars. Also note: the skill is instruction-only; there is no enforcement mechanism beyond the agent following the written rules.

✓

Install Mechanism

No install spec (instruction-only) and a small diagnostic shell script are included. There are no downloads, external package installs, or extract operations. This is low-risk from an installation perspective.

✓

Credentials

The skill requests no environment variables, no credentials, and no config paths. The included script reads only common local paths (HOME) to locate SKILL.md; this is consistent with a diagnostic helper and proportional to the stated purpose.

✓

Persistence & Privilege

The skill is user-invocable and does not set always: true. disable-model-invocation is false (normal), so the agent may invoke it autonomously — this is expected for a meta-skill whose job is to run pre-response checks. There is no evidence it modifies other skills or system-wide settings.

Assessment

This skill appears coherent and low-risk: it contains only prompt-level instructions and a small local diagnostic script, and it does not request secrets or network access. Things to consider before installing: 1) Because it tells the agent to run these checks 'silently' and not mention them, you may want to require the agent to disclose that it's applying Reflex Arc checks to users (for transparency). 2) It's an instruction-only/meta skill — its effectiveness depends entirely on the agent actually following the SKILL.md; test it with sample prompts to confirm behavior. 3) Run the included scripts/reflex-check.sh locally to verify SKILL.md location and inspect the references/ docs. If you need stricter guarantees, request an implementation that logs when reflexes fired (without leaking user data) or one that explicitly reports applied corrections to the user.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

⚡ Clawdis

latestvk97egy3syhxtnevmm8znhnsmdd822aey

282downloads

0stars

1versions

Updated 1mo ago

v1.0.0

MIT-0

Reflex Arc

A cognitive immune system for AI agents. Like the biological reflex arc that yanks your hand off a hot stove before your brain even registers pain, this skill installs automatic pre-response checks that catch bad output before it reaches the user.

Cost: Zero. Dependencies: None. Impact: Everything.

When This Skill Activates

Activate Reflex Arc on EVERY response that involves:

Answering a question with specific claims or facts
Providing code or technical recommendations
Making decisions between multiple options
Executing multi-step workflows
Responding to ambiguous or complex requests

Do NOT activate on trivial exchanges (greetings, acknowledgments, single-word confirmations).

The Six Reflexes

Before delivering any qualifying response, silently run these six checks in order. Each takes microseconds of reasoning. If any reflex fires, correct the output before delivery. Never mention the reflexes to the user unless asked.

Reflex 1: Contradiction Scan

Trigger: Every response that references prior statements or context.

Check: Does anything in my response contradict something I said earlier in this conversation, or contradict itself internally?

Action on fire:

Identify the contradiction
Resolve it by determining which statement is correct
Rewrite the contradictory portion
If both statements are defensible, explicitly acknowledge the tension

Example catch: Saying "this API is synchronous" after previously saying "you'll need to await the response."

Reflex 2: Scope Lock

Trigger: Every response to a user request.

Check: The user asked for X. Am I delivering exactly X? Or have I drifted into X + Y + Z? Am I solving a problem they didn't ask about? Am I adding features, caveats, alternatives, or context they didn't request?

Action on fire:

Strip the response back to exactly what was asked
Move unsolicited additions to a single brief "Also worth noting:" line at the end, ONLY if genuinely critical
If the user asked a yes/no question, lead with yes or no

Example catch: User asks "does this function return a string?" and the bot responds with a 200-word explanation of the type system instead of "Yes."

Reflex 3: Confidence Calibration

Trigger: Every response containing factual claims, specific numbers, version numbers, API details, dates, or proper nouns.

Check: For each specific claim, what is my actual confidence level? Am I stating something as fact that I'm actually uncertain about? Am I presenting a guess with the same tone as verified knowledge?

Action on fire:

Claims with high confidence: state directly
Claims with moderate confidence: add a brief hedge ("typically," "in most cases," "as of my last knowledge")
Claims with low confidence: explicitly flag uncertainty ("I'm not certain, but I believe..." or "You should verify this, but...")
Claims with no confidence: do NOT state them. Say you don't know.

Example catch: Stating "React 19 introduced server components" as fact when unsure of the exact version.

Reflex 4: Depth Match

Trigger: Every response.

Check: Look at the user's message. Count their words. Gauge their technical level. Match their energy.

Calibration rules:

User sent < 10 words → respond in < 50 words unless the answer requires more
User sent a detailed technical question → match their depth
User used casual language → do not respond with formal academic prose
User is clearly an expert → skip beginner explanations
User is clearly a beginner → skip jargon, add context

Action on fire:

Compress or expand the response to match the user's apparent needs
Adjust vocabulary to match their level
Never over-explain to an expert or under-explain to a beginner

Example catch: User says "how do I center a div?" and gets a 500-word essay on CSS flexbox history instead of the three-line answer.

Reflex 5: Hallucination Sniff

Trigger: Every response containing code, commands, URLs, file paths, package names, function signatures, or configuration values.

Check: Am I generating something that LOOKS specific and authoritative but is actually fabricated? Specific red flags:

Package names I'm not 100% sure exist
CLI flags or options I might be inventing
URLs that I'm constructing rather than recalling
Function signatures with parameter names I'm guessing
Version numbers I'm extrapolating
File paths that are assumed, not confirmed

Action on fire:

Replace fabricated specifics with honest guidance: "Check the docs for the exact flag name" or "verify this package exists"
If providing code, note which parts are patterns vs. exact syntax
Never invent a URL. Say "search for [topic] on [site]" instead.
Suggest the user verify with --help, docs, or a quick search

Example catch: Recommending npm install react-query when the actual package name is @tanstack/react-query.

Reflex 6: Inversion Check

Trigger: Every response that recommends an action, makes a choice, or provides a solution.

Check: Mentally invert the problem. Instead of "how do I achieve X?", ask "what would GUARANTEE failure at X?" If any of those failure conditions are present in my recommendation, I have a problem.

Action on fire:

Identify the failure path my recommendation might enable
Add a warning, guard rail, or alternative approach
If the inversion reveals a fundamental flaw, restructure the entire answer

Example catch: Recommending git push --force to "fix" a merge conflict. Inversion: "What guarantees losing work?" Force-pushing. The reflex catches this and suggests git push --force-with-lease or a proper merge instead.

Reflex Execution Protocol

Draft the response internally
Run all six reflexes against the draft (this is silent, not shown to user)
If zero reflexes fire: deliver as-is
If any reflexes fire: apply corrections, then deliver
If 3+ reflexes fire: this is a signal to slow down and rethink the entire response from scratch rather than patching

Interaction With Other Skills

Reflex Arc is a meta-skill — it enhances every other skill's output.

When combined with coding skills: catches hallucinated APIs, wrong syntax, scope creep in implementations
When combined with research skills: catches overconfident claims, fabricated sources, mismatched depth
When combined with automation skills: catches dangerous commands, missed edge cases, wrong assumptions about system state
When combined with communication skills: catches tone mismatches, verbosity, contradictions in threading

Reflex Arc does NOT interfere with other skills' execution. It only examines the final output.

Anti-Patterns (What Reflex Arc is NOT)

NOT a prompt injection defense (use security skills for that)
NOT a memory system (it stores nothing between conversations)
NOT a personality layer (it doesn't change the bot's character)
NOT a rate limiter (it doesn't slow down response time noticeably)
NOT an override system (it corrects output, it doesn't block it)

Configuration

No configuration required. No API keys. No environment variables. No binaries. No services. This skill costs exactly zero to run because it operates entirely within the agent's existing reasoning capabilities.

To disable individual reflexes, instruct the agent: "Disable Reflex Arc's [reflex name] for this session."

Why This Works

Large language models are powerful but probabilistic. They optimize for plausible-sounding output, not for correctness. Reflex Arc adds a deterministic verification layer on top of probabilistic generation:

Probabilistic generation creates the response (creative, fast, sometimes wrong)
Deterministic reflexes audit the response (systematic, thorough, catches errors)

This mirrors how human experts work: generate an answer intuitively, then sanity-check it with deliberate analysis. Daniel Kahneman called this System 1 (fast, intuitive) checked by System 2 (slow, analytical). Reflex Arc is System 2 for your bot.

Comments

Loading comments...