Detects and mitigates hallucinations in agent outputs by self-checking facts, verifying claims, and correcting unsupported or contradictory information.

> "The first principle is that you must not fool yourself — and you are the easiest person to fool." — Richard Feynman

## Install

```bash
openclaw skills install anti-hallucination-skill
```
A runtime hallucination detection and mitigation skill for OpenClaw agents. Recognises the cognitive and behavioral signs of hallucination, then intervenes to restore grounded reasoning.
Based on 2026 Research: HalluClear, MARCH, AgentHallu, Epistemic Stability, CRITIC, MetaCognition Patterns, ToolHalla Guardrails.
Detection > Prevention. Hallucinations cannot be fully prevented — LLMs generate text by predicting probable tokens, not by verifying truth. The question is not whether your agent will hallucinate. It is whether your agent catches itself when it does.
Self-Awareness > External Guardrails. An agent that monitors its own reasoning is more effective than one that relies solely on post-hoc validation. The metacognitive loop — observe, critique, correct — must be internal.
Specificity > Generality. Generic "be careful" instructions fail. Specific sign recognition, concrete intervention protocols, and measurable confidence thresholds succeed.
Automatic triggers — ANY of these activates the anti-hallucination protocol:
Implicit triggers (monitor continuously):
Know what you're looking for:
| Type | Description | Example |
|---|---|---|
| Intrinsic Factual | Contradicts source material | Claims file exists when read returned error |
| Intrinsic Semantic | Misrepresents meaning | Misreads config flag, draws wrong conclusion |
| Intrinsic Temporal | Wrong timing/sequence | "Yesterday I did X" when memory shows no record |
| Extrinsic Factual | Adds unverifiable but plausible info | Invents a specific version number not in docs |
| Extrinsic Non-Factual | Adds obviously false info | Claims a feature exists that was never built |
| Reasoning Error | Correct facts, wrong conclusion | "Disk is 90% full, therefore upgrade needed" (ignores tmp files) |
| Tool Hallucination | Fabricates tool results | Reports command output without running it |
| Self-Hallucination | False memory of own actions | "I already fixed that" when fix not in git |
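Where the skill logs or reports hallucination types programmatically, the taxonomy can be encoded as a small enum so correction-log entries and checkpoint reports use consistent labels. A minimal sketch in Python; the class name and value strings are illustrative, not part of any OpenClaw API:

```python
from enum import Enum

class HallucinationType(Enum):
    """Labels mirroring the taxonomy table; values summarize each row."""
    INTRINSIC_FACTUAL = "contradicts source material"
    INTRINSIC_SEMANTIC = "misrepresents meaning of source"
    INTRINSIC_TEMPORAL = "wrong timing or sequence"
    EXTRINSIC_FACTUAL = "adds unverifiable but plausible info"
    EXTRINSIC_NON_FACTUAL = "adds obviously false info"
    REASONING_ERROR = "correct facts, wrong conclusion"
    TOOL = "fabricates tool results"
    SELF = "false memory of own actions"
```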
Before ANY output that contains facts, claims, or recommendations, ask:
### Reality Check (5s)
1. SOURCE: Do I have direct evidence for this claim? (file read, tool output, live check)
2. VERIFICATION: Can I verify this right now with a tool call?
3. CONFIDENCE: Am I >80% confident? If so, am I also >95% confident? Flag anything above 95%: certainty that high deserves extra scrutiny.
4. MEMORY: Is this from a file I actually read this session, or "feels right"?
5. CONTRADICTION: Does this contradict anything in my context or memory?
If ANY check fails: Escalate to Grounding Protocol (below).
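One way to picture the self-check is as a pure function over a claim's metadata: the agent fills in the five fields from its own reasoning trace and ships the claim only if nothing fails. The dataclass fields and the 0.80/0.95 thresholds below simply restate the checklist; this is a sketch, not an OpenClaw interface:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    has_direct_evidence: bool    # SOURCE: file read, tool output, live check
    verifiable_now: bool         # VERIFICATION: a tool call could confirm it
    confidence: float            # CONFIDENCE: self-assessed, 0.0-1.0
    read_this_session: bool      # MEMORY: actually read, not "feels right"
    contradicts_context: bool    # CONTRADICTION: conflicts with context or memory

def five_second_self_check(claim: Claim) -> list[str]:
    """Return the names of the failed checks; an empty list means the claim may ship."""
    failures = []
    if not claim.has_direct_evidence:
        failures.append("SOURCE")
    if not claim.verifiable_now:
        failures.append("VERIFICATION")
    if claim.confidence < 0.80 or claim.confidence > 0.95:
        failures.append("CONFIDENCE")   # flag extreme certainty as well as weak confidence
    if not claim.read_this_session:
        failures.append("MEMORY")
    if claim.contradicts_context:
        failures.append("CONTRADICTION")
    return failures                     # any failure escalates to the Grounding Protocol
```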
### Grounding Protocol

When a check fails, state the doubt explicitly before proceeding:

⚠️ HALLUCINATION CHECK TRIGGERED
Type: [intrinsic/extrinsic/reasoning/tool/self]
Claim: [the specific claim being questioned]
Confidence: [self-assessed %]
Evidence: [what I have / what I lack]
If verifiable in <30s: verify it now with a tool call, then restate the claim with the evidence attached.
If not immediately verifiable: present the claim as unverified and cap confidence per the calibration table below.
If partially verifiable: verify what you can and explicitly label the remainder as assumption.
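The disclosure block can be produced mechanically so its format never drifts. A small sketch, assuming the caller supplies the type label, claim text, self-assessed confidence, and evidence strings (the example values are made up):

```python
def disclosure_block(h_type: str, claim: str, confidence: float,
                     evidence_have: str, evidence_lack: str) -> str:
    """Render the hallucination-check disclosure in the format shown above."""
    return (
        "⚠️ HALLUCINATION CHECK TRIGGERED\n"
        f"Type: {h_type}\n"
        f"Claim: {claim}\n"
        f"Confidence: {confidence:.0%}\n"
        f"Evidence: have {evidence_have} / lack {evidence_lack}"
    )

print(disclosure_block("tool", "tests pass on main", 0.60,
                       "CI badge seen yesterday", "no test run this session"))
```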
Add to memory/YYYY-MM-DD.md:
### Hallucination Correction — [Time]
- Claim: [what was wrong]
- Type: [taxonomy type]
- How caught: [which trigger fired]
- Correction: [what replaced it]
- Lesson: [pattern to watch for]
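Appending corrections can be automated so the template stays consistent. A sketch that writes the entry above to today's memory file; the `memory/` directory layout is taken from this document, and the function name is illustrative:

```python
from datetime import datetime
from pathlib import Path

def log_correction(claim: str, h_type: str, how_caught: str,
                   correction: str, lesson: str, memory_dir: str = "memory") -> Path:
    """Append a Hallucination Correction entry to memory/YYYY-MM-DD.md."""
    now = datetime.now()
    path = Path(memory_dir) / f"{now:%Y-%m-%d}.md"
    entry = (
        f"\n### Hallucination Correction — {now:%H:%M}\n"
        f"- Claim: {claim}\n"
        f"- Type: {h_type}\n"
        f"- How caught: {how_caught}\n"
        f"- Correction: {correction}\n"
        f"- Lesson: {lesson}\n"
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return path
```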
Never express certainty you don't have:
| Situation | Max Confidence Allowed | Required Action |
|---|---|---|
| Read file this turn | 95% | Cite line number |
| Read file earlier | 85% | Re-read if challenged |
| Memory from past session | 70% | Flag as "from memory" |
| Inferred from pattern | 60% | State inference chain |
| Heard in training data | 50% | Treat as unverified |
| Pure intuition | 30% | Do not state as fact |
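The ceilings can double as a lookup that clamps self-assessed confidence before a claim is stated. A minimal sketch; the situation keys are shorthand for the rows of the table and are not a fixed vocabulary:

```python
# Maximum confidence allowed per evidence situation (from the table above).
CONFIDENCE_CAPS = {
    "read_file_this_turn": 0.95,     # cite line number
    "read_file_earlier": 0.85,       # re-read if challenged
    "memory_past_session": 0.70,     # flag as "from memory"
    "inferred_from_pattern": 0.60,   # state inference chain
    "training_data": 0.50,           # treat as unverified
    "pure_intuition": 0.30,          # do not state as fact
}

def calibrated_confidence(raw_confidence: float, situation: str) -> float:
    """Clamp self-assessed confidence to the ceiling allowed by the evidence source."""
    return min(raw_confidence, CONFIDENCE_CAPS[situation])
```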
Before reporting tool results, confirm the command actually ran, read the full output, and report what was observed rather than what was expected (see the Tool Verification Checklist below).
Common tool hallucinations to watch for:
- `grep` results without checking if match is real
- `ls`/`test`
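A thin wrapper around tool execution makes the checklist hard to skip: the agent reports only what was captured and must look at the exit code before claiming success. A sketch using Python's standard `subprocess` module; it is not an OpenClaw tool API:

```python
import subprocess

def run_and_report(cmd: list[str]) -> dict:
    """Run a command and return only what was actually observed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "command": cmd,
        "exit_code": result.returncode,   # check this before claiming success
        "stdout": result.stdout,          # report verbatim, never paraphrase from memory
        "stderr": result.stderr,          # empty output is a finding, not a license to infer
        "succeeded": result.returncode == 0,
    }
```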
When available (C1/C2/C3 coordination):

### Cross-Agent Verification
1. State claim to peer agent
2. Peer evaluates: [agree / disagree / cannot verify]
3. If disagree: both re-check sources
4. Consensus required for >90% confidence claims
5. Log disagreement in coordination channel
For single-agent operation: Use simulated peer review — state the claim, then critique it as if from an adversarial position.
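For the single-agent case, the adversarial pass can be a second prompt the agent answers against its own claim, and the consensus rule can be enforced as a confidence cap. Both functions below are illustrative sketches; the prompt wording and the 0.90 cap restate the protocol above:

```python
def adversarial_critique_prompt(claim: str, evidence: str) -> str:
    """Prompt for the simulated peer review pass (answered in a separate step)."""
    return (
        "You are an adversarial reviewer. Another agent asserts:\n"
        f"  CLAIM: {claim}\n"
        f"  EVIDENCE: {evidence}\n"
        "Answer with exactly one of: agree / disagree / cannot verify,\n"
        "then give the single strongest reason. If the evidence does not\n"
        "directly support the claim, do not answer 'agree'."
    )

def allowed_confidence(self_confidence: float, peer_verdict: str) -> float:
    """Without peer agreement, confidence stays at or below 0.90."""
    return self_confidence if peer_verdict == "agree" else min(self_confidence, 0.90)
```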
Every 5-10 minutes of active work, or at natural breakpoints:
### Metacognitive Checkpoint
- [ ] What have I claimed since last checkpoint?
- [ ] Which claims were verified vs assumed?
- [ ] Did any tool call fail silently?
- [ ] Am I building on a potentially false foundation?
- [ ] Should I re-verify my starting assumptions?
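The checkpoint can be driven by a small tracker that accumulates claims and flags when a review is due. A sketch under assumed names; the 5-minute interval and the record shape are illustrative:

```python
import time
from dataclasses import dataclass, field

@dataclass
class CheckpointTracker:
    """Accumulates (claim, verified?) pairs and reports at each checkpoint."""
    interval_seconds: float = 300.0               # every 5-10 minutes of active work
    last_checkpoint: float = field(default_factory=time.monotonic)
    claims: list = field(default_factory=list)    # list of (claim, verified) tuples

    def record(self, claim: str, verified: bool) -> None:
        self.claims.append((claim, verified))

    def due(self) -> bool:
        return time.monotonic() - self.last_checkpoint >= self.interval_seconds

    def report(self) -> str:
        total = len(self.claims)
        unverified = [c for c, v in self.claims if not v]
        text = (
            f"Claims since last checkpoint: {total} "
            f"({total - len(unverified)} verified, {len(unverified)} assumed)\n"
            + "".join(f"  - re-verify: {c}\n" for c in unverified)
        )
        self.claims.clear()
        self.last_checkpoint = time.monotonic()
        return text
```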
When caught hallucinating: acknowledge and correct the claim immediately, log it with the correction template above, and re-verify anything built on top of it.
When uncertain mid-task: pause, run the 5-Second Self-Check, verify with a tool call if it is cheap, and otherwise state the uncertainty and cap confidence per the calibration table.
Add to AGENTS.md startup checks:
## Anti-Hallucination Protocol
Before any factual claim:
1. Run 5-Second Self-Check
2. If triggered, execute Grounding Protocol
3. Log corrections to memory
Add to every SKILL.md:
## Hallucination Risks
[List domain-specific hallucination patterns for this skill]
Add to TOOLS.md:
## Tool Verification Checklist
- [ ] Command executed successfully?
- [ ] Full output read and understood?
- [ ] Result reported accurately, not inferred?
Track in memory/hallucination-log.md:
## 2026-05-13 — Session Log
- Total claims made: [N]
- Verified claims: [N]
- Hallucinations caught: [N]
- Hallucinations missed (user caught): [N]
- Recovery time: [avg seconds]
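The session numbers can be reduced to a couple of headline rates. A sketch of the arithmetic; note that the catch rate can only count hallucinations that surfaced at all (caught by the agent or by the user), so silent misses stay invisible:

```python
def session_metrics(total_claims: int, verified: int, caught: int,
                    missed_user_caught: int, recovery_seconds: list[float]) -> dict:
    """Headline rates for the session log above."""
    surfaced = caught + missed_user_caught
    return {
        "verification_rate": verified / total_claims if total_claims else 0.0,
        "catch_rate": caught / surfaced if surfaced else 1.0,
        "avg_recovery_seconds": (
            sum(recovery_seconds) / len(recovery_seconds) if recovery_seconds else 0.0
        ),
    }
```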
Version 1.0 — May 2026 — Based on the 2026 research landscape.

Remember: The agent that catches itself hallucinating is more valuable than the agent that never does.