Install
    openclaw skills install hallucination-guard

Detect and prevent AI agent hallucinations during task execution. Use when: (1) an agent claims to have created files, commits, or artifacts — verify them, (2) an agent produces data reports or numbers — audit against source, (3) running long multi-step tasks where fabrication risk is high, (4) you need cross-model verification of critical outputs. Provides 4-layer defense: L0 context hygiene, L1 claim-evidence protocol, L2 cross-model audit, L3 drift detection. NOT for: simple Q&A, opinion-based tasks, or conversations where factual accuracy is not critical.

4-layer defense against agent fabrication. Each layer is independent — use one or combine.
Highest-risk conditions (long multi-step tasks, data-heavy work, autonomous runs) warrant more layers.
L0: Context hygiene

Reduce hallucination probability before it starts.
For long tasks (>10 steps):
- Each step starts with a fresh read of the state file — never trust carried-over context for facts

For data-intensive tasks:
Cost: Zero. This is a workflow discipline, not an API call.
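The state-file discipline above can be sketched as a tiny helper. The file name and JSON layout here are illustrative assumptions, not part of the skill:

```python
import json
from pathlib import Path

STATE_FILE = Path("task_state.json")  # hypothetical state-file name

def write_state(facts: dict) -> None:
    """Persist verified facts to disk after each step."""
    STATE_FILE.write_text(json.dumps(facts, indent=2))

def read_state() -> dict:
    """Re-read facts fresh from disk; never reuse in-memory context."""
    if not STATE_FILE.exists():
        return {}
    return json.loads(STATE_FILE.read_text())

# Each step: load fresh, update with newly verified facts, persist.
facts = read_state()
facts["rows_in_source"] = 1042  # example fact verified this step
write_state(facts)
```

The point is that every step begins with `read_state()`, so a fabricated "fact" from earlier conversation turns never survives a step boundary.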
L1: Claim-evidence protocol

Every agent claim of physical action must include tool-verified evidence.
CLAIM: "I created/modified/committed X"
EVIDENCE: Tool output proving X exists and matches the claim
STATUS: VERIFIED (evidence confirms) or UNVERIFIED (no evidence yet)
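One way to carry this triplet through a pipeline is a small record type. The field names mirror the template above; everything else is an illustrative sketch:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """One agent claim plus the tool output backing it."""
    claim: str          # e.g. "I created report.csv"
    evidence: str = ""  # raw tool output; empty until a tool has run

    @property
    def status(self) -> str:
        # VERIFIED only once tool output is attached
        return "VERIFIED" if self.evidence else "UNVERIFIED"

c = Claim(claim="I created report.csv")
# ...after running the matching check from the table below:
c.evidence = "-rw-r--r-- 1 user user 2048 report.csv"
```

A claim object starts UNVERIFIED and flips only when real tool output is recorded, never on the agent's say-so.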
| Claim | Verify With |
|---|---|
| Created file | `ls -la {path} && head -20 {path}` |
| Modified file | `grep -n '{expected_content}' {path}` |
| Git commit | `git log --oneline -3` |
| Git push | `git log --oneline origin/{branch} -3` |
| Ran tests | Show actual test output (pass AND fail counts) |
| API response | Show raw response body |
| Data analysis | Show `wc -l` of source + sample rows |
A claim counts as VERIFIED only after a read/exec tool was called.

Cost: ~50 tokens per claim. One exec call per physical claim.
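A minimal sketch of running one check from the table and capturing its output as evidence. The function name and return shape are assumptions for illustration:

```python
import os
import subprocess
import tempfile

def verify_created_file(path: str) -> tuple[bool, str]:
    """Run the table's 'created file' check; return (verified, evidence)."""
    proc = subprocess.run(
        ["ls", "-la", path],
        capture_output=True,
        text=True,
    )
    # Nonzero exit means the file does not exist: claim stays unverified.
    return proc.returncode == 0, proc.stdout + proc.stderr

# Demo against a file we actually create:
fd, tmp = tempfile.mkstemp()
os.close(fd)
ok, evidence = verify_created_file(tmp)
os.unlink(tmp)
```

The evidence string (the raw `ls` output) is what gets attached to the claim record, so the transcript shows proof rather than assertion.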
L2: Cross-model audit

Spawn a second agent (different model) to independently verify claims.
See references/audit-prompt.md for the spawn template.
Key principles:
Cost: 1 subagent spawn. Use flash/gemini for simple checks (~$0.001). Reserve sonnet/opus for complex logic verification.
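A sketch of assembling the auditor's instructions before spawning it. The wording here is illustrative only; the real template lives in references/audit-prompt.md:

```python
def build_audit_prompt(claims: list[str], workdir: str) -> str:
    """Assemble a prompt for an independent second model to re-check claims."""
    checks = "\n".join(f"- {c}" for c in claims)
    return (
        "You are an independent auditor. Do not trust the first agent's summary.\n"
        f"Working directory: {workdir}\n"
        "Verify each claim with your own tool calls and report VERIFIED or UNVERIFIED:\n"
        f"{checks}"
    )

prompt = build_audit_prompt(
    ["Created report.csv", "Pushed branch fix/parser"],
    "/tmp/job",  # hypothetical working directory
)
```

The key design choice is that the auditor gets only the claims and the working directory, never the first agent's reasoning, so it must re-derive evidence itself.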
L3: Drift detection

Monitor long-running agent tasks for hallucination patterns.
See references/drift-monitor.md for implementation.
Core signals:
Cost: Periodic check via `sessions_history`. No extra model calls unless alert triggers.
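One concrete signal, grounded in the L1 rule that every action claim needs tool evidence, can be checked over a session history like this. The `history` record shape is an assumption for illustration:

```python
def missing_evidence_alerts(history: list[dict]) -> list[str]:
    """Flag assistant action claims not followed by any tool call.

    Assumed history shape: [{"role": str, "text": str, "tool_call": bool}]
    """
    ACTION_WORDS = ("created", "modified", "committed", "pushed")
    alerts = []
    for i, msg in enumerate(history):
        if msg["role"] != "assistant":
            continue
        claims_action = any(w in msg["text"].lower() for w in ACTION_WORDS)
        # Look at the next two entries for a tool call backing the claim.
        followed_by_tool = any(m.get("tool_call") for m in history[i + 1 : i + 3])
        if claims_action and not followed_by_tool:
            alerts.append(msg["text"])
    return alerts
```

A monitor can run this periodically over the session transcript and only escalate (e.g. to an L2 audit) when the alert list is non-empty.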
Which layers to use

| Scenario | Recommended |
|---|---|
| Quick file creation | L1 only |
| Data report from CSV | L0 + L1 |
| Multi-step coding task | L0 + L1 + L2 |
| Autonomous long-running agent | All four layers |
| Routine conversation | None needed |
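The table above can be encoded directly as a lookup, e.g. for a wrapper script that configures the guard per task. The dictionary keys simply lowercase the table's scenario names:

```python
def recommend_layers(scenario: str) -> set[str]:
    """Map a scenario from the selection table to its recommended layers."""
    table = {
        "quick file creation": {"L1"},
        "data report from csv": {"L0", "L1"},
        "multi-step coding task": {"L0", "L1", "L2"},
        "autonomous long-running agent": {"L0", "L1", "L2", "L3"},
        "routine conversation": set(),  # none needed
    }
    return table[scenario.lower()]
```

Unknown scenarios raise `KeyError` here; a real wrapper would probably default to the full four-layer set rather than fail.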
`sessions_spawn`: Add L2 audit as a final verification step