## Install
openclaw skills install reikys-hallucination-guard
Execution-based verification guardrail with 14 check items for AI agent output. It tackles the biggest trust problem with AI agents: not with "double-check that" prompts, but with 14 execution-based verification items.
AI agents are prone to the following trust issues:
| Problem Type | Example |
|---|---|
| Path hallucination | References non-existent files/directories as if they exist |
| Command hallucination | Describes uninstalled binaries as if they run normally |
| Library hallucination | Writes code that imports packages that don't exist on npm/pip |
| Numerical hallucination | States unsourced statistics/percentages as fact |
| Completeness hallucination | Reports "done" while leaving TODO/PLACEHOLDER behind |
| Consistency hallucination | Uses different names for the same concept within a document |
Invoke it at any time during an agent conversation with phrases like:
hallucination check
hallucination check on <target file/path>
hallucination check --scope=fact,completeness
Add to the system prompt so the agent automatically runs this skill before completing a task:
Before completing any task, you must run all 14 checks from the hallucination-guard SKILL.md,
output the PASS/FAIL report, and fix any FAIL items.
When the full 14 items feel like too much, run only the essential 5:
# Run only H-1, H-2, H-9, H-10, H-12
hallucination check --quick
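A wrapper script could implement this mapping with a simple case split. The following is only a sketch of the dispatch logic; run_check is a hypothetical function name, not part of the skill:
# Sketch: map --quick to the essential five checks
if [ "$1" = "--quick" ]; then
  checks="H-1 H-2 H-9 H-10 H-12"
else
  checks=$(printf "H-%d " $(seq 1 14))
fi
for c in $checks; do
  run_check "$c"   # hypothetical per-check dispatcher
done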
### H-1: Referenced Path Existence (Fact)
Purpose: Verify that files/directories referenced by the agent actually exist
Verification Commands:
# macOS / Linux
stat <path>
ls -la <path>
# Existence check only (exit code based); set path to the target first
path="<path>"
[ -e "$path" ] && echo "PASS: $path exists" || echo "FAIL: $path not found"
# Batch check for multiple paths
for p in path1 path2 path3; do
[ -e "$p" ] && echo "✅ $p" || echo "❌ $p"
done
Windows (PowerShell):
Test-Path "C:\path\to\check"
Pass Criteria: All referenced paths confirmed to exist via stat/Test-Path → PASS
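If the batch check should gate a larger script, a small helper can report the number of missing paths through its exit status. A minimal sketch (hg_check_paths is a hypothetical name, not part of the skill):
# Returns 0 if all paths exist, otherwise the number of missing paths
hg_check_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "✅ $p"
    else
      echo "❌ $p"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
# Usage
hg_check_paths src/main.py README.md tests/ || echo "H-1 FAIL"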
### H-2: CLI Command Availability (Fact)
Purpose: Verify that CLI commands used in code or documentation are actually installed
Verification Commands:
# macOS / Linux
which <command>
command -v <command>
# Example: batch check for multiple binaries
for cmd in git node python3 docker jq; do
command -v "$cmd" &>/dev/null \
&& echo "✅ $cmd: $(which $cmd)" \
|| echo "❌ $cmd: not installed"
done
Windows (PowerShell):
Get-Command <command> -ErrorAction SilentlyContinue
Pass Criteria: All CLI commands appearing in documents/code confirmed via command -v → PASS
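To find which commands a document actually uses, one rough approach is to pull the first word of each line inside sh/bash code fences. This sketch assumes fences are exactly ```sh or ```bash and will over-report (variables and flags get caught as "commands"), so treat ❌ lines as review candidates rather than automatic failures:
awk '/^```(sh|bash)$/{f=1; next} /^```/{f=0} f && $1 ~ /^[a-z][a-z0-9_-]*$/ {print $1}' README.md \
  | sort -u \
  | while read -r cmd; do
      command -v "$cmd" >/dev/null 2>&1 && echo "✅ $cmd" || echo "❌ $cmd (review manually)"
    done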
### H-3: URL/Endpoint Reachability (Fact)
Purpose: Verify that links/API endpoints embedded in documentation are actually accessible
Verification Commands:
# Check HTTP status code (5-second timeout)
curl -sI --max-time 5 <url> | head -1
# Batch check script
urls=(
"https://example.com/api"
"https://docs.example.com"
)
for url in "${urls[@]}"; do
status=$(curl -sI --max-time 5 "$url" | head -1 | awk '{print $2}')
if [[ "$status" =~ ^[23] ]]; then
echo "✅ $url → HTTP $status"
else
echo "❌ $url → HTTP $status (or unreachable)"
fi
done
Note: False positive/negative possible depending on network environment. Manual verification recommended for internal network URLs.
Pass Criteria: External reference URLs respond with 2xx/3xx → PASS (optional execution)
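One common source of false negatives: some servers reject HEAD requests with 405. A sketch that falls back to a one-byte ranged GET before declaring a URL dead (hg_url_check is a hypothetical helper name):
hg_url_check() {
  local url=$1 status
  status=$(curl -sI --max-time 5 -o /dev/null -w '%{http_code}' "$url")
  if [ "$status" = "405" ] || [ "$status" = "000" ]; then
    # Retry with a ranged GET; -r 0-0 requests only the first byte
    status=$(curl -s --max-time 5 -o /dev/null -w '%{http_code}' -r 0-0 "$url")
  fi
  [[ "$status" =~ ^[23] ]] && echo "✅ $url → HTTP $status" || echo "❌ $url → HTTP $status"
}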
### H-4: Code Syntax Validity (Fact)
Purpose: Verify that code generated by the agent is actually parseable
Verification Commands:
Python:
python3 -c "
import ast, sys
with open('target.py') as f:
src = f.read()
try:
ast.parse(src)
print('✅ Python syntax valid')
except SyntaxError as e:
print(f'❌ SyntaxError: {e}')
sys.exit(1)
"
JavaScript/TypeScript:
# Node.js
node --check target.js
# TypeScript
npx tsc --noEmit target.ts
JSON:
jq . target.json > /dev/null && echo "✅ JSON valid" || echo "❌ JSON parse failed"
YAML:
python3 -c "import yaml; yaml.safe_load(open('target.yaml'))" \
&& echo "✅ YAML valid" || echo "❌ YAML parse failed"
Shell:
bash -n target.sh && echo "✅ Shell syntax valid" || echo "❌ Shell syntax error"
Pass Criteria: All generated code files pass their respective language parsers → PASS
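A single loop can route each generated file to the matching parser by extension. A sketch (save as, say, syntax-check.sh and pass the files to check as arguments):
for f in "$@"; do
  case "$f" in
    *.py)         python3 -m py_compile "$f" ;;
    *.js)         node --check "$f" ;;
    *.json)       jq . "$f" > /dev/null ;;
    *.sh)         bash -n "$f" ;;
    *.yaml|*.yml) python3 -c "import sys, yaml; yaml.safe_load(open(sys.argv[1]))" "$f" ;;
    *)            echo "skip: $f"; continue ;;
  esac && echo "✅ $f" || echo "❌ $f"
done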
### H-5: Numerical Claim Sourcing (Fact)
Purpose: Verify that statistics/numbers mentioned by the agent are substantiated
Verification Method:
Checklist:
□ Is the source (URL, paper, official docs) specified for the number?
□ Can the source be cross-verified with 2+ references?
□ Is the data current? (check date)
□ Is uncertainty appropriately expressed? ("approximately X%", "roughly Nx")
Auto-detection Pattern (grep):
# Detect unsourced number patterns
grep -En "[0-9]+%" <file> | grep -v "http\|source\|ref\|reference"
grep -En "[0-9]+(x|times)" <file> | grep -v "http\|source"
Pass Criteria: All numbers have cited sources or are labeled "needs verification:" → PASS
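The detection greps can be collapsed into a single PASS/FAIL verdict. A sketch:
count=$(grep -En "[0-9]+(\.[0-9]+)?%|[0-9]+(x|times)" <file> \
        | grep -Eicv "http|source|ref|needs verification")
[ "$count" -eq 0 ] && echo "✅ H-5 PASS" || echo "❌ H-5 FAIL: $count unsourced number(s)"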
### H-6: Self-Contradiction (Consistency)
Purpose: Verify there are no conflicting claims within the same document
Verification Method:
# Manual check for negation/affirmation pairs
grep -n "cannot\|impossible\|prohibited\|not available\|not supported" <file>
grep -n "possible\|supported\|available\|can be\|is able to" <file>
# Agent self-verification instruction
"""
Read the document below and list all pairs of contradicting claims.
If none exist, respond with "No self-contradiction found."
[document content]
"""
Pass Criteria: 0 conflicting claim pairs → PASS
### H-7: Promise Fulfillment (Consistency)
Purpose: Verify, via 1:1 mapping, that every initially promised deliverable was actually generated
Verification Method:
# Extract deliverable list (e.g., ## Deliverables section)
grep -A 20 "deliverable\|output\|result" PLAN.md
# Verify actual file existence
promised_files=(
"src/main.py"
"README.md"
"tests/test_main.py"
)
for f in "${promised_files[@]}"; do
[ -f "$f" ] && echo "✅ $f" || echo "❌ $f missing (promise not fulfilled)"
done
Pass Criteria: All deliverables specified in the plan actually exist → PASS
### H-8: Term Consistency (Consistency)
Purpose: Verify that the same concept is called by the same name throughout the document
Auto-detection Example:
# Detect synonym mixing (customize as needed). Use -o without -n so uniq counts
# terms rather than line:term pairs, and lowercase so case variants merge.
echo "=== 'user' related terms ==="
grep -oi "user\|customer\|client\|end-user\|end user" <file> | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
echo "=== 'error' related terms ==="
grep -oi "error\|failure\|fault\|exception\|bug" <file> | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
Pass Criteria: Core terms are not mixed with 2+ different names → PASS
(Intentional synonym usage must be stated in comments/definitions)
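The frequency tables above still need human judgment; for a hard gate, count distinct variants of one term family. A sketch (the pattern is a placeholder to customize per project):
variants=$(grep -oi "end-user\|end user\|enduser" <file> | tr '[:upper:]' '[:lower:]' | sort -u | wc -l)
[ "$variants" -le 1 ] && echo "✅ H-8 PASS" || echo "❌ H-8 FAIL: $variants variants of the same term"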
### H-9: Incomplete Markers (Completeness)
Purpose: Verify that no incomplete markers remain
Verification Commands:
# Basic search
grep -rn "TODO\|FIXME\|HACK\|XXX\|TEMP\|BUG" <path>
# Count
count=$(grep -rn "TODO\|FIXME\|HACK\|XXX" <path> | wc -l)
if [ "$count" -eq 0 ]; then
echo "✅ H-9 PASS: No remaining markers"
else
echo "❌ H-9 FAIL: $count incomplete markers found"
grep -rn "TODO\|FIXME\|HACK\|XXX" <path>
fi
Windows (PowerShell):
Select-String -Path ".\*" -Pattern "TODO|FIXME|HACK|XXX" -Recurse
Pass Criteria: 0 TODO/FIXME/HACK/XXX occurrences → PASS
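In real repositories, vendored and generated files inflate the count. A variant that mirrors the exclude patterns from the configuration section below (GNU grep options):
grep -rn --exclude-dir=node_modules --exclude-dir=.git --exclude="*.min.js" \
  "TODO\|FIXME\|HACK\|XXX" <path>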
### H-10: Placeholders (Completeness)
Purpose: Verify no placeholders remain that weren't filled with actual values
Verification Commands:
# Search for placeholder patterns
grep -rn \
"PLACEHOLDER\|CHANGEME\|TBD\|INSERT_HERE\|<YOUR_\|YOUR_API_KEY\|example\.com\|foo@bar\|REPLACE_ME\|FILL_IN" \
<path>
# Count
count=$(grep -rn "PLACEHOLDER\|CHANGEME\|TBD\|INSERT_HERE\|YOUR_API_KEY" <path> | wc -l)
[ "$count" -eq 0 ] \
&& echo "✅ H-10 PASS: No placeholders" \
|| echo "❌ H-10 FAIL: $count found"
Pass Criteria: 0 placeholder pattern occurrences → PASS
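Template files such as .env.example often contain placeholders on purpose. A sketch that excludes them before counting (the filename is an assumption; adjust per project):
count=$(grep -rn "PLACEHOLDER\|CHANGEME\|TBD\|INSERT_HERE\|YOUR_API_KEY" <path> \
        | grep -v "\.env\.example" | wc -l)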
### H-11: Deliverable Existence (Completeness)
Purpose: Verify that files promised in specifications (README, PLAN, conversation) actually exist
Verification Method:
#!/bin/bash
# Extract file list from spec file (path pattern matching)
spec_file="PLAN.md"  # or README.md
echo "=== Deliverable Existence Check ==="
missing=0
# Extract paths from markdown inline code. Process substitution (not a pipe)
# keeps the while loop in the current shell, so $missing survives the loop.
while read -r f; do
  if [ -e "$f" ]; then
    echo "✅ $f"
  else
    echo "❌ $f (not found)"
    missing=$((missing + 1))
  fi
done < <(grep -oE '`[^`]+\.(py|js|ts|md|json|yaml|sh)`' "$spec_file" | tr -d '`')
exit "$missing"
Pass Criteria: All paths mentioned in spec files actually exist → PASS
### H-12: Package Existence (Hallucination)
Purpose: Verify that no non-existent packages are imported/required
npm Package Verification:
# Verify all dependencies in package.json exist
node -e "
const pkg = require('./package.json');
const deps = {...(pkg.dependencies||{}), ...(pkg.devDependencies||{})};
const names = Object.keys(deps);
console.log('Packages to check:', names.length);
"
# Actual npm registry lookup
npm_check() {
local pkg=$1
result=$(curl -sI "https://registry.npmjs.org/$pkg" | head -1)
echo "$result" | grep -q "200" \
&& echo "✅ npm: $pkg exists" \
|| echo "❌ npm: $pkg not found (suspected hallucination)"
}
# Usage example
npm_check "some-package-name"
pip Package Verification:
pip_check() {
local pkg=$1
result=$(curl -sI "https://pypi.org/pypi/$pkg/json" | head -1)
echo "$result" | grep -q "200" \
&& echo "✅ PyPI: $pkg exists" \
|| echo "❌ PyPI: $pkg not found (suspected hallucination)"
}
# Registry lookup is used here; pip install --dry-run <pkg> (pip 22.2+) is an alternative
pip_check "some-library"
Extracting and Batch-Verifying Imports from Code:
# Extract Python imports
python3 -c "
import ast, sys
tree = ast.parse(open('target.py').read())
imports = set()
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
imports.add(alias.name.split('.')[0])
elif isinstance(node, ast.ImportFrom):
if node.module:
imports.add(node.module.split('.')[0])
# Exclude standard library (sys is already imported; stdlib_module_names requires Python 3.10+)
stdlib = set(getattr(sys, 'stdlib_module_names', ()))
third_party = imports - stdlib
print('Third-party imports:', sorted(third_party))
"
Pass Criteria: All import/require packages actually exist in their registries → PASS
### H-13: Unsourced Numbers (Hallucination)
Purpose: Verify no numbers/statistics/ratios are presented definitively without sources
Auto-detection Patterns:
# Detect numbers without sources
echo "=== Unsourced number candidates ==="
# Percentage patterns
grep -En "[0-9]+(\.[0-9]+)?%" <file> | grep -iv "http\|source\|ref\|reference\|needs verification"
# Multiplier/ratio patterns
grep -En "[0-9]+(x|times)" <file> | grep -iv "http\|source\|reference\|needs verification"
# Large number patterns (millions, billions)
grep -En "[0-9]+(million|billion|thousand)" <file> | grep -iv "http\|source\|reference"
Manual Check Checklist:
□ Do all statistics have "[source: URL or paper name]"?
□ Are uncertainty expressions used appropriately? ("approximately", "roughly", "estimated")
□ Are there statistics without dates? (unclear recency)
□ If internal data, is "based on internal data" stated?
Pass Criteria: All numbers have cited sources or "needs verification:" label → PASS
### H-14: Overconfident Expressions (Hallucination)
Purpose: Verify uncertain matters aren't stated definitively with "always", "never", "guaranteed", etc.
Auto-detection Commands:
echo "=== Overconfident expression detection ==="
# With -E (extended regex), alternation is a plain |, not \|
grep -En \
  "never|always|definitely|certainly|guaranteed|without doubt|100%|absolutely|undoubtedly|impossible|must be true" \
  <file>
Allowed vs. Not Allowed:
❌ Not allowed: "This method is guaranteed to work"
✅ Allowed: "This method works in most cases (needs verification: edge case X)"
❌ Not allowed: "Errors will never occur"
✅ Allowed: "Error likelihood is low in typical usage scenarios"
Pass Criteria: 0 definitive expressions, or if present, accompanied by supporting evidence → PASS
Sample report output:
╔═══════════════════════════════════════════════════════════╗
║ 🛡️ HALLUCINATION GUARD REPORT                             ║
║ Executed: 2026-03-26 15:21 KST                            ║
╠═══════════════════════════════════════════════════════════╣
║ Target: /path/to/target          Scope: ALL               ║
╠══════╦══════════╦════════╦════════════════════════════════╣
║ ID   ║ Category ║ Result ║ Details                        ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-1  ║ Fact     ║ PASS   ║ All 4 referenced paths exist   ║
║ H-2  ║ Fact     ║ PASS   ║ 3 CLIs confirmed installed     ║
║ H-3  ║ Fact     ║ SKIP   ║ Network check not selected     ║
║ H-4  ║ Fact     ║ PASS   ║ Python/JS syntax valid         ║
║ H-5  ║ Fact     ║ FAIL   ║ 3 numbers missing sources      ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-6  ║ Consist. ║ PASS   ║ No self-contradiction          ║
║ H-7  ║ Consist. ║ FAIL   ║ README.md promised but absent  ║
║ H-8  ║ Consist. ║ PASS   ║ Core term consistency OK       ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-9  ║ Complete ║ PASS   ║ TODO/FIXME: 0 found            ║
║ H-10 ║ Complete ║ FAIL   ║ YOUR_API_KEY: 2 found          ║
║ H-11 ║ Complete ║ PASS   ║ Deliverables: 5/5 exist        ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-12 ║ Halluc.  ║ PASS   ║ All packages registry-verified ║
║ H-13 ║ Halluc.  ║ FAIL   ║ 2 unsourced numbers found      ║
║ H-14 ║ Halluc.  ║ PASS   ║ Overconfident expressions: 0   ║
╠══════╩══════════╩════════╩════════════════════════════════╣
║ Total: PASS 9 / FAIL 4 / SKIP 1                           ║
║ Score: 9/13 = 69.2% (recommended threshold: 85%)          ║
║ Verdict: ❌ FAIL — Fix required, then re-run              ║
╚═══════════════════════════════════════════════════════════╝
[FAIL Item Fix Guide]
H-5: Add source URLs to numbers on lines 42, 67, 89
H-7: Create README.md or remove promise from PLAN.md
H-10: Check .env.example — replace YOUR_API_KEY with actual key or comment out
H-13: Add "[needs verification]" label to statistics on lines 15, 33
One-line summary format (for chat output):
🛡️ HG: PASS 9/13 (H-5,H-7,H-10,H-13 FAIL) → Fix required
JSON format (--format=json):
{
"timestamp": "2026-03-26T15:21:00+09:00",
"target": "/path/to/target",
"score": 0.692,
"threshold": 0.85,
"verdict": "FAIL",
"results": [
{"id": "H-1", "category": "fact", "status": "PASS", "detail": "All 4 referenced paths exist"},
{"id": "H-5", "category": "fact", "status": "FAIL", "detail": "3 numbers missing sources", "lines": [42, 67, 89]},
...
]
}
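With the report saved as hg-report.json, jq can pull out just the FAIL items for a fix loop:
jq -r '.results[] | select(.status == "FAIL") | "\(.id): \(.detail)"' hg-report.json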
OS="macos"
STAT_CMD="stat"
WHICH_CMD="which"
GREP_CMD="grep"
# If GNU grep not available, brew install grep recommended
OS="linux"
STAT_CMD="stat"
WHICH_CMD="which"
GREP_CMD="grep"
# Most GNU tools installed by default
Windows (PowerShell):
# stat → Test-Path / Get-Item
# which → Get-Command
# grep → Select-String
# curl → Invoke-WebRequest
# Wrapper function definitions
function hg-pathcheck { param($p) Test-Path $p }
function hg-cmdcheck { param($c) Get-Command $c -EA SilentlyContinue }
function hg-grep { param($p, $f) Select-String -Pattern $p -Path $f }
# GitHub Actions example
- name: Hallucination Guard
run: |
bash scripts/hallucination-guard.sh \
--scope=all \
--format=json \
--output=hg-report.json
- name: Check HG Result
run: |
score=$(jq '.score' hg-report.json)
python3 -c "import sys; sys.exit(0 if float('$score') >= 0.85 else 1)"
[Auto-trigger after coding-agent completion]
1. coding-agent completes code generation
2. hallucination-guard auto-runs (H-1, H-2, H-4, H-9, H-10, H-12 prioritized)
3. FAIL items → Send fix instructions to coding-agent
4. Re-run → Confirm PASS before final report
[Add HG to branch completion checklist]
□ Tests passing
□ hallucination-guard 85%+ PASS
□ PR created
[Verify after documentation writing]
docs-writer complete → hallucination-guard --scope=fact,consistency,completeness
[Dual verification pipeline]
Code generation
→ hallucination-guard (automated/execution-based verification)
→ adversarial-code-review (logic/security verification)
→ Final approval
Create .hallucination-guard.yaml at the project root:
# .hallucination-guard.yaml
version: "1.0"
custom_checks:
- id: "H-C1"
name: "Internal API endpoint verification"
category: "fact"
command: |
curl -sI --max-time 5 https://internal-api.company.com/health | head -1
pass_pattern: "200"
severity: "error" # error | warning | info
- id: "H-C2"
name: "Company terminology compliance"
category: "consistency"
command: |
grep -n "end-user\|enduser\|end user" {target} | grep -v "# allowed"
fail_pattern: "." # Match = FAIL
severity: "warning"
- id: "H-C3"
name: "License header presence"
category: "completeness"
command: |
head -3 {target} | grep -i "copyright\|license\|MIT\|Apache"
pass_pattern: "."
severity: "error"
# Threshold customization
thresholds:
pass_rate: 0.90 # Require 90% PASS
error_tolerance: 0 # Allow 0 error-level FAILs
warning_tolerance: 2 # Allow up to 2 warning-level FAILs
# Exclude patterns
exclude:
paths:
- "node_modules/**"
- "*.min.js"
- ".git/**"
checks:
- "H-3" # Disable URL check (internal network environment)
Situation: coding-agent generated a Python script, and you want to verify it actually works
[Conversation]
User: Implement a file upload feature
Agent: [Code generation complete] Created src/uploader.py...
User: hallucination check on src/uploader.py
Agent: 🛡️ Running H-1...
✅ src/uploader.py exists
🛡️ Running H-2...
✅ python3 confirmed installed
❌ boto3: not installed (import boto3 used in code)
🛡️ Running H-4...
✅ Python syntax valid
🛡️ Running H-9...
❌ TODO: S3 bucket name needs to be configured (line 23)
🛡️ Running H-12...
✅ boto3: confirmed on PyPI (just not installed locally)
[Result] PASS 4/5 — Need to install boto3 and handle TODO
Recommend: pip install boto3, then retry
Situation: Agent wrote team API documentation, needs pre-deployment verification
[Execution]
hallucination check on docs/api-reference.md --scope=fact,consistency
[Result]
H-3: ❌ https://api.example.com/v2/users → HTTP 404
(Endpoint in documentation doesn't actually exist)
H-5: ❌ "Average response time 23ms" — no source
❌ "99.9% uptime guaranteed" — no source/evidence
H-8: ⚠️ Mixed usage of "user" / "client" / "end-user" detected
→ Unification recommended
[Fix Instructions]
1. Change /v2/users → /v1/users or create the endpoint
2. Add "[internal measurement, 2026-03]" to response time figure
3. Replace uptime claim with SLA document link
4. Unify to "user" throughout
Situation: Auto-run in GitHub Actions before PR merge
# .github/workflows/hallucination-guard.yml
name: Hallucination Guard
on:
pull_request:
types: [opened, synchronize]
jobs:
hg-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run Hallucination Guard
run: |
# H-9, H-10 (completeness) checks
echo "=== H-9: TODO/FIXME check ==="
count=$(grep -rn "TODO\|FIXME\|HACK\|XXX" src/ | wc -l)
[ "$count" -gt 0 ] && { echo "FAIL: $count found"; exit 1; } || echo "PASS"
echo "=== H-10: Placeholder check ==="
count=$(grep -rn "PLACEHOLDER\|YOUR_API_KEY\|CHANGEME\|TBD" src/ | wc -l)
[ "$count" -gt 0 ] && { echo "FAIL: $count found"; exit 1; } || echo "PASS"
echo "=== H-4: Syntax validity ==="
# With `-exec ... +`, find exits non-zero if any py_compile invocation fails
find src -name "*.py" -exec python3 -m py_compile {} + && echo "PASS"
echo "✅ Hallucination Guard complete"
- name: Comment PR
if: failure()
uses: actions/github-script@v7
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: '❌ Hallucination Guard FAIL — Please fix TODO/placeholder/syntax errors.'
})
Agents that have read this skill follow these principles:
- Check categories: fact (H-1 to H-5), consistency (H-6 to H-8), completeness (H-9 to H-11), hallucination (H-12 to H-14)
- Project-level configuration: .hallucination-guard.yaml
- Package registries consulted: https://registry.npmjs.org/<package-name> (public API), https://pypi.org/pypi/<package-name>/json (public API)
hallucination-guard v1.0.0 — by reikys
"Proving trust through execution, not prompts"