Hallucination Guard

v1.0.1

Execution-based verification guardrail with 14 check items for AI agent output

License: MIT-0

Install

openclaw skills install reikys-hallucination-guard

🛡️ hallucination-guard

Solving the biggest trust problem with AI agents. Not with "double-check that" prompts, but with 14 execution-based verification items.


🎯 Problem Definition

AI agents carry the following trust issues:

| Problem Type | Example |
| --- | --- |
| Path hallucination | References non-existent files/directories as if they exist |
| Command hallucination | Describes uninstalled binaries as if they run normally |
| Library hallucination | Writes code that imports packages that don't exist on npm/pip |
| Numerical hallucination | States unsourced statistics/percentages as fact |
| Completeness hallucination | Reports "done" while leaving TODO/PLACEHOLDER behind |
| Consistency hallucination | Uses different names for the same concept within a document |

Limitations of Existing Solutions

  • truth-check, verification-before-completion: just tell the agent to "check again", so the verification itself can still hallucinate
  • hallucination-guard: 14 concrete items × executable commands × structured PASS/FAIL report

⚡ Quick Start

1. Trigger Method

Call anytime during agent conversation with the following phrases:

hallucination check
hallucination check on <target file/path>
hallucination check --scope=fact,completeness

2. Auto-Execution Method

Add to the system prompt so the agent automatically runs this skill before completing a task:

Before completing any task, you must run all 14 checks from the hallucination-guard SKILL.md,
output the PASS/FAIL report, and fix any FAIL items.

3. Quick Check (3-Minute Version)

When the full 14 items feel like too much, run only the essential 5:

# Run only H-1, H-2, H-9, H-10, H-12
hallucination check --quick

📋 14 Check Items in Detail

🔵 Fact Verification (H-1 ~ H-5)


H-1: File Path Existence Verification

Purpose: Verify that files/directories referenced by the agent actually exist

Verification Commands:

# macOS / Linux
stat <path>
ls -la <path>

# Existence check only (exit code based)
[ -e "<path>" ] && echo "PASS: $path exists" || echo "FAIL: $path not found"

# Batch check for multiple paths
for p in path1 path2 path3; do
  [ -e "$p" ] && echo "✅ $p" || echo "❌ $p"
done

Windows (PowerShell):

Test-Path "C:\path\to\check"

Pass Criteria: All referenced paths confirmed to exist via stat/Test-Path → PASS


H-2: Command/Binary Existence Verification

Purpose: Verify that CLI commands used in code or documentation are actually installed

Verification Commands:

# macOS / Linux
which <command>
command -v <command>

# Example: batch check for multiple binaries
for cmd in git node python3 docker jq; do
  command -v "$cmd" &>/dev/null \
    && echo "✅ $cmd: $(which $cmd)" \
    || echo "❌ $cmd: not installed"
done

Windows (PowerShell):

Get-Command <command> -ErrorAction SilentlyContinue

Pass Criteria: All CLI commands appearing in documents/code confirmed via command -v → PASS


H-3: URL Validity Check (Optional)

Purpose: Verify that links/API endpoints embedded in documentation are actually accessible

Verification Commands:

# Check HTTP status code (5-second timeout)
curl -sI --max-time 5 <url> | head -1

# Batch check script
urls=(
  "https://example.com/api"
  "https://docs.example.com"
)
for url in "${urls[@]}"; do
  status=$(curl -sI --max-time 5 "$url" | head -1 | awk '{print $2}')
  if [[ "$status" =~ ^[23] ]]; then
    echo "✅ $url → HTTP $status"
  else
    echo "❌ $url → HTTP $status (or unreachable)"
  fi
done

Note: False positives/negatives are possible depending on the network environment. Manual verification is recommended for internal-network URLs.

Pass Criteria: External reference URLs respond with 2xx/3xx → PASS (optional execution)


H-4: Code Syntax Validity Check

Purpose: Verify that code generated by the agent is actually parseable

Verification Commands:

Python:

python3 -c "
import ast, sys
with open('target.py') as f:
    src = f.read()
try:
    ast.parse(src)
    print('✅ Python syntax valid')
except SyntaxError as e:
    print(f'❌ SyntaxError: {e}')
    sys.exit(1)
"

JavaScript/TypeScript:

# Node.js
node --check target.js

# TypeScript
npx tsc --noEmit target.ts

JSON:

jq . target.json > /dev/null && echo "✅ JSON valid" || echo "❌ JSON parse failed"

YAML:

python3 -c "import yaml; yaml.safe_load(open('target.yaml'))" \
  && echo "✅ YAML valid" || echo "❌ YAML parse failed"

Shell:

bash -n target.sh && echo "✅ Shell syntax valid" || echo "❌ Shell syntax error"

Pass Criteria: All generated code files pass their respective language parsers → PASS


H-5: Numerical Data Cross-Verification

Purpose: Verify that statistics/numbers mentioned by the agent are substantiated

Verification Method:

Checklist:
□ Is the source (URL, paper, official docs) specified for the number?
□ Can the source be cross-verified with 2+ references?
□ Is the data current? (check date)
□ Is uncertainty appropriately expressed? ("approximately X%", "roughly Nx")

Auto-detection Pattern (grep):

# Detect unsourced number patterns
grep -En "[0-9]+%" <file> | grep -v "http\|source\|ref\|reference"
grep -En "[0-9]+(x|times)" <file> | grep -v "http\|source"

Pass Criteria: All numbers have cited sources or are labeled "needs verification:" → PASS


🟡 Consistency (H-6 ~ H-8)


H-6: No Self-Contradiction

Purpose: Verify there are no conflicting claims within the same document

Verification Method:

# Manual check for negation/affirmation pairs
grep -n "cannot\|impossible\|prohibited\|not available\|not supported" <file>
grep -n "possible\|supported\|available\|can be\|is able to" <file>

# Agent self-verification instruction
"""
Read the document below and list all pairs of contradicting claims.
If none exist, respond with "No self-contradiction found."
[document content]
"""

Pass Criteria: 0 conflicting claim pairs → PASS


H-7: Plan-Result Alignment

Purpose: Verify, via a 1:1 mapping, that every deliverable promised at the outset was actually generated

Verification Method:

# Extract deliverable list (e.g., ## Deliverables section)
grep -A 20 "deliverable\|output\|result" PLAN.md

# Verify actual file existence
promised_files=(
  "src/main.py"
  "README.md"
  "tests/test_main.py"
)
for f in "${promised_files[@]}"; do
  [ -f "$f" ] && echo "✅ $f" || echo "❌ $f missing (promise not fulfilled)"
done

Pass Criteria: All deliverables specified in the plan actually exist → PASS


H-8: Terminology Consistency

Purpose: Verify that the same concept is called by the same name throughout the document

Auto-detection Example:

# Detect synonym mixing (customize as needed)
echo "=== 'user' related terms ==="
grep -oin "user\|customer\|client\|end-user\|end user" <file> | sort | uniq -c | sort -rn

echo "=== 'error' related terms ==="
grep -oin "error\|failure\|fault\|exception\|bug" <file> | sort | uniq -c | sort -rn

Pass Criteria: Core terms are not mixed with 2+ different names → PASS
(Intentional synonym usage must be stated in comments/definitions)


🟢 Completeness (H-9 ~ H-11)


H-9: No Remaining TODO/FIXME

Purpose: Verify that no incomplete markers remain

Verification Commands:

# Basic search
grep -rn "TODO\|FIXME\|HACK\|XXX\|TEMP\|BUG" <path>

# Count
count=$(grep -rn "TODO\|FIXME\|HACK\|XXX" <path> | wc -l)
if [ "$count" -eq 0 ]; then
  echo "✅ H-9 PASS: No remaining markers"
else
  echo "❌ H-9 FAIL: $count incomplete markers found"
  grep -rn "TODO\|FIXME\|HACK\|XXX" <path>
fi

Windows (PowerShell):

Get-ChildItem -Path . -Recurse -File | Select-String -Pattern "TODO|FIXME|HACK|XXX"

Pass Criteria: 0 TODO/FIXME/HACK/XXX occurrences → PASS


H-10: No Placeholders

Purpose: Verify no placeholders remain that weren't filled with actual values

Verification Commands:

# Search for placeholder patterns
grep -rn \
  "PLACEHOLDER\|CHANGEME\|TBD\|INSERT_HERE\|<YOUR_\|YOUR_API_KEY\|example\.com\|foo@bar\|REPLACE_ME\|FILL_IN" \
  <path>

# Count
count=$(grep -rn "PLACEHOLDER\|CHANGEME\|TBD\|INSERT_HERE\|YOUR_API_KEY" <path> | wc -l)
[ "$count" -eq 0 ] \
  && echo "✅ H-10 PASS: No placeholders" \
  || echo "❌ H-10 FAIL: $count found"

Pass Criteria: 0 placeholder pattern occurrences → PASS


H-11: All Deliverables Exist

Purpose: Verify that files promised in specifications (README, PLAN, conversation) actually exist

Verification Method:

#!/bin/bash
# Extract file list from spec file (path pattern matching)
spec_file="PLAN.md"  # or README.md

echo "=== Deliverable Existence Check ==="
missing=0
# Extract paths from markdown inline code. Process substitution (not a pipe)
# keeps the while loop in the current shell, so $missing survives the loop.
while read -r f; do
  if [ -e "$f" ]; then
    echo "✅ $f"
  else
    echo "❌ $f (not found)"
    missing=$((missing + 1))
  fi
done < <(grep -oE '`[^`]+\.(py|js|ts|md|json|yaml|sh)`' "$spec_file" | tr -d '`')
echo "Missing deliverables: $missing"

Pass Criteria: All paths mentioned in spec files actually exist → PASS


🔴 Hallucination Patterns (H-12 ~ H-14)


H-12: No Fictional Library/API References

Purpose: Verify that no non-existent packages are imported/required

npm Package Verification:

# List all dependencies in package.json (one name per line, for piping)
node -e "
const pkg = require('./package.json');
const deps = {...(pkg.dependencies||{}), ...(pkg.devDependencies||{})};
Object.keys(deps).forEach(name => console.log(name));
"

# Actual npm registry lookup
npm_check() {
  local pkg=$1
  result=$(curl -sI "https://registry.npmjs.org/$pkg" | head -1)
  echo "$result" | grep -q "200" \
    && echo "✅ npm: $pkg exists" \
    || echo "❌ npm: $pkg not found (suspected hallucination)"
}

# Usage example
npm_check "some-package-name"

pip Package Verification:

pip_check() {
  local pkg=$1
  result=$(curl -sI "https://pypi.org/pypi/$pkg/json" | head -1)
  echo "$result" | grep -q "200" \
    && echo "✅ PyPI: $pkg exists" \
    || echo "❌ PyPI: $pkg not found (suspected hallucination)"
}

# Registry lookup preferred: fast, and doesn't touch the local environment
pip_check "some-library"

Extracting and Batch-Verifying Imports from Code:

# Extract Python imports (third-party only, one per line)
python3 -c "
import ast, sys
tree = ast.parse(open('target.py').read())
imports = set()
for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        for alias in node.names:
            imports.add(alias.name.split('.')[0])
    elif isinstance(node, ast.ImportFrom):
        if node.module:
            imports.add(node.module.split('.')[0])
# Exclude standard library (sys.stdlib_module_names: Python 3.10+)
stdlib = set(getattr(sys, 'stdlib_module_names', ()))
for name in sorted(imports - stdlib):
    print(name)
"

Pass Criteria: All import/require packages actually exist in their registries → PASS


H-13: No Unsourced Statistics

Purpose: Verify no numbers/statistics/ratios are presented definitively without sources

Auto-detection Patterns:

# Detect numbers without sources
echo "=== Unsourced number candidates ==="
# Percentage patterns
grep -En "[0-9]+(\.[0-9]+)?%" <file> | grep -iv "http\|source\|ref\|reference\|needs verification"

# Multiplier/ratio patterns
grep -En "[0-9]+(x|times)" <file> | grep -iv "http\|source\|reference\|needs verification"

# Large number patterns (millions, billions)
grep -En "[0-9]+(million|billion|thousand)" <file> | grep -iv "http\|source\|reference"

Manual Check Checklist:

□ Do all statistics have "[source: URL or paper name]"?
□ Are uncertainty expressions used appropriately? ("approximately", "roughly", "estimated")
□ Are there statistics without dates? (unclear recency)
□ If internal data, is "based on internal data" stated?

Pass Criteria: All numbers have cited sources or "needs verification:" label → PASS


H-14: No Overconfident Expressions

Purpose: Verify uncertain matters aren't stated definitively with "always", "never", "guaranteed", etc.

Auto-detection Commands:

echo "=== Overconfident expression detection ==="
# With -E, alternation is an unescaped | (the \| form is BRE syntax)
grep -Ein \
  "never|always|definitely|certainly|guaranteed|without doubt|100%|absolutely|undoubtedly|impossible|must be true" \
  <file>

Allowed vs. Not Allowed:

❌ Not allowed: "This method is guaranteed to work"
✅ Allowed: "This method works in most cases (needs verification: edge case X)"

❌ Not allowed: "Errors will never occur"
✅ Allowed: "Error likelihood is low in typical usage scenarios"

Pass Criteria: 0 definitive expressions, or if present, accompanied by supporting evidence → PASS


📊 Report Output Format

Standard PASS/FAIL Table

╔═══════════════════════════════════════════════════════════╗
║ 🛡️ HALLUCINATION GUARD REPORT                             ║
║ Executed: 2026-03-26 15:21 KST                            ║
╠═══════════════════════════════════════════════════════════╣
║ Target: /path/to/target   Scope: ALL                      ║
╠══════╦══════════╦════════╦════════════════════════════════╣
║  ID  ║ Category ║ Result ║ Details                        ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-1  ║ Fact     ║  PASS  ║ All 4 referenced paths exist   ║
║ H-2  ║ Fact     ║  PASS  ║ 3 CLIs confirmed installed     ║
║ H-3  ║ Fact     ║  SKIP  ║ Network check not selected     ║
║ H-4  ║ Fact     ║  PASS  ║ Python/JS syntax valid         ║
║ H-5  ║ Fact     ║  FAIL  ║ 3 numbers missing sources      ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-6  ║ Consist. ║  PASS  ║ No self-contradiction          ║
║ H-7  ║ Consist. ║  FAIL  ║ README.md promised but absent  ║
║ H-8  ║ Consist. ║  PASS  ║ Core term consistency OK       ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-9  ║ Complete ║  PASS  ║ TODO/FIXME: 0 found            ║
║ H-10 ║ Complete ║  FAIL  ║ YOUR_API_KEY: 2 found          ║
║ H-11 ║ Complete ║  PASS  ║ Deliverables: 5/5 exist        ║
╠══════╬══════════╬════════╬════════════════════════════════╣
║ H-12 ║ Halluc.  ║  PASS  ║ All packages registry-verified ║
║ H-13 ║ Halluc.  ║  FAIL  ║ 2 unsourced numbers found      ║
║ H-14 ║ Halluc.  ║  PASS  ║ Overconfident expressions: 0   ║
╠══════╩══════════╩════════╩════════════════════════════════╣
║ Total: PASS 9 / FAIL 4 / SKIP 1                           ║
║ Score: 9/13 = 69.2% (recommended threshold: 85%)          ║
║ Verdict: ❌ FAIL (fix required, then re-run)               ║
╚═══════════════════════════════════════════════════════════╝

[FAIL Item Fix Guide]
H-5:  Add source URLs to numbers on lines 42, 67, 89
H-7:  Create README.md or remove promise from PLAN.md
H-10: Check .env.example — replace YOUR_API_KEY with actual key or comment out
H-13: Add "[needs verification]" label to statistics on lines 15, 33

Brief Mode (--brief)

🛡️ HG: PASS 9/13 (H-5,H-7,H-10,H-13 FAIL) → Fix required

JSON Mode (--format=json)

{
  "timestamp": "2026-03-26T15:21:00+09:00",
  "target": "/path/to/target",
  "score": 0.692,
  "threshold": 0.85,
  "verdict": "FAIL",
  "results": [
    {"id": "H-1", "category": "fact", "status": "PASS", "detail": "All 4 referenced paths exist"},
    {"id": "H-5", "category": "fact", "status": "FAIL", "detail": "3 numbers missing sources", "lines": [42, 67, 89]},
    ...
  ]
}
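
The JSON form is convenient for scripting. For example, to list only the FAIL items from a saved report (schema as above):

# Print ID and detail for each FAIL item
jq -r '.results[] | select(.status=="FAIL") | "\(.id): \(.detail)"' hg-report.json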

🌍 Platform-Specific Branches

macOS

OS="macos"
STAT_CMD="stat"
WHICH_CMD="which"
GREP_CMD="grep"
# If GNU grep not available, brew install grep recommended

Linux

OS="linux"
STAT_CMD="stat"
WHICH_CMD="which"
GREP_CMD="grep"
# Most GNU tools installed by default
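
A small dispatcher can select the right block automatically (sketch for POSIX-like shells; Windows is handled separately below):

# Auto-detect the platform and set OS for the blocks above
case "$(uname -s)" in
  Darwin) OS="macos" ;;
  Linux)  OS="linux" ;;
  *)      OS="other" ;;
esac
echo "Detected platform: $OS"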

Windows (PowerShell)

# stat → Test-Path / Get-Item
# which → Get-Command
# grep → Select-String
# curl → Invoke-WebRequest

# Wrapper function definitions
function hg-pathcheck { param($p) Test-Path $p }
function hg-cmdcheck { param($c) Get-Command $c -EA SilentlyContinue }
function hg-grep { param($p, $f) Select-String -Pattern $p -Path $f }

Docker/CI Environment

# GitHub Actions example
- name: Hallucination Guard
  run: |
    bash scripts/hallucination-guard.sh \
      --scope=all \
      --format=json \
      --output=hg-report.json
  
- name: Check HG Result
  run: |
    score=$(jq '.score' hg-report.json)
    python3 -c "import sys; sys.exit(0 if float('$score') >= 0.85 else 1)"
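
For plain Docker outside a CI runner, the same wrapper script can run in a throwaway container (image and paths are illustrative; install grep/curl/jq in the image as needed):

# Run the guard in a disposable container
docker run --rm -v "$PWD:/work" -w /work debian:stable-slim \
  bash scripts/hallucination-guard.sh --scope=all --format=json --output=hg-report.json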

🔌 Integration with Other Skills

Integration with coding-agent

[Auto-trigger after coding-agent completion]
1. coding-agent completes code generation
2. hallucination-guard auto-runs (H-1, H-2, H-4, H-9, H-10, H-12 prioritized)
3. FAIL items → Send fix instructions to coding-agent
4. Re-run → Confirm PASS before final report
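
A minimal sketch of steps 2 to 4 as a shell loop, assuming the scripts/hallucination-guard.sh wrapper from the CI example; the coding-agent fix interface is hypothetical:

# Retry loop: run the guard, hand FAIL details back to the agent, re-check
for attempt in 1 2 3; do
  bash scripts/hallucination-guard.sh --format=json --output=hg-report.json
  fails=$(jq '[.results[] | select(.status=="FAIL")] | length' hg-report.json)
  [ "$fails" -eq 0 ] && { echo "✅ PASS after $attempt run(s)"; break; }
  jq -r '.results[] | select(.status=="FAIL") | .detail' hg-report.json \
    | coding-agent fix --stdin   # hypothetical interface
done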

Integration with finishing-a-development-branch

[Add HG to branch completion checklist]
□ Tests passing
□ hallucination-guard 85%+ PASS
□ PR created

Integration with docs-writer

[Verify after documentation writing]
docs-writer complete → hallucination-guard --scope=fact,consistency,completeness

Integration with adversarial-code-review

[Dual verification pipeline]
Code generation
  → hallucination-guard (automated/execution-based verification)
  → adversarial-code-review (logic/security verification)
  → Final approval

➕ How to Add Custom Items

Create .hallucination-guard.yaml at the project root:

# .hallucination-guard.yaml
version: "1.0"
custom_checks:
  - id: "H-C1"
    name: "Internal API endpoint verification"
    category: "fact"
    command: |
      curl -sI --max-time 5 https://internal-api.company.com/health | head -1
    pass_pattern: "200"
    severity: "error"  # error | warning | info
    
  - id: "H-C2"
    name: "Company terminology compliance"
    category: "consistency"
    command: |
      grep -n "end-user\|enduser\|end user" {target} | grep -v "# allowed"
    fail_pattern: "."  # Match = FAIL
    severity: "warning"
    
  - id: "H-C3"
    name: "License header presence"
    category: "completeness"
    command: |
      head -3 {target} | grep -i "copyright\|license\|MIT\|Apache"
    pass_pattern: "."
    severity: "error"

# Threshold customization
thresholds:
  pass_rate: 0.90      # Require 90% PASS
  error_tolerance: 0   # Allow 0 error-level FAILs
  warning_tolerance: 2 # Allow up to 2 warning-level FAILs

# Exclude patterns
exclude:
  paths:
    - "node_modules/**"
    - "*.min.js"
    - ".git/**"
  checks:
    - "H-3"  # Disable URL check (internal network environment)

📖 3 Real-World Usage Examples

Example 1: Immediate Verification After Code Generation

Situation: coding-agent generated a Python script, and you want to verify it actually works

[Conversation]
User: Implement a file upload feature

Agent: [Code generation complete] Created src/uploader.py...

User: hallucination check on src/uploader.py

Agent: 🛡️ Running H-1...
  ✅ src/uploader.py exists
  
  🛡️ Running H-2...
  ✅ python3 confirmed installed
  ❌ boto3: not installed (import boto3 used in code)
  
  🛡️ Running H-4...
  ✅ Python syntax valid
  
  🛡️ Running H-9...
  ❌ TODO: S3 bucket name needs to be configured (line 23)
  
  🛡️ Running H-12...
  ✅ boto3: confirmed on PyPI (just not installed locally)
  
[Result] PASS 3/5 (H-2 and H-9 FAIL): install boto3 and resolve the TODO
Recommend: pip install boto3, then retry

Example 2: Technical Documentation Quality Verification

Situation: Agent wrote team API documentation, needs pre-deployment verification

[Execution]
hallucination check on docs/api-reference.md --scope=fact,consistency

[Result]
H-3: ❌ https://api.example.com/v2/users → HTTP 404
     (Endpoint in documentation doesn't actually exist)
     
H-5: ❌ "Average response time 23ms" — no source
     ❌ "99.9% uptime guaranteed" — no source/evidence
     
H-8: ⚠️ Mixed usage of "user" / "client" / "end-user" detected
     → Unification recommended

[Fix Instructions]
1. Change /v2/users → /v1/users or create the endpoint
2. Add "[internal measurement, 2026-03]" to response time figure
3. Replace uptime claim with SLA document link
4. Unify to "user" throughout

Example 3: Full Scan Before Release (CI Pipeline)

Situation: Auto-run in GitHub Actions before PR merge

# .github/workflows/hallucination-guard.yml
name: Hallucination Guard

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  hg-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Hallucination Guard
        run: |
          # H-9, H-10 (completeness) checks
          echo "=== H-9: TODO/FIXME check ==="
          count=$(grep -rn "TODO\|FIXME\|HACK\|XXX" src/ | wc -l)
          [ "$count" -gt 0 ] && { echo "FAIL: $count found"; exit 1; } || echo "PASS"
          
          echo "=== H-10: Placeholder check ==="
          count=$(grep -rn "PLACEHOLDER\|YOUR_API_KEY\|CHANGEME\|TBD" src/ | wc -l)
          [ "$count" -gt 0 ] && { echo "FAIL: $count found"; exit 1; } || echo "PASS"
          
          echo "=== H-4: Syntax validity ==="
          # Use '+' (not '\;') so find exits non-zero when py_compile fails
          find src -name "*.py" -exec python3 -m py_compile {} + && echo "PASS"
          
          echo "✅ Hallucination Guard complete"
          
      - name: Comment PR
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '❌ Hallucination Guard FAIL — Please fix TODO/placeholder/syntax errors.'
            })

🔧 Skill Execution Principles (For Agents)

Agents that have read this skill follow these principles:

  1. On receiving "hallucination check" trigger — Immediately start executing all 14 items
  2. When target is unclear — Automatically set the current working file/directory as target
  3. On finding FAIL items — Attempt to fix before reporting to user (state explicitly if fix is not possible)
  4. --quick flag — Run only H-1, H-2, H-9, H-10, H-12 (essential 5)
  5. When --scope is specified — run only that group: fact (H-1~H-5), consistency (H-6~H-8), completeness (H-9~H-11), hallucination (H-12~H-14); see the mapping sketch after this list
  6. Threshold — Default 85% PASS (customizable via .hallucination-guard.yaml)
  7. Uncertain items — Label as "needs verification:", no arbitrary judgment
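
The scope-to-ID mapping from item 5, as a small shell helper:

# Map --scope values to check IDs (ranges per item 5 above)
scope="fact,completeness"   # example input
for s in ${scope//,/ }; do
  case "$s" in
    fact)          echo "H-1 H-2 H-3 H-4 H-5" ;;
    consistency)   echo "H-6 H-7 H-8" ;;
    completeness)  echo "H-9 H-10 H-11" ;;
    hallucination) echo "H-12 H-13 H-14" ;;
    *)             echo "unknown scope: $s" >&2 ;;
  esac
done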

📌 References and Sources

  • Python AST parsing: Python Official Docs - ast module
  • npm registry API: https://registry.npmjs.org/<package-name> (public API)
  • PyPI JSON API: https://pypi.org/pypi/<package-name>/json (public API)
  • grep patterns: ERE via -E per POSIX; \| alternation in basic grep is a GNU extension
  • The above APIs are publicly accessible as of March 2026 (needs verification: subject to future changes)

hallucination-guard v1.0.1 — by reikys
"Proving trust through execution, not prompts"
