Phy Citation Checker


Verify academic citations against CrossRef, Semantic Scholar, and OpenAlex. Detects AI-hallucinated references, chimeric citations, and suspicious patterns.

Install

openclaw skills install phy-citation-checker

check-citations

Verify academic citations against CrossRef, Semantic Scholar, and OpenAlex. Detects AI-hallucinated references, chimeric citations (real title + wrong authors), and suspicious patterns before submission.

When to Use

  • After writing or editing a .bib file with AI assistance
  • Before submitting a paper, thesis, or report
  • When reviewing AI-generated literature sections
  • As a CI/CD check in LaTeX manuscript pipelines
  • When auditing existing bibliographies for dead or fabricated references

Background

  • 6-55% of AI-generated citations are fabricated (varies by model/domain)
  • 100+ hallucinated references found in NeurIPS 2025 accepted papers
  • Universities increasingly treat fake citations as academic misconduct
  • Three hallucination types: fully fabricated, chimeric (real title + wrong authors), modified real (slightly altered metadata)

Usage

Quick Check (Single File)

python scripts/citation_checker.py references.bib

Check All .bib Files in a Directory

python scripts/citation_checker.py path/to/report/
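Directory mode might gather its inputs along these lines. This is a sketch only: `collect_bib_files` is a hypothetical helper, and this version searches subdirectories recursively via `Path.rglob`, which is an assumption about the real script's behavior.

```python
from __future__ import annotations

from pathlib import Path


def collect_bib_files(target: str) -> list[Path]:
    """Return the .bib files to check for a file or directory argument.

    Assumption: directories are searched recursively; the real
    citation_checker.py may only look at the top level.
    """
    path = Path(target)
    if path.is_file():
        # A single file is only checked if it is a BibTeX file.
        return [path] if path.suffix == ".bib" else []
    if path.is_dir():
        # Sorted so the report order is stable between runs.
        return sorted(p for p in path.rglob("*.bib") if p.is_file())
    raise FileNotFoundError(target)
```

Calling `collect_bib_files("path/to/report/")` returns every `.bib` file under that directory, and each one would then be checked in turn.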

JSON Output (CI/CD Pipelines)

python scripts/citation_checker.py references.bib --json

Verbose Mode (Debug API Responses)

python scripts/citation_checker.py references.bib --verbose

How It Works

Cascading Multi-Source Verification

Each citation is checked against three independent databases:

Source | Coverage | Strength
CrossRef | 140M+ DOI-registered works | Best for journal/conference papers with DOIs
Semantic Scholar | 200M+ papers | Best author disambiguation, arXiv coverage
OpenAlex | 240M+ works | Broadest coverage, fully open
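For orientation, a title lookup against each database can be expressed as one search URL per source. The endpoints below are the public search APIs of CrossRef, Semantic Scholar, and OpenAlex; whether citation_checker.py uses exactly these endpoints and parameters is an assumption, and `build_search_urls` is a hypothetical helper.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Public search endpoints (assumed, not confirmed to be what the checker calls).
CROSSREF = "https://api.crossref.org/works"
SEMANTIC_SCHOLAR = "https://api.semanticscholar.org/graph/v1/paper/search"
OPENALEX = "https://api.openalex.org/works"


def build_search_urls(title: str, rows: int = 5) -> dict[str, str]:
    """Build one title-search URL per database for a single citation."""
    return {
        "crossref": CROSSREF + "?" + urlencode(
            {"query.bibliographic": title, "rows": rows}),
        "semantic_scholar": SEMANTIC_SCHOLAR + "?" + urlencode(
            {"query": title, "fields": "title,authors,year,externalIds", "limit": rows}),
        "openalex": OPENALEX + "?" + urlencode(
            {"search": title, "per-page": rows}),
    }
```

Each URL can then be fetched with `requests.get(url, timeout=10).json()`; the three responses have different shapes and need per-source parsing.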

Verification logic:

  • Found in 2+ sources with matching title → verified (high confidence)
  • Found in 1 source only → suspicious (manual check recommended)
  • Found in 0 sources → not_found (likely hallucinated)
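The decision rule above can be sketched as counting how many sources return a matching title. The title normalization and the 0.9 `difflib.SequenceMatcher` threshold are assumptions for illustration; the checker's actual matching method is not documented here.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, drop BibTeX braces and punctuation, collapse whitespace."""
    title = re.sub(r"[{}]", "", title.lower())
    title = re.sub(r"[^a-z0-9 ]+", " ", title)
    return " ".join(title.split())


def titles_match(a: str, b: str, threshold: float = 0.9) -> bool:
    # Assumed threshold; tune against known-good and known-bad examples.
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio() >= threshold


def classify(bib_title: str, hits: dict[str, list[str]]) -> str:
    """hits maps a source name to the candidate titles that source returned."""
    matching_sources = sum(
        1 for titles in hits.values() if any(titles_match(bib_title, t) for t in titles)
    )
    if matching_sources >= 2:
        return "verified"      # high confidence
    if matching_sources == 1:
        return "suspicious"    # manual check recommended
    return "not_found"         # likely hallucinated
```

Counting sources rather than raw hits means a single database returning several near-duplicates still only yields `suspicious`.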

Chimeric Detection

When a citation's title matches a real paper but the authors do not overlap at all, it is flagged as a possible chimeric hallucination. This is the most dangerous type, because the title looks real when searched on Google Scholar.
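A minimal sketch of the "no author overlap" test, assuming authors are compared by family name only. `surname` and `is_chimeric` are hypothetical helpers, and the name parsing is deliberately simple (it handles "Last, First" and "First Last").

```python
from __future__ import annotations

import unicodedata


def surname(name: str) -> str:
    """Best-effort family name, accent-folded and lowercased."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    if "," in name:
        family = name.split(",")[0]              # "Vaswani, Ashish"
    else:
        parts = name.split()
        family = parts[-1] if parts else ""      # "Ashish Vaswani"
    return family.strip(" {}.").lower()


def is_chimeric(bib_authors: list[str], found_authors: list[str], title_matched: bool) -> bool:
    """Flag a title match whose author lists share no family name at all."""
    if not title_matched or not bib_authors or not found_authors:
        return False
    return not ({surname(a) for a in bib_authors} & {surname(a) for a in found_authors})
```

Requiring zero overlap (rather than a partial match) keeps false alarms low when a BibTeX entry lists only the first few authors.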

Red Flag Heuristics

  • Invalid DOI format (doesn't start with 10.xxxx/)
  • Suspiciously generic title patterns ("A Comprehensive Survey of...")
  • Future publication year
  • Missing author or year fields
  • Single-word author names (incomplete metadata)
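The heuristics above could be applied to a parsed BibTeX entry roughly as follows. This is a sketch: the entry is assumed to be a plain field-to-string dict, the generic-title pattern is a hypothetical example (the tool's real list is unknown), and stripping a `https://doi.org/` prefix before validation is an assumption.

```python
from __future__ import annotations

import datetime
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
# Hypothetical example of a "generic" title opener.
GENERIC_TITLE_RE = re.compile(
    r"^(a|an) (comprehensive|systematic|complete) (survey|review|overview) of", re.IGNORECASE)


def red_flags(entry: dict[str, str], current_year: int | None = None) -> list[str]:
    """Return heuristic warnings for one parsed BibTeX entry (field -> value)."""
    year_now = current_year or datetime.date.today().year
    flags = []
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", entry.get("doi", "").strip(), flags=re.IGNORECASE)
    if doi and not DOI_RE.match(doi):
        flags.append("invalid DOI format")
    if GENERIC_TITLE_RE.match(entry.get("title", "").strip("{} ")):
        flags.append("generic title pattern")
    year = entry.get("year", "").strip()
    if not entry.get("author", "").strip():
        flags.append("missing author")
    if not year:
        flags.append("missing year")
    elif year.isdigit() and int(year) > year_now:
        flags.append("future publication year")
    # BibTeX separates authors with " and "; "others" marks a truncated list.
    authors = [a.strip() for a in entry.get("author", "").split(" and ")
               if a.strip() and a.strip().lower() != "others"]
    if any(len(a.replace(",", " ").split()) == 1 for a in authors):
        flags.append("single-word author name")
    return flags
```

These flags are warnings, not verdicts: a corporate author such as `{OpenAI}` will also trip the single-word check.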

Exit Codes

Code | Meaning
0 | All citations verified
1 | One or more citations not found
2 | Suspicious citations only (no hard failures)
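The precedence in this table (a hard failure outranks a warning) can be written as a small mapping. `exit_code` is a hypothetical helper shown for illustration; the status strings match the verification labels above.

```python
from __future__ import annotations


def exit_code(statuses: list[str]) -> int:
    """Map per-citation statuses to the documented exit codes."""
    if "not_found" in statuses:
        return 1  # at least one hard failure
    if "suspicious" in statuses:
        return 2  # warnings only
    return 0      # everything verified
```

A wrapper script can branch on `subprocess.run([...]).returncode` in the same order: treat 1 as blocking and 2 as a prompt for manual review.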

Dependencies

pip install requests

No API keys are required; the checker uses the free tiers of all three databases.

Accuracy (Tested)

Category | Result | Description
Known-good | 9/10 (90%) | Famous ML papers (Vaswani, Devlin, Brown, He, etc.)
Known-bad | 10/10 (100%) | Fabricated papers with plausible titles
Chimeric | 5/5 (100%) | Real titles with wrong authors
False positive rate | 10% | 1 miss: an unpublished technical report without a DOI
False negative rate | 0% | No fake paper was ever verified

The core design goal is that fake papers are never marked as real; in the tests above, no fabricated or chimeric paper was verified.
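The two rates follow from the category counts. The sketch below assumes the false negative rate pools the fabricated and chimeric sets (15 fakes in total), which is consistent with the table but not stated explicitly.

```python
from fractions import Fraction

# Counts copied from the accuracy table.
known_good_total, known_good_verified = 10, 9
fabricated_total, fabricated_caught = 10, 10
chimeric_total, chimeric_caught = 5, 5

# False positive: a real paper that was not verified.
false_positive_rate = Fraction(known_good_total - known_good_verified, known_good_total)

# False negative: a fake (fabricated or chimeric) paper that was verified.
fakes = fabricated_total + chimeric_total
false_negative_rate = Fraction(fakes - fabricated_caught - chimeric_caught, fakes)

print(f"FPR = {float(false_positive_rate):.0%}, FNR = {float(false_negative_rate):.0%} over {fakes} fakes")
# prints "FPR = 10%, FNR = 0% over 15 fakes"
```

With samples this small, each additional miss moves a rate by 10 (known-good) or about 6.7 (fakes) percentage points.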

Limitations

  • Papers without a DOI that share their title with many derivative works (e.g., BERT cited without a DOI) may not be found by title search alone; always include DOIs when available
  • Semantic Scholar's free tier is rate-limited to roughly 100 requests per 5 minutes, so batch verification of large bibliographies is slower
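One way to stay under that limit is a client-side sliding-window throttle. This is a hypothetical sketch: `SlidingWindowThrottle` is an illustrative name, whether citation_checker.py throttles this way is not documented, and the defaults simply mirror the approximate figure quoted above. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class SlidingWindowThrottle:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit: int = 100, window: float = 300.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.calls: deque[float] = deque()  # timestamps of recent requests

    def wait(self) -> None:
        """Block until another request is allowed, then record it."""
        now = self.clock()
        # Forget requests that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            # Sleep until the oldest request in the window expires.
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Call `throttle.wait()` immediately before each Semantic Scholar request; the other two APIs have their own limits and would need separate instances.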
  • Cannot verify papers behind paywalls or not indexed in any of the three databases
  • Book chapters, technical reports, and grey literature have lower coverage

Integration with LaTeX Workflows

Pre-commit Hook

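#!/bin/bash
# .git/hooks/pre-commit (the hook file must be executable: chmod +x .git/hooks/pre-commit)
REPORT=$(mktemp)
python3 scripts/citation_checker.py references.bib --json > "$REPORT"
# Fail closed: if the report is missing or malformed, block the commit instead of skipping the check.
if ! NOT_FOUND=$(python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['summary']['not_found'])" "$REPORT"); then
    echo "BLOCKED: could not read the citation report ($REPORT)."
    exit 1
fi
rm -f "$REPORT"
if [ "$NOT_FOUND" -gt 0 ]; then
    echo "BLOCKED: $NOT_FOUND unfound citations. Run 'python3 scripts/citation_checker.py references.bib --verbose' to investigate."
    exit 1
fi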

GitHub Actions

- name: Check citations
  run: |
    pip install requests
    # The checker exits non-zero on suspicious or missing citations, and the default bash
    # shell runs with -e, so let it finish and decide from the JSON summary instead.
    python scripts/citation_checker.py references.bib --json > citation_report.json || true
    python -c "
    import json, sys
    r = json.load(open('citation_report.json'))
    if r['summary']['not_found'] > 0:
        print(f'FAIL: {r[\"summary\"][\"not_found\"]} citations not found')
        sys.exit(1)
    print(f'PASS: {r[\"summary\"][\"verified\"]} verified, {r[\"summary\"][\"suspicious\"]} suspicious')
    "