Hfpclawer Citation Audit

Verify academic paper citations using a three-tier fallback pipeline: local FTS5 database → Semantic Scholar API → OpenAlex API. Supports single citation checks and batch reference-list audits. No external API keys required for basic usage.

Install

openclaw skills install hfpclawer-citation-audit

Verify whether a cited academic paper actually exists, using a three-tier pipeline that degrades gracefully when local data or remote APIs are unavailable.

Who this is for: Researchers, reviewers, and literature-survey authors who need to check whether a citation refers to a real paper.

Overview

The audit engine tries three sources in order, stopping at the first confirmation:

                    ┌───────────────────────────┐
 User:              │  hfpclawer audit verify   │
 "Is this paper     │  "Fourier Neural Operator"│
 real?"             └─────────────┬─────────────┘
                                  │
                   ┌──────────────┘
                   ▼
              ┌─────────┐  ┌──────────┐  ┌──────────┐
              │ L1:     │  │ L2:      │  │ L3:      │
              │ Local   │→ │ Semantic │→ │ OpenAlex │
              │ FTS5 DB │  │ Scholar  │  │          │
              │ (1ms)   │  │ (200ms)  │  │ (200ms)  │
              └─────────┘  └──────────┘  └──────────┘

Each source independently reports one of four statuses:

  • VERIFIED — the paper exists in this source
  • SUSPECTED — possible match (similar title, but not exact)
  • NOT_FOUND — no match found
  • ERROR — source unavailable (no local DB / API rate-limited)
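The fallback order above can be sketched as a loop over sources that stops at the first VERIFIED answer and records every status it sees along the way. This is a minimal sketch, not hfpclawer's implementation: the source callables are hypothetical stand-ins, and the rule for combining statuses when nothing verifies (SUSPECTED, then NOT_FOUND, then ERROR) is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical source signature: query string in, status string out.
Source = Callable[[str], str]


def run_chain(query: str, sources: List[Tuple[str, Source]]) -> Dict[str, object]:
    """Try sources in order; stop at the first VERIFIED result."""
    per_source: Dict[str, str] = {}
    for name, lookup in sources:
        try:
            status = lookup(query)
        except Exception:
            # An unavailable source is recorded as ERROR instead of aborting the audit.
            status = "ERROR"
        per_source[name] = status
        if status == "VERIFIED":
            return {"status": "VERIFIED", "per_source": per_source}
    # Nothing verified: assumed precedence is SUSPECTED > NOT_FOUND > ERROR.
    for candidate in ("SUSPECTED", "NOT_FOUND"):
        if candidate in per_source.values():
            return {"status": candidate, "per_source": per_source}
    return {"status": "ERROR", "per_source": per_source}


# Hypothetical stand-ins for the three tiers.
def no_local_db(query: str) -> str:
    raise FileNotFoundError("arxiv_meta.db not found")


def s2_miss(query: str) -> str:
    return "NOT_FOUND"


def openalex_hit(query: str) -> str:
    return "VERIFIED"


result = run_chain(
    "Fourier Neural Operator",
    [("local", no_local_db), ("s2", s2_miss), ("openalex", openalex_hit)],
)
print(result)
```

Because a missing local DB is caught and recorded rather than raised, the chain simply moves on to the remote tiers, which matches the graceful degradation described above.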

When to Use

  • A user cites a paper you cannot find — verify its existence
  • Your team is writing a survey or literature review and needs to batch-audit the reference list
  • You downloaded an LLM-generated paper and need to fact-check its citations
  • You want to know whether a paper is a known arXiv preprint or a non-existent hallucination

Prerequisites

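  • pip install "hfpclawer>=0.5.0" (quote the requirement so the shell does not treat >= as a redirect)
  • No API keys needed for basic use: Semantic Scholar (S2) and OpenAlex are queried through their anonymous tiers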
  • Optional: Set S2_API_KEY env var for 10x faster Semantic Scholar lookups
  • Optional: Set OPENALEX_POLITE_EMAIL env var for 10x faster OpenAlex lookups
  • Optional: Clone the arxiv-metadata-service repo to enable the L1 local FTS5 database (see references/local-db-setup.md)
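The two optional environment variables map naturally onto the providers' own request conventions: Semantic Scholar reads an API key from the x-api-key request header, and OpenAlex identifies polite-pool callers by a mailto query parameter. The helpers below are a hypothetical sketch of how a client could build those values from the environment; they are not part of hfpclawer's API.

```python
import os
from typing import Dict, Mapping, Optional


def s2_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Semantic Scholar takes an API key in the x-api-key request header."""
    env = os.environ if env is None else env
    key = env.get("S2_API_KEY")
    # No key -> no header -> anonymous (rate-limited) tier.
    return {"x-api-key": key} if key else {}


def openalex_params(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """OpenAlex routes requests carrying a mailto parameter to its polite pool."""
    env = os.environ if env is None else env
    email = env.get("OPENALEX_POLITE_EMAIL")
    return {"mailto": email} if email else {}


print(s2_headers({"S2_API_KEY": "your-key"}))  # {'x-api-key': 'your-key'}
print(openalex_params({}))                     # {}
```

Passing an explicit mapping, as in the two print lines, makes the fallback visible: with no variable set, no header or query parameter is added and the request goes out anonymously.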

Quick Start

1. Verify a Single Citation (most common)

# Auto mode: tries local DB first, then Semantic Scholar, then OpenAlex
hfpclawer audit verify "Fourier Neural Operator for Parametric Partial Differential Equations"

# Short titles also work, thanks to the substring fallback
hfpclawer audit verify "Fourier Neural Operator"

# Exact arXiv ID
hfpclawer audit verify --arxiv-id 2010.08895

2. Use a Specific Source

# Local FTS5 only (needs arxiv_meta.db)
hfpclawer audit verify "Attention Is All You Need" --source local

# Semantic Scholar only
hfpclawer audit verify "Attention Is All You Need" --source s2

# OpenAlex only
hfpclawer audit verify "Attention Is All You Need" --source openalex

3. Check a Reference List from File

# Save citations in a text file, one per paragraph
cat > refs.txt << 'EOF'
The FNO paper (arXiv:2010.08895) shows promising results.
PINNs were introduced by Raissi et al. (2019) "Physics-informed neural networks".
EOF

hfpclawer audit --refs refs.txt

Output Format

Each result shows:

  • [OK] VERIFIED — paper confirmed; includes title, authors, source
  • [?] SUSPECTED — possible but uncertain; shows top matches
  • [NF] NOT_FOUND — no evidence of this paper
  • [ERR] ERROR — source unavailable (DB not found, rate limited)

Example output:

[OK] VERIFIED
  Title: Fourier Neural Operator
  Authors: Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, ...
  Sources: openalex: VERIFIED
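To make the layout above concrete, here is a hypothetical formatter that turns a result dictionary into that text. It assumes a title key alongside the status, authors, and per_source keys read in the Via Python API example, and it truncates the author list after three names; both are assumptions, not documented behavior.

```python
from typing import Any, Dict, List, Union

# Tags from the Output Format list.
TAGS = {"VERIFIED": "[OK]", "SUSPECTED": "[?]", "NOT_FOUND": "[NF]", "ERROR": "[ERR]"}


def format_result(result: Dict[str, Any], max_authors: int = 3) -> str:
    """Render a (hypothetically shaped) result dict in the layout shown above."""
    status = result["status"]
    lines = [f"{TAGS.get(status, '[ERR]')} {status}"]
    if result.get("title"):
        lines.append(f"  Title: {result['title']}")
    authors: Union[str, List[str]] = result.get("authors") or []
    if isinstance(authors, str):
        # Accept a comma-separated string as well as a list.
        authors = [a.strip() for a in authors.split(",") if a.strip()]
    if authors:
        shown = ", ".join(authors[:max_authors])
        if len(authors) > max_authors:
            shown += ", ..."
        lines.append(f"  Authors: {shown}")
    per_source = result.get("per_source") or {}
    if per_source:
        pairs = ", ".join(f"{name}: {st}" for name, st in per_source.items())
        lines.append(f"  Sources: {pairs}")
    return "\n".join(lines)


print(format_result({
    "status": "VERIFIED",
    "title": "Fourier Neural Operator",
    "authors": ["Zongyi Li", "Nikola Kovachki", "Kamyar Azizzadenesheli", "Burigede Liu"],
    "per_source": {"openalex": "VERIFIED"},
}))
```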

How Statuses Are Determined

Status      Local DB                                 Semantic Scholar     OpenAlex
VERIFIED    FTS5 match with title similarity ≥ 0.70  Title search ≥ 0.70  Title search ≥ 0.70
SUSPECTED   FTS5 match with score 0.40-0.69          -                    -
NOT_FOUND   No FTS5 results                          No match ≥ 0.70      No match ≥ 0.70
ERROR       DB not found / corrupt                   429/5xx / network    429/5xx / network

Title matching: similarity is computed with difflib.SequenceMatcher on normalized (lowercased, punctuation-stripped) titles. A short title that is a substring of a longer title also passes the 0.70 threshold.
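The matching rule can be sketched with the standard library alone. The normalization details (punctuation replaced by spaces, whitespace collapsed), the choice to score a substring hit as 1.0, and the allow_suspected switch are assumptions made for illustration; the 0.70 and 0.40 cut-offs come from the table.

```python
import difflib
import re


def normalize(title: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse runs of whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def title_similarity(query: str, candidate: str) -> float:
    q, c = normalize(query), normalize(candidate)
    if not q or not c:
        return 0.0
    # Substring fallback: a short title inside a longer one is scored as a pass.
    if q in c or c in q:
        return 1.0
    return difflib.SequenceMatcher(None, q, c).ratio()


def classify(score: float, allow_suspected: bool = True) -> str:
    """Map a similarity score onto the thresholds in the table."""
    if score >= 0.70:
        return "VERIFIED"
    # Per the table only the local DB reports SUSPECTED; remote sources would
    # call this with allow_suspected=False.
    if allow_suspected and score >= 0.40:
        return "SUSPECTED"
    return "NOT_FOUND"


full = "Fourier Neural Operator for Parametric Partial Differential Equations"
print(classify(title_similarity("Fourier Neural Operator", full)))  # VERIFIED
```

The substring check runs before SequenceMatcher because a short query against a long title would otherwise get a low ratio simply from the length difference.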

Batch Modes

From a Text File

hfpclawer audit --refs references.txt

The parser detects:

  • arXiv:XXXX.XXXXX identifiers
  • "Title" (Author, Year) patterns
  • Author (Year) "Title" patterns
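A regex-based sketch of the three patterns follows. The expressions are hypothetical and deliberately narrow (for example, the author group accepts one capitalized surname with an optional "et al."), so the real parser may accept more variants.

```python
import re
from typing import Dict, List

# Hypothetical patterns for the three documented citation shapes.
ARXIV_RE = re.compile(r"arXiv:(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)
# "Title" (Author, Year)
TITLE_AUTHOR_YEAR_RE = re.compile(r'"([^"]+)"\s*\(([^,()]+),\s*(\d{4})\)')
# Author (Year) "Title", where Author is one capitalized surname, optionally "et al."
AUTHOR_YEAR_TITLE_RE = re.compile(r"([A-Z][\w'-]+(?:\s+et al\.)?)\s*\((\d{4})\)\s*\"([^\"]+)\"")


def parse_refs(text: str) -> List[Dict[str, str]]:
    """Return one dict per citation-like match found in the text."""
    found: List[Dict[str, str]] = []
    for m in ARXIV_RE.finditer(text):
        found.append({"arxiv_id": m.group(1)})  # version suffix (v2, ...) dropped
    for m in TITLE_AUTHOR_YEAR_RE.finditer(text):
        found.append({"title": m.group(1), "author": m.group(2).strip(), "year": m.group(3)})
    for m in AUTHOR_YEAR_TITLE_RE.finditer(text):
        found.append({"title": m.group(3), "author": m.group(1), "year": m.group(2)})
    return found


sample = '''The FNO paper (arXiv:2010.08895) shows promising results.
PINNs were introduced by Raissi et al. (2019) "Physics-informed neural networks".'''
for ref in parse_refs(sample):
    print(ref)
```

Run against the refs.txt lines from the Quick Start, this yields one arXiv ID and one author/year/title triple.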

Via Python API

from hfpclawer.citation_audit import check_citation

result = check_citation(
    "Fourier Neural Operator",
    authors_hint="Li",
    year_hint=2020,
    source="auto",       # or "local" / "s2" / "openalex"
)
print(result["status"])  # VERIFIED | SUSPECTED | NOT_FOUND | ERROR
print(result.get("authors", "N/A"))
print(result.get("per_source", {}))  # Per-source breakdown
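To audit many titles from Python, one option is to loop over check_citation and tally the statuses. The sketch takes the checker as a parameter so it can run against a stub; audit_titles and fake_check are hypothetical names, and treating an exception as ERROR mirrors what the verification checklist expects rather than anything confirmed about the library.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Tuple


def audit_titles(
    titles: Iterable[str],
    check: Callable[..., Dict[str, Any]],
    source: str = "auto",
) -> Tuple[Counter, List[Tuple[str, str]]]:
    """Check each title; return a status tally and the titles that did not verify."""
    tally: Counter = Counter()
    flagged: List[Tuple[str, str]] = []
    for title in titles:
        try:
            status = check(title, source=source).get("status", "ERROR")
        except Exception:
            status = "ERROR"  # keep going if one lookup fails
        tally[status] += 1
        if status != "VERIFIED":
            flagged.append((title, status))
    return tally, flagged


# Hypothetical stub standing in for check_citation so the sketch runs offline.
def fake_check(title: str, source: str = "auto") -> Dict[str, Any]:
    return {"status": "VERIFIED" if title.startswith("Attention") else "NOT_FOUND"}


tally, flagged = audit_titles(
    ["Attention Is All You Need", "A Paper That Does Not Exist"], fake_check
)
print(dict(tally))  # {'VERIFIED': 1, 'NOT_FOUND': 1}
print(flagged)      # [('A Paper That Does Not Exist', 'NOT_FOUND')]
```

With the library installed, pass check_citation from hfpclawer.citation_audit in place of fake_check.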

Common Pitfalls

  1. Short (one- or two-word) queries may fail at L1, because FTS5's porter stemmer needs actual content words to match. Use at least 3-4 significant words for local DB queries.
  2. Semantic Scholar rate-limits anonymous requests aggressively (~1 req/s, ~100 req/day). Set S2_API_KEY for production use.
  3. The OpenAlex polite pool is free and allows 10 req/s; set OPENALEX_POLITE_EMAIL to your institutional email address to use it.
  4. No L1 without arxiv-metadata-service: the local FTS5 DB requires cloning the separate arxiv-metadata-service repo. Without it, the auto chain starts at L2 (slower, but still works).
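Pitfalls 2 and 3 both come down to rate limits, and the status table maps 429/5xx responses to ERROR. A client that respects those limits could space out calls and back off on 429/5xx, as in this sketch. PoliteFetcher and the (http_status, payload) contract of its fetch callable are hypothetical; the 1-second default interval reflects the anonymous Semantic Scholar rate quoted above.

```python
import time
from typing import Any, Callable, Optional, Tuple

# Hypothetical contract: fetch(query) -> (http_status, payload).
Fetch = Callable[[str], Tuple[int, Any]]


class PoliteFetcher:
    """Space out calls to one API and retry 429/5xx with exponential backoff."""

    def __init__(
        self,
        fetch: Fetch,
        min_interval: float = 1.0,  # ~1 req/s, the anonymous S2 rate quoted above
        max_retries: int = 3,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.sleep = sleep
        self.clock = clock
        self._last: Optional[float] = None

    def _wait_turn(self) -> None:
        # Sleep just long enough to keep min_interval between request starts.
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

    def get(self, query: str) -> Tuple[str, Any]:
        delay = self.min_interval
        for attempt in range(self.max_retries + 1):
            self._wait_turn()
            status, payload = self.fetch(query)
            if status == 200:
                return "OK", payload
            if status != 429 and status < 500:
                return "ERROR", payload  # other client errors are not retried
            if attempt < self.max_retries:
                self.sleep(delay)  # back off: 1x, 2x, 4x ... min_interval
                delay *= 2
        # Retries exhausted: report ERROR rather than raising.
        return "ERROR", None


calls = iter([(429, None), (200, {"title": "Attention Is All You Need"})])
fetcher = PoliteFetcher(lambda q: next(calls), min_interval=0.01)
print(fetcher.get("Attention Is All You Need"))
```

Injecting sleep and clock keeps the throttle testable without real waiting; in normal use the defaults from the time module apply.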

Verification Checklist

  • Single citation works: hfpclawer audit verify "Known Paper Title"
  • arXiv ID works: hfpclawer audit verify --arxiv-id 2010.08895
  • Non-existent paper returns NOT_FOUND
  • Network errors return ERROR instead of crashing
  • Batch mode processes multiple citations from file
  • hfpclawer audit verify --help shows source options