OpenPaperGraph — Literature Discovery & Citation Analysis
You are a research assistant with access to a CLI tool for academic literature discovery and analysis.
Setup
The CLI is located at: SKILL_DIR/openpapergraph_cli.py
Before first use, ensure dependencies are installed:
pip install httpx pymupdf scholarly
All commands output JSON to stdout. Run from the SKILL_DIR directory.
Architecture: Multi-Source
This tool reduces dependency on any single data source:
| Task | Primary Sources | Fallback |
|---|
| Search | arXiv + DBLP + S2 | Deduplicated, sorted by citations |
| References | Download PDF → parse reference list | S2 API |
| Citations | Google Scholar | S2 API |
| Citation counts | Google Scholar | S2 |
| Recommendations | S2 Recommendations API | — |
| Reference resolution | arXiv → S2 → CrossRef → OpenAlex | Multi-cascade |
Available Commands
1. Search Papers
Multi-source search across arXiv, DBLP, and Semantic Scholar. Supports conference filtering.
python SKILL_DIR/openpapergraph_cli.py search "QUERY" --source SOURCE --venue VENUE --limit N
--source: all (default, multi-source), arxiv, dblp, or s2
--venue: Filter by conference — ICLR, NeurIPS, ICML, ACL, EMNLP, NAACL, WebConf, KDD
--limit: Max results (default 20)
When to use: User asks to find papers, search for literature, or look up specific topics/conferences.
2. Build Citation Network
Construct a citation graph from seed papers. References come from PDF parsing (downloaded from arXiv/Unpaywall), citations from Google Scholar. Falls back to S2 when needed.
python SKILL_DIR/openpapergraph_cli.py graph PAPER_ID1 PAPER_ID2 --depth 1 --output graph.json
- Paper IDs can be: S2 hex ID (
204e3073...), arXiv ID (ARXIV:1706.03762), DOI (DOI:10.1145/...), paper title ("attention is all you need"), PDF path (paper.pdf), BibTeX file (refs.bib), or Zotero CSL-JSON export (zotero.json)
--depth: Expansion depth (1 or 2, default 1)
--output: Save graph to file for later analysis/export
When to use: User wants to explore the citation landscape around specific papers.
3. Paper Recommendations
Get related paper recommendations based on one or more papers (via S2 Recommendations API).
python SKILL_DIR/openpapergraph_cli.py recommend PAPER_ID1 PAPER_ID2 --limit 10
- Also accepts paper titles and PDF paths as input
When to use: User wants to discover related or similar papers they may have missed.
4. Monitor New Papers
Check for recently published papers on a research topic (multi-source: arXiv + DBLP + S2, citation counts enriched via Google Scholar).
python SKILL_DIR/openpapergraph_cli.py monitor "TOPIC" --year-from 2025 --limit 20
When to use: User wants to stay updated on latest publications in a field.
5. Topic Analysis
Analyze a citation graph for topics, keyword distribution, year trends, and top authors.
python SKILL_DIR/openpapergraph_cli.py analyze graph.json
When to use: User wants to understand the thematic structure of a set of papers.
6. Research Summary
Generate a research summary from a citation graph. Uses LLM if any provider is configured, otherwise falls back to extractive analysis.
python SKILL_DIR/openpapergraph_cli.py summary graph.json --style STYLE
python SKILL_DIR/openpapergraph_cli.py summary graph.json --provider deepseek --model deepseek-chat
--style: overview (default), trends, or gaps
--provider: LLM provider name (e.g. openai, deepseek, qwen, zhipu, moonshot)
--model: Override the provider's default model
When to use: User wants a quick overview of a research area or to identify trends/gaps.
7. PDF Reference Extraction
Extract references from a PDF paper, resolving via multi-source cascade (arXiv → S2 → CrossRef → OpenAlex).
python SKILL_DIR/openpapergraph_cli.py pdf /path/to/paper.pdf
python SKILL_DIR/openpapergraph_cli.py pdf /path/to/paper.pdf --use-grobid
--use-grobid: Use GROBID for structured extraction (requires Docker service on port 8070)
- Returns: resolved papers, unresolved references, and resolve rate
When to use: User provides a PDF and wants to find/analyze its references.
7b. Build Graph from PDF Reference Lists
Build a citation graph directly from one or more PDF papers' reference lists.
python SKILL_DIR/openpapergraph_cli.py graph-from-pdf paper.pdf [paper2.pdf ...] --output graph.json
python SKILL_DIR/openpapergraph_cli.py graph-from-pdf paper.pdf --depth 1 --include-unresolved -o graph.json
--depth 0 (default): Only PDF references. --depth 1: Also expand resolved papers.
--include-unresolved: Keep unresolved references as nodes in the graph (marked resolved=false)
--use-grobid: Use GROBID for structured extraction
- References resolved via: arXiv → Semantic Scholar → CrossRef → OpenAlex (multi-source cascade)
When to use: User has PDF papers and wants a citation graph faithful to the actual reference lists.
8. Zotero Import
Import papers from a Zotero library or collection.
python SKILL_DIR/openpapergraph_cli.py zotero --user-id ID --api-key KEY [--collection KEY] [--list-collections]
When to use: User wants to import their existing Zotero library for analysis.
9. Export
Export a citation graph as BibTeX, CSV, Markdown, or JSON. All formats sort papers by year descending.
python SKILL_DIR/openpapergraph_cli.py export graph.json --format bibtex --output refs.bib
python SKILL_DIR/openpapergraph_cli.py export graph.json --format csv --output papers.csv
python SKILL_DIR/openpapergraph_cli.py export graph.json --format markdown --output papers.md
python SKILL_DIR/openpapergraph_cli.py export graph.json --format json --output papers.json
--format: bibtex (default), csv, markdown, or json
- CSV/Markdown/JSON include full fields: id, title, authors, year, citations, source, url, doi, arxiv_id, abstract
When to use: User wants to save results for use in a reference manager, spreadsheet, or documentation.
9b. Export Interactive HTML Graph
Export a citation graph as a self-contained interactive HTML visualization.
python SKILL_DIR/openpapergraph_cli.py export-html graph.json --output graph.html
python SKILL_DIR/openpapergraph_cli.py export-html graph.json --output graph.html --title "My Research" --summary --inline
--title: Custom page title (default: "Paper Graph")
--summary: Pre-generate AI summary at export time (requires LLM API key in env). Result is embedded; API key is NOT.
--inline: Inline vis-network JS for fully offline use (~500KB larger, no CDN needed)
--provider / --model: Override LLM provider/model for --summary
- Layout: Semantic left-to-right hierarchy — References (LEFT) → Seeds (CENTER) → Citations (RIGHT)
- Node types: Seeds (purple stars), References (blue circles), Citations (green diamonds), with legend
- Features: bidirectional hover linking, type filter, search/filter, in-page export, seed source management (add/remove seeds)
- Summary modes: (A) Pre-generate with
--summary, (B) Runtime API key (20+ providers), (C) Manual copy/paste (CORS-proof)
- Security: API keys are never embedded in the HTML output
When to use: User wants a visual, interactive exploration of the citation network, or wants to share a browsable graph.
9b. Interactive Graph Server (serve)
Start a local HTTP server for interactive graph management. Unlike export-html (static, read-only), serve lets users add papers, convert nodes to seeds, remove seeds, and all changes persist to the graph JSON file.
python SKILL_DIR/openpapergraph_cli.py serve graph.json --port 8787
--port: Server port (default: 8787)
--title: Custom page title
- Add papers: "+ Add Paper" button in toolbar. Input via title/ID, BibTeX, or PDF upload. Toggle "Treat as Seed Paper" to control expansion.
- Seed: Full expansion — fetches references + citations from S2/Google Scholar, adds nodes + edges
- Non-seed: Lightweight — only checks relationships with existing seeds, no expansion
- Convert to seed: Click any non-seed paper in the list → "⬆ Convert to Seed" button appears. Also available in the node tooltip when clicking graph nodes.
- Remove seed: Seeds/Sources tab → "Remove" button. Deletes seed + exclusive connections.
- Persistent: All changes immediately written to graph JSON file. Survives page refresh.
- Dedup: Papers matched by DOI > arXiv ID > title+year similarity (no duplicates)
When to use: User wants to interactively build and manage a citation network through the browser, with all changes persisted. Use export-html instead when you want a static file for sharing.
10. Remove Seed Paper
Remove a seed paper and all papers exclusively connected to it from a graph.
python SKILL_DIR/openpapergraph_cli.py remove-seed graph.json "paper_id_or_title"
- Accepts paper ID or title substring (fuzzy match)
- Removes the seed + papers connected only to that seed (not shared with other seeds)
- Cleans up all incident edges
- Overwrites the graph file (use
-o to save to a different file)
11. Remove Non-Seed Paper
Remove a single non-seed paper from a graph.
python SKILL_DIR/openpapergraph_cli.py remove-paper graph.json "paper_id_or_title"
- Accepts paper ID or title substring (fuzzy match)
- Only works for non-seed papers (use
remove-seed for seeds)
- Cleans up all incident edges
- Overwrites the graph file (use
-o to save to a different file)
12. List Conferences
Show supported conference venues for filtering.
python SKILL_DIR/openpapergraph_cli.py conferences
13. List LLM Providers
Show all 20 supported LLM providers and whether their API key is configured.
python SKILL_DIR/openpapergraph_cli.py llm-providers
Workflow Guidelines
- Start with search — Help the user find relevant seed papers first (default: multi-source)
- Build a graph — Use seed paper IDs to construct a citation network, save to a
.json file
- Explore interactively — Use
serve to open the graph in browser, add papers, convert to seeds (serve)
- Analyze — Run topic analysis or generate a summary on the saved graph
- Discover more — Use recommendations to find papers the user may have missed
- Export — Save results as BibTeX/CSV/Markdown/JSON for the user's reference manager
- Share — Generate a static HTML graph for sharing/viewing (
export-html)
Output Format
All commands output JSON to stdout. When presenting results to the user:
- Show paper titles, authors, year, and citation counts in a readable format
- For large result sets, summarize the top results and mention the total count
- Paper IDs can be: S2 hex IDs, arXiv IDs (
ARXIV:xxxx), DOIs (DOI:xxxx), paper titles, or PDF file paths
- The
source field in results indicates where each paper came from (arxiv, semantic_scholar, google_scholar, crossref, openalex, dblp)
Environment Variables
S2_API_KEY (Recommended)
Semantic Scholar API key. Free at semanticscholar.org/product/api.
- Purpose: Authenticates requests to the S2 API (paper search, citation data, recommendations)
- Why needed: Without it, S2 enforces strict rate limiting — frequent calls return 429 errors
- Role: S2 is the fallback in the multi-source architecture — when PDF download or Google Scholar fails, the system falls back to S2. Also the exclusive source for the
recommend command
LLM Provider API Key (Optional — any one of 20 providers)
The summary command supports 20 LLM providers. Set any one API key to enable LLM-powered summaries:
US: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, DEEPSEEK_API_KEY, GROQ_API_KEY, TOGETHER_API_KEY, FIREWORKS_API_KEY, MISTRAL_API_KEY, XAI_API_KEY, PERPLEXITY_API_KEY, OPENROUTER_API_KEY
Chinese: ZHIPUAI_API_KEY (智谱), MOONSHOT_API_KEY (月之暗面), BAICHUAN_API_KEY (百川), YI_API_KEY (零一万物), DASHSCOPE_API_KEY (通义千问), ARK_API_KEY (豆包), MINIMAX_API_KEY, STEPFUN_API_KEY (阶跃星辰), SENSENOVA_API_KEY (商汤)
Custom: Set LLM_API_KEY + LLM_BASE_URL + LLM_MODEL for any OpenAI-compatible endpoint.
Additional environment variables:
LLM_PROVIDER: Explicitly select LLM provider (alternative to --provider CLI flag)
LLM_MODEL: Override default model for the selected provider (alternative to --model CLI flag)
TMPDIR: Custom directory for PDF download cache (defaults to system temp)
Without any LLM key, summary uses extractive analysis and export-html hides the AI summary panel. All other commands are unaffected. Run llm-providers to check status.
Cross-Tool Compatibility
This CLI is designed to be called by any AI coding tool (Claude Code, OpenClaw, Codex, etc.):
- All output is structured JSON on stdout
- Errors go to stderr
- Exit code 0 = success, 1 = argument error, 2 = runtime error
- No interactive input required — all parameters via command-line flags