Install
```bash
openclaw skills install hfpclawer-paper-search
```

A multi-source academic paper pipeline: search across arXiv, HuggingFace Papers, OpenReview, and PapersWithCode, deduplicate by title, download PDFs, convert them to Markdown, and optionally sync to a wiki (search → dedup → PDF download → Markdown conversion → optional wiki sync). Designed for researchers who want to monitor new papers daily.
Who this is for: Researchers who want a daily "new papers on my topic" feed without manually checking multiple websites.
Typical workflow, step by step:

```bash
hfpclawer search              # Discover new papers across sources,
                              #   ranked by relevance to your keywords
hfpclawer download            # Download PDFs for matched papers
                              #   (8 concurrent streams)
hfpclawer convert --to-wiki   # PDF → readable Markdown + wiki sync
```
Or run the full pipeline at once:

```bash
hfpclawer full --max-pages 3 --to-wiki
```
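The "8 concurrent streams" of the download step can be pictured with the sketch below. It is a minimal illustration built on Python's standard `concurrent.futures`, not hfpclawer's actual downloader; `download_all` and `http_fetch` are hypothetical names, and the fetch function is injectable so the logic can be exercised without network access.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Plain stdlib fetch; a real downloader would also retry and respect rate limits."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
    max_workers: int = 8,  # mirrors the "8 concurrent streams" described above
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Fetch every URL with at most `max_workers` requests in flight.

    Returns (results, errors) so one failed PDF does not abort the whole batch.
    """
    results: Dict[str, bytes] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return results, errors
```

Threads fit here because PDF downloads are I/O-bound: the workers spend most of their time waiting on the network, so they overlap well despite Python's GIL.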
```bash
pip install "hfpclawer>=0.5.0"   # quote the spec so the shell does not treat >= as a redirect
hfpclawer init                   # Creates config.yaml in the current directory
```

Then edit config.yaml with your search interests (see the configuration reference below).
```bash
# Create default config
hfpclawer init
# Edit the config to match your research interests
vim config.yaml
# → Change: search.queries, keywords.include_high, keywords.exclude
```
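The tool's own check is `hfpclawer search --dry-run`, which validates the config without network calls. As a rough illustration of the kind of structural check involved, the hypothetical `check_config` below inspects only the fields named above. It assumes config.yaml has already been parsed into a dict (for example with the third-party PyYAML `yaml.safe_load`), and it is not hfpclawer's validation logic.

```python
from typing import Any, Dict, List


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems in a parsed config.yaml; empty means it looks sane."""
    problems: List[str] = []
    search = cfg.get("search") or {}
    keywords = cfg.get("keywords") or {}

    queries = search.get("queries")
    if not isinstance(queries, list) or not queries:
        problems.append("search.queries must be a non-empty list")
    else:
        for i, q in enumerate(queries):
            # Each entry in the sample config has both a query string and a category.
            if not isinstance(q, dict) or not q.get("query") or not q.get("category"):
                problems.append(f"search.queries[{i}] needs 'query' and 'category'")

    include_high = keywords.get("include_high") or []
    exclude = keywords.get("exclude") or []
    if not include_high:
        problems.append("keywords.include_high should list at least one keyword")

    # A term in both lists is almost certainly a mistake (compared case-insensitively).
    overlap = {k.lower() for k in include_high} & {k.lower() for k in exclude}
    if overlap:
        problems.append(f"in both include_high and exclude: {sorted(overlap)}")
    return problems
```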
```bash
# Discover → Download → Convert in one command (add --to-wiki to also sync the wiki)
hfpclawer full
# Limit pages for a quick check
hfpclawer full --max-pages 3 --to-wiki
```
```bash
# Step 1: Search across all sources
hfpclawer search --max-pages 5
# Step 2: Download PDFs for matched papers
hfpclawer download
# Step 3: Convert PDFs to Markdown
hfpclawer convert
# Step 4: Sync to wiki directory
hfpclawer convert --to-wiki
```
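`hfpclawer full` is the built-in way to chain these steps. If you prefer to run them separately from a scheduler (for example a daily cron job) with per-step logging, a small wrapper can call the same four commands in order and stop at the first failure. This is a hypothetical sketch using Python's `subprocess`; `build_steps` and `run_pipeline` are illustrative names, and only the commands and flags shown above are used.

```python
import subprocess
from typing import List


def build_steps(max_pages: int = 5, to_wiki: bool = True) -> List[List[str]]:
    """Argv lists for the four steps above, in order."""
    steps = [
        ["hfpclawer", "search", "--max-pages", str(max_pages)],
        ["hfpclawer", "download"],
        ["hfpclawer", "convert"],
    ]
    if to_wiki:
        steps.append(["hfpclawer", "convert", "--to-wiki"])
    return steps


def run_pipeline(steps: List[List[str]], dry_run: bool = False) -> List[str]:
    """Run steps in order; check=True raises CalledProcessError and stops at the first failure."""
    executed: List[str] = []
    for argv in steps:
        line = " ".join(argv)
        print("$", line)
        if not dry_run:
            subprocess.run(argv, check=True)
        executed.append(line)
    return executed
```

With `dry_run=True` the wrapper only prints the commands, which is a cheap way to confirm the sequence before scheduling it.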
```bash
# Check what papers have been downloaded
hfpclawer list
# Show paper store statistics
hfpclawer store stats
# Start the real-time download monitor
hfpclawer monitor start
```
The config file config.yaml controls what papers are searched and downloaded:
```yaml
search:
  max_per_dim: 50            # Papers per search query per source
  queries:
    - query: "neural operator"
      category: neural-operator
    - query: "physics-informed"
      category: physics-informed
    - query: "PDE solver deep learning"
      category: pde-solver

keywords:
  include_high:              # Papers must match these (OR)
    - "neural operator"
    - "pde"
    - "deep learning"
  include_low:               # Optional bonus keywords
    - "fourier"
    - "self-attention"
  exclude:                   # Exclude these topics
    - "quantum"
    - "llm"

classification:
  threshold_pass: 30         # Relevance score threshold (0-100)
  title_similarity_min: 0.40 # Dedup threshold

paths:
  data_dir: "data"           # SQLite DB location
  pdf_dir: "pdfs"            # Downloaded PDFs
  md_dir: "mds"              # Converted Markdown files
```
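The keyword lists and `threshold_pass` imply a filter-then-score step. hfpclawer's actual scoring formula is not documented here, so the sketch below is hypothetical: it rejects text containing any `exclude` term, requires at least one `include_high` term (the "OR" above), and then applies made-up weights (40 per high keyword, 10 per low keyword, capped at 100) before comparing against `threshold_pass`. Matching uses word boundaries so a short keyword like "pde" does not fire inside unrelated words.

```python
import re
from typing import Dict, List


def _hits(text: str, terms: List[str]) -> int:
    """Count terms that occur as whole words or phrases in already-lowercased text."""
    return sum(
        1 for term in terms
        if re.search(rf"\b{re.escape(term.lower())}\b", text)
    )


def relevance_score(text: str, keywords: Dict[str, List[str]]) -> int:
    """Hypothetical 0-100 score; the 40/10 weights are illustrative, not hfpclawer's."""
    t = text.lower()
    if _hits(t, keywords.get("exclude", [])):
        return 0  # any excluded topic rejects the paper
    high = _hits(t, keywords.get("include_high", []))
    if high == 0:
        return 0  # must match at least one include_high keyword (OR)
    low = _hits(t, keywords.get("include_low", []))
    return min(100, 40 * high + 10 * low)


def passes(text: str, keywords: Dict[str, List[str]], threshold_pass: int = 30) -> bool:
    """Apply the classification.threshold_pass cut-off to the hypothetical score."""
    return relevance_score(text, keywords) >= threshold_pass
```

With the sample config, a title such as "Deep learning PDE solver with self-attention" matches two high keywords and one low keyword and passes comfortably, while anything mentioning "quantum" or "llm" scores 0.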
| Command | Purpose | Common Flags |
|---|---|---|
| hfpclawer search | Discover new papers | --max-pages, --dry-run |
| hfpclawer download | Download PDFs | (runs from search results) |
| hfpclawer convert | Convert PDF → MD | --to-wiki syncs to raw/papers/ |
| hfpclawer full | All-in-one pipeline | --max-pages, --to-wiki |
| hfpclawer list | List downloaded papers | |
| hfpclawer store stats | Paper store statistics | |
| hfpclawer store export | Export store as JSON/CSV | --format json |
| hfpclawer store verify | Cross-verify paper metadata | --arxiv-id |
| hfpclawer config | Show current config | |
| hfpclawer mcp | Start MCP server | (for LLM integration) |
| hfpclawer monitor | Download daemon control | start, stop, status |
| hfpclawer dedup | Show dedup statistics | |
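Deduplication by title, controlled by `title_similarity_min`, can be sketched as follows. The similarity measure hfpclawer uses is not stated here, so this sketch assumes Jaccard similarity over lowercase word sets (a common choice for which a 0.40 cut-off is plausible). It keeps the first record seen and drops later records whose title is at least that similar; `dedup_by_title` and the record shape are hypothetical.

```python
import re
from typing import Dict, List, Set


def title_tokens(title: str) -> Set[str]:
    """Lowercase alphanumeric word set, so case and punctuation differences vanish."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|; two empty titles count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedup_by_title(
    papers: List[Dict[str, str]], min_similarity: float = 0.40
) -> List[Dict[str, str]]:
    """Keep the first record of each group of similar titles (O(n^2), fine for daily batches)."""
    kept: List[Dict[str, str]] = []
    kept_tokens: List[Set[str]] = []
    for paper in papers:
        tokens = title_tokens(paper["title"])
        if any(jaccard(tokens, seen) >= min_similarity for seen in kept_tokens):
            continue  # near-duplicate of a record from an earlier source
        kept.append(paper)
        kept_tokens.append(tokens)
    return kept
```

Because the first occurrence wins, the order in which sources are merged decides which copy of a paper is kept.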
```bash
# Quick scan (3 pages per query, ~50 papers)
hfpclawer search --max-pages 3
# View results
hfpclawer store stats
# Download all new papers
hfpclawer download
# Convert to readable Markdown
hfpclawer convert
# Skim the first 80 lines of a converted paper
head -n 80 mds/2010.08895.md
# Full sweep with wiki sync
hfpclawer full --max-pages 10 --to-wiki
# Validate references in newly added papers
hfpclawer audit verify "Key cited paper" --source openalex
```
hfpclawer uses three storage tiers:
| Storage | Location | Content | Persistence |
|---|---|---|---|
| SQLite | data/papers.db | Metadata, dedup, cross-ref | Persistent |
| PDFs | pdfs/ | Raw paper PDFs | Download once, keep |
| Markdown | mds/ | Converted text | Can be regenerated from PDFs |
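The SQLite tier is what lets repeated runs stay incremental: metadata is keyed so a paper seen again is not stored twice, and the store can report which papers still lack a PDF. The schema below is a hypothetical illustration using Python's built-in `sqlite3` module, not the actual layout of data/papers.db; the table and column names are invented for the example.

```python
import sqlite3
from typing import List

# Hypothetical schema: keyed by arXiv ID; papers without one would need another key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    arxiv_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    source   TEXT NOT NULL,  -- e.g. 'arxiv', 'openreview'
    pdf_path TEXT,           -- set once the PDF is downloaded (pdfs/)
    md_path  TEXT            -- set after conversion (mds/); regenerable from the PDF
)
"""


def open_store(path: str = "data/papers.db") -> sqlite3.Connection:
    """Open (or create) the store; the parent directory must already exist."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_paper(conn: sqlite3.Connection, arxiv_id: str, title: str, source: str) -> bool:
    """Insert metadata once; returns False if the paper is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO papers (arxiv_id, title, source) VALUES (?, ?, ?)",
        (arxiv_id, title, source),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the primary key already existed


def pending_downloads(conn: sqlite3.Connection) -> List[str]:
    """IDs of stored papers that have no downloaded PDF yet."""
    rows = conn.execute(
        "SELECT arxiv_id FROM papers WHERE pdf_path IS NULL ORDER BY arxiv_id"
    )
    return [row[0] for row in rows]
```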
The paper store tracks:
Troubleshooting:

- pip install must run in the right virtual environment. If the hfpclawer command is not found, check the active Python environment.
- … Set max_per_dim to 10 if this happens.
- If you see ModuleNotFoundError: scrapy, the scrapy extra is not installed; run pip install "hfpclawer[scrapy]".
- hfpclawer convert relies on pymupdf4llm. If it complains that pymupdf4llm is missing, run pip install "hfpclawer[pdf]".
- --to-wiki syncs converted papers to raw/papers/. If you do not have a wiki directory, skip --to-wiki and read from mds/ directly.
- hfpclawer init creates a default config.yaml. Edit it before running hfpclawer full; otherwise the default queries may not match your research area.

Verification checklist:

- hfpclawer init creates a valid config.yaml
- hfpclawer search --dry-run validates the config without network calls
- hfpclawer search --max-pages 3 returns real papers
- hfpclawer download downloads PDFs correctly
- hfpclawer convert produces readable Markdown
- hfpclawer store stats shows non-zero counts
- hfpclawer store verify --arxiv-id 2010.08895 cross-checks via Crossref