Install
openclaw skills install academic-talon

🎓 Full-stack academic research assistant: search papers → extract publication-ready BibTeX (header) → parse full document structure as TEI XML (via GROBID) → archive to Zotero → serve local PDFs. It fixes arXiv AND search semantics, generates conference/journal-standard BibTeX, auto-creates Zotero collections, and enables deep document understanding via GROBID structured parsing.

Your AI-powered academic research assistant for paper search → BibTeX extraction → Zotero archiving → local PDF serving.
Save hours of manual work searching papers, copying citations, and organizing your library.
Trigger this skill when the user wants to:
| Task | Description |
|---|---|
| 🔍 Search papers | Find papers across multiple academic search engines (arXiv, Google Scholar, Semantic Scholar, Tavily) |
| 📝 Extract BibTeX (header analysis) | Parse PDF header and output publication-ready BibTeX matching AI conference/journal standards |
| 📄 Full text analysis | Extract full document structure in TEI XML format for further processing |
| 🗄️ Archive to Zotero | Automatically save papers to your Zotero library (default collection: `openclaw`; missing collections are created automatically) |
| 📂 Local PDF library | Maintain a local PDF collection and serve it via HTTP for direct access from Zotero |
This is a toolbox skill that provides multiple independent academic research tools. You can use just the features you need. A common complete workflow looks like this:
User Query
↓
[academic-talon] ← this skill
↓
1. Search → Multiple search APIs (arXiv, Google Scholar via SerpAPI, etc.)
↓
2. PDF Download → saved to local `pdfs/` directory
↓
3. PDF Parsing → **GROBID service** processes PDF
↓
- Header analysis → extracts metadata → skill generates clean BibTeX
- Full text analysis → returns complete TEI XML with full document structure
↓
4. If header analysis: BibTeX Generation → skill formats clean publication-ready output
↓
5. Zotero Archiving → via **pyzotero** → your Zotero library → auto-add to collection
↓
6. PDF Serving → built-in HTTP server serves PDFs from your intranet
↓
Result: Paper in Zotero with working PDF link, clean BibTeX ready for citation
You don't have to use this full workflow - use individual tools as needed.
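The full-text path in step 3 returns TEI XML, which downstream code still has to navigate. The following is a minimal sketch of that post-processing, assuming the XML follows the usual TEI layout (main title under `titleStmt`, section headings as `head` children of `div` elements in `body`). The function name `tei_outline` is hypothetical and not part of this skill.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # XML namespace of TEI documents
NS = {"tei": TEI}


def tei_outline(tei_xml: str) -> tuple[str, list[str]]:
    """Return (main title, section headings) from a TEI XML string."""
    root = ET.fromstring(tei_xml)

    # The title is read from the header's titleStmt element.
    title_el = root.find(".//tei:titleStmt/tei:title", NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    headings: list[str] = []
    body = root.find(".//tei:body", NS)
    if body is not None:
        # Only <head> elements that are direct children of a <div> are kept,
        # so figure or table captions are not mistaken for section headings.
        for div in body.iter(f"{{{TEI}}}div"):
            head = div.find("tei:head", NS)
            if head is not None:
                text = "".join(head.itertext()).strip()
                if text:
                    headings.append(text)
    return title, headings
```

Calling `tei_outline(xml_text)` on the TEI string from a `fulltext` analysis gives a quick outline of the paper without loading the whole document into another tool.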
| Service | Purpose | Why do you need it? | Required? |
|---|---|---|---|
| GROBID | PDF metadata extraction | Parses PDF headers to extract title, authors, publication info for BibTeX | ✅ Required |
| Zotero API | Paper archiving | Stores papers in your Zotero library with correct metadata | ✅ Required for archiving |
| SerpAPI Key | Google Scholar search | Enables searching Google Scholar | ⚙️ Optional (enables more results) |
| Semantic Scholar API Key | Semantic Scholar search | Enables Semantic Scholar results | ⚙️ Optional |
| Tavily API Key | Tavily search | Enables Tavily results | ⚙️ Optional |
pip install -r skills/academic-talon/requirements.txt
Environment file (skills/academic-talon/.env):
# ========== Zotero Configuration (Required for archiving) ==========
ZOTERO_API_KEY=your_zotero_api_key_here
ZOTERO_LIBRARY_ID=your_library_id_here
ZOTERO_LIBRARY_TYPE=user # or "group" for group libraries
# ========== GROBID Configuration (Required for PDF parsing) ==========
GROBID_API_URL=http://localhost:8070/api
# Or if you use Docker Compose behind nginx:
# GROBID_API_URL=http://localhost:8080/api
# ========== Optional Search API Keys ==========
# Get these from their respective websites
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
SERPAPI_KEY=your_serpapi_key_for_google_scholar
TAVILY_API_KEY=your_tavily_api_key
# ========== Local PDF Serving (Optional) ==========
# After starting the PDF server, set this to your intranet URL:
# Example: PDF_BASE_URL=http://192.168.1.100:8000/
PDF_BASE_URL=http://your-server-ip:port/
| Environment Variable | What it does |
|---|---|
| `ZOTERO_API_KEY` | Your Zotero API key from Zotero settings |
| `ZOTERO_LIBRARY_ID` | Your Zotero library ID (found in Zotero API URL) |
| `ZOTERO_LIBRARY_TYPE` | `"user"` for your personal library, `"group"` for group libraries |
| `GROBID_API_URL` | URL of your GROBID service endpoint |
| `PDF_BASE_URL` | Base URL for your locally running PDF server (e.g. `http://10.26.20.168:18001/`) |
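A quick sanity check of these variables can catch a missing key before the first request fails. The sketch below is one possible check, not the skill's own validation: the function name `config_problems` is hypothetical, the required/optional split follows the services table, and treating a missing `ZOTERO_LIBRARY_TYPE` as `user` is an assumption.

```python
from typing import Mapping

# Variable names come from the .env template; the grouping is an assumption
# based on which services the document marks as required.
REQUIRED_ALWAYS = ("GROBID_API_URL",)
REQUIRED_FOR_ARCHIVING = ("ZOTERO_API_KEY", "ZOTERO_LIBRARY_ID")
VALID_LIBRARY_TYPES = ("user", "group")


def config_problems(env: Mapping[str, str], archiving: bool = True) -> list[str]:
    """Return human-readable problems; an empty list means nothing is missing."""
    needed = REQUIRED_ALWAYS + (REQUIRED_FOR_ARCHIVING if archiving else ())
    problems = [f"{name} is not set" for name in needed if not env.get(name, "").strip()]

    # Assumption: a missing library type is treated as "user".
    library_type = env.get("ZOTERO_LIBRARY_TYPE", "user").strip()
    if archiving and library_type not in VALID_LIBRARY_TYPES:
        problems.append(
            f"ZOTERO_LIBRARY_TYPE must be one of {VALID_LIBRARY_TYPES}, got {library_type!r}"
        )
    return problems
```

After loading the file (for example with `load_dotenv` from python-dotenv), `config_problems(os.environ)` lists anything that still needs to be filled in.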
Option A: Docker Compose (Recommended)
Create compose.yml in your GROBID directory:
version: "3.9"
services:
  grobid:
    # Choose the right image for your hardware:
    # - For non-GPU environments: grobid/grobid:0.8.2-crf (CRF-only model, smaller)
    # - For GPU environments: grobid/grobid:0.8.2-full (includes CRF + deep learning models)
    image: grobid/grobid:0.8.2-crf
    container_name: grobid
    restart: unless-stopped
    # "expose" makes port 8070 reachable only from other containers (e.g. an nginx proxy);
    # it does not publish the port on the host.
    expose:
      - "8070"
    environment:
      JAVA_OPTS: "-Xms512m -Xmx4g"
    volumes:
      - ./grobid/tmp:/opt/grobid/tmp
      - ./grobid/logs:/opt/grobid/logs
💡 Image selection: Use `grobid/grobid:0.8.2-crf` for CPU-only / non-GPU environments (smaller image, faster startup). Use `grobid/grobid:0.8.2-full` if you have a GPU and want maximum accuracy with deep learning models.
Start:
docker-compose up -d
Option B: Direct run
Follow GROBID documentation to run directly.
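Whichever option you choose, it helps to confirm that the service answers before running an analysis. The sketch below probes GROBID's `isalive` endpoint under the configured API base URL using only the standard library. The helper names are hypothetical, and the 5-second timeout is an arbitrary choice.

```python
import urllib.request


def isalive_url(api_url: str) -> str:
    """Build the health-check URL from a base such as http://localhost:8070/api."""
    return api_url.rstrip("/") + "/isalive"


def grobid_is_alive(api_url: str, timeout: float = 5.0) -> bool:
    """Return True if the GROBID health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(isalive_url(api_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, DNS failures, timeouts and HTTP error codes.
        return False
```

For example, `grobid_is_alive("http://localhost:8070/api")` should return `True` once the container has finished starting. A `False` result usually means the URL in `.env` does not match where the service is actually reachable.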
If you want to serve downloaded PDFs locally:
# Start on port 8000 and allow access from the whole intranet (the argument 内网 means "intranet")
python skills/academic-talon/scripts/start_pdf_server.py start 8000 内网
# Check status
python skills/academic-talon/scripts/start_pdf_server.py status
# Stop
python skills/academic-talon/scripts/start_pdf_server.py stop
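The internals of `start_pdf_server.py` are not shown here. As a rough illustration of serving a single directory over HTTP, the sketch below uses only the standard library. The function name `make_pdf_server` is hypothetical, and binding to `0.0.0.0` is an assumption about what "intranet access" means.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_pdf_server(directory: str, host: str = "0.0.0.0", port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) an HTTP server rooted at `directory`."""
    root = Path(directory).resolve()
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # The handler maps request paths below `root`; ".." segments are dropped,
    # but symlinks inside the directory are still followed.
    handler = partial(SimpleHTTPRequestHandler, directory=str(root))
    return ThreadingHTTPServer((host, port), handler)


# Usage (blocks until interrupted):
#   server = make_pdf_server("pdfs", port=8000)
#   server.serve_forever()
```

Note that `SimpleHTTPRequestHandler` also produces directory listings, so a production setup may want a stricter handler.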
The server:
- Serves only the `pdfs/` directory (sandboxed, no access outside)
- Names files by citation key (e.g. `zhang2025hallucinationdetection.pdf`)
- When `PDF_BASE_URL` is configured, archived papers automatically get the correct local URL

| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| `action` | string | Action to perform: `search`, `download`, `analyze`, `archive` | Yes | `search` |
| `query` | string | Search keywords | Yes (search) | - |
| `limit` | integer | Max results to return | No | 10 |
| `source` | string | Search source: `all`, `arxiv`, `google_scholar`, `semantic_scholar`, `tavily` | No | `all` |
| `engine_weights` | object | How many results to take from each engine | No | `{"arxiv": 5, "google_scholar": 3, "semantic_scholar": 1, "tavily": 1}` |
| `url` | string | PDF URL to download | Yes (download) | - |
| `filename` | string | Custom filename for downloaded PDF | No | auto from citation key |
| `paper_info` | object | Paper metadata (title, authors, year) for citation key generation | No | - |
| `pdf_input` | string | Path to local PDF or URL to remote PDF | Yes (analyze) | - |
| `analysis_type` | string | `header` → outputs publication-ready BibTeX; `fulltext` → outputs TEI XML of full document | No | `header` |
| `collection` | string | Zotero collection name to add paper to | No | `openclaw` |
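Since the required parameters depend on the action, a small pre-flight check can report what is missing before a call is made. This is a sketch based only on the table above. The helper `missing_parameters` is hypothetical, and `archive` is given no required parameters because the table lists none.

```python
# Required parameters per action, taken from the "Required" column.
REQUIRED_BY_ACTION = {
    "search": ("query",),
    "download": ("url",),
    "analyze": ("pdf_input",),
    "archive": (),
}


def missing_parameters(params: dict) -> list[str]:
    """Return the names of required parameters that are absent or empty."""
    action = params.get("action", "search")  # "search" is the documented default
    if action not in REQUIRED_BY_ACTION:
        raise ValueError(f"unknown action: {action!r}")
    return [name for name in REQUIRED_BY_ACTION[action] if not params.get(name)]
```

For instance, `missing_parameters({"action": "analyze"})` reports `["pdf_input"]`, which is easier to act on than a failed request.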
All actions return JSON in this format:
{
  "success": true,
  "action": "search",
  "query": "your search query",
  "results": [
    {
      "title": "Paper Title",
      "authors": ["Author One", "Author Two"],
      "year": "2025",
      "abstract": "Paper abstract...",
      "url": "https://...",
      "pdf_url": "https://...",
      "source": "arxiv"
    }
  ]
}
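When consuming this response, it is worth checking `success` and skipping results without a PDF link before attempting a download. A minimal sketch, assuming the response has already been decoded into a Python dict (the helper name `downloadable_results` is hypothetical):

```python
def downloadable_results(response: dict) -> list[dict]:
    """Return the search results that carry a non-empty pdf_url."""
    if not response.get("success"):
        return []
    return [item for item in response.get("results", []) if item.get("pdf_url")]
```

Each returned item can then be passed on as `paper_info`, with its `pdf_url` used as the `url` of a `download` action.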
- Journal articles use `@article`
- Conference papers use `@inproceedings` with the conference name in `booktitle`
- arXiv preprints use `@article` with `journal = {arXiv preprint xxxx.xxxxx}`
- Fields such as `date`, `month`, `publisher`, and `day` that shouldn't be in final submissions are removed
- Citation keys follow `lastnameYearTitle` (e.g. `zhang2025hallucinationdetection`), which matches standard academic practice

Example output (ready to paste into your manuscript):
@article{zhang2025hallucinationdetection,
  author = {Zhang, Chenggong and Wang, Haopeng},
  title = {Hallucination Detection and Evaluation of Large Language Model},
  year = {2025},
  journal = {arXiv preprint 2512.22416},
  abstract = {Hallucinations in Large Language Models...},
}

@inproceedings{gal2016dropout,
  author = {Gal, Yarin and Ghahramani, Zoubin},
  title = {Dropout as a bayesian approximation: Representing model uncertainty in deep learning},
  booktitle = {ICML},
  year = {2016},
}
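The two examples suggest how keys and entries are put together, although the skill's exact rules are not documented here. The sketch below is one plausible reading: the key is the first author's last name, the year, and the first few non-stopword title words, and the entry drops `date`, `month`, `publisher`, and `day`. The stopword list, the `n_words` parameter, and both function names are assumptions. Note that the first example corresponds to two title words and the second to one.

```python
import re

# Assumed stopword list; the skill's real list is not documented.
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "as", "for", "in", "on", "to", "with"})
# Fields the document says should not appear in final submissions.
DROPPED_FIELDS = frozenset({"date", "month", "publisher", "day"})


def citation_key(first_author: str, year, title: str, n_words: int = 2) -> str:
    """Build a lastnameYearTitle key; accepts 'Last, First' or 'First Last'."""
    if "," in first_author:
        last = first_author.split(",", 1)[0]
    else:
        parts = first_author.split()
        last = parts[-1] if parts else ""
    last = re.sub(r"[^a-z0-9]", "", last.lower())  # non-ASCII letters are dropped here
    words = [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS]
    return last + str(year) + "".join(words[:n_words])


def format_entry(entry_type: str, key: str, fields: dict) -> str:
    """Render one BibTeX entry, skipping dropped or empty fields."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if name in DROPPED_FIELDS or not value:
            continue
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)
```

With `n_words=2`, the Zhang paper yields `zhang2025hallucinationdetection`; with `n_words=1`, the Gal paper yields `gal2016dropout`.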
- Papers go to the `openclaw` collection unless you specify otherwise
- Zotero item types follow the paper kind: `preprint` for preprints, conference → `conferencePaper`, journal → `journalArticle`

Benefit: Build your research library without repetitive manual clicking.
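To make the item-type mapping concrete, the sketch below turns a search result into a Zotero-style item dict. It is illustrative only: the function names and the `kind` argument are hypothetical, the input keys mirror the search response format, and the actual upload (for example through pyzotero's `create_items`) is left out.

```python
# Paper kind → Zotero item type, as listed in this document.
ITEM_TYPES = {
    "preprint": "preprint",
    "conference": "conferencePaper",
    "journal": "journalArticle",
}


def to_creator(author: str) -> dict:
    """Split 'Last, First' or 'First Last' into a Zotero creator object."""
    if "," in author:
        last, first = (part.strip() for part in author.split(",", 1))
    else:
        first, _, last = author.strip().rpartition(" ")
    return {"creatorType": "author", "firstName": first, "lastName": last}


def build_item(paper: dict, kind: str) -> dict:
    """Build an item payload from a paper dict with title/authors/year keys."""
    if kind not in ITEM_TYPES:
        raise ValueError(f"unknown paper kind: {kind!r}")
    return {
        "itemType": ITEM_TYPES[kind],
        "title": paper.get("title", ""),
        "creators": [to_creator(name) for name in paper.get("authors", [])],
        "date": str(paper.get("year", "")),
        "url": paper.get("url", ""),
        "abstractNote": paper.get("abstract", ""),
    }
```

A payload built this way still needs the collection key attached, or a follow-up call to add the item to the collection.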
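PDF Processing goes to GROBID:
- PDFs are sent to the service at `GROBID_API_URL` for metadata extraction

Local PDF Server:
- Serves files only from the `pdfs/` directory

File Access Restrictions:
- File access is limited to the `pdfs/` directory within this skill's installation

API Key Storage:
- Keys are read from the `.env` file
- Do not commit `.env` to version control

# 1. Search for papers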
result = skill.run({
    "action": "search",
    "query": "LLM judge knowledge possession",
    "limit": 5
})

# 2. Download PDF for first result
paper = result["results"][0]
download_result = skill.run({
    "action": "download",
    "url": paper["pdf_url"],
    "paper_info": paper
})

# 3. Extract BibTeX from downloaded PDF
analyze_result = skill.run({
    "action": "analyze",
    "pdf_input": download_result["pdf_path"],
    "analysis_type": "header"
})

# 4. Archive to Zotero (goes to openclaw collection by default)
paper["bibtex"] = analyze_result["result"]
archive_result = skill.run({
    "action": "archive",
    "paper_info": paper
})

if archive_result["success"]:
    print(f"✅ Paper archived to Zotero: {archive_result['result']['item_id']}")
| Problem | Solution |
|---|---|
| GROBID server not accessible | Check GROBID is running, verify `GROBID_API_URL` in `.env` |
| Zotero API error | Check `ZOTERO_API_KEY` and `ZOTERO_LIBRARY_ID` are correct |
| arXiv search returns nothing | Check network connectivity, arXiv API sometimes blocks unusual IPs |
| PDF analysis returns empty | Check PDF isn't corrupted, verify GROBID is working |
| Local PDF link doesn't work | Check PDF server is running, verify `PDF_BASE_URL` matches server address |
| Duplicate papers in Zotero | Skill detects duplicates by title/DOI and adds to collection, safe to ignore |
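The last row mentions duplicate detection by title/DOI. The skill's actual matching logic is not shown, so the following is only a sketch of one way such a check could work: compare normalized DOIs when both records have one, otherwise compare normalized titles. The dict keys `doi` and `title` and all function names are assumptions.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so minor differences are ignored."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def is_duplicate(candidate: dict, existing: dict) -> bool:
    """Prefer DOI equality when both records have a DOI; else compare titles."""
    doi_a = normalize_doi(candidate.get("doi") or "")
    doi_b = normalize_doi(existing.get("doi") or "")
    if doi_a and doi_b:
        return doi_a == doi_b
    title_a = normalize_title(candidate.get("title") or "")
    return bool(title_a) and title_a == normalize_title(existing.get("title") or "")
```

Exact title matching after normalization is deliberately conservative; fuzzy matching would catch more duplicates at the cost of false positives.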
Python dependencies: `requests`, `python-dotenv`, `pyzotero`

MIT License - free for academic and commercial use.