academic-talon

Use this skill when the user wants to search for academic papers, analyze PDF files, extract metadata, or save papers to Zotero.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by TongChaodong (@bigdogaaa)
Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (search, analyze, Zotero archive) match the code and SKILL.md. The code implements search connectors (Semantic Scholar, arXiv, SerpAPI), PDF downloading/analysis via GROBID, and Zotero archiving; required binaries (python) and declared optional env vars align with these features.
Instruction Scope
Runtime instructions and code perform network operations (search APIs, download arbitrary PDFs, call GROBID, call Zotero API) and write files under the skill directory (.cache, pdfs). These behaviors are expected for the feature set but are privileges to be aware of: the skill will fetch URLs supplied by users (including internal/private endpoints reachable from the agent) and store PDF/XML outputs on disk.
Install Mechanism
No opaque download/install steps; SKILL.md instructs a pip install -r requirements.txt (requests, python-dotenv, pyzotero). No remote archives, shortened URLs, or arbitrary binary installs are present.
Credentials
The registry shows no required env vars; SKILL.md documents optional .env entries (ZOTERO API key/IDs, SEMANTIC_SCHOLAR API key, SERPAPI_KEY, TAVILY_API_KEY, GROBID_API_URL). These are proportional to the stated functionality. The skill reads a .env file (and environment) to obtain these optional keys — supply only the keys you trust to this skill directory.
Persistence & Privilege
always is false; the skill writes files only to its own directories (pdfs, .cache). It does not request system-wide configuration changes or alter other skills. It may be invoked autonomously by the agent (platform default), which increases blast radius only in combination with other issues (none found).
Assessment
This skill appears to do what it says, but review and accept these practical implications before installing:

  1. It installs Python packages via pip.
  2. It makes outbound network requests (search APIs, PDF downloads, the GROBID server, and the Zotero API); do not provide it with credentials for services you don't want it to access.
  3. It downloads and stores PDFs and XML under the skill directory (.cache and pdfs).
  4. Because it fetches arbitrary URLs provided to it, it can reach internal network endpoints accessible from the agent environment (SSRF risk); avoid analyzing URLs you don't trust, or run the skill in a network-isolated environment.
  5. Provide Zotero API keys only if you intend the skill to archive items into your Zotero library.

If you need higher assurance, review the full code (provided) or run it in an isolated environment (container), and avoid placing sensitive credentials in the skill directory's .env.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.5
latest: vk9755x7f3k0hxfh791sxdrr8hh8415c9

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🎓 Clawdis
Bins: python

SKILL.md

Instructions

You are an academic research assistant.

Use this skill to:

  • Search for academic papers
  • Download and analyze PDF files
  • Extract structured metadata (BibTeX or full text)
  • Archive papers into Zotero

When to use

Trigger this skill if the user:

  • asks to find or search academic papers
  • provides a PDF and wants analysis or metadata extraction
  • wants to save or organize papers in Zotero
  • asks for BibTeX or citation generation

Actions

You MUST choose the correct action:

  • search → find papers
  • download → download PDF
  • analyze → extract metadata or full text
  • archive → save to Zotero

Rules

  • Always select the correct action based on user intent
  • Prefer search before download if no URL is provided
  • Use analyze to extract BibTeX before archiving
  • Avoid duplicate archiving
  • Return structured JSON results only

Overview (Human Readable Documentation)

This skill provides a comprehensive solution for academic paper research and management. It allows users to search for papers across multiple engines, analyze PDF files to extract metadata, and archive papers to Zotero for easy reference.

Features

  1. Multi-engine paper search
    • Semantic Scholar
    • arXiv
    • Google Scholar (via SerpAPI)
    • Tavily
  2. PDF analysis
    • Header analysis (returns BibTeX format)
    • Full text analysis (returns XML format)
    • Uses GROBID API for parsing
  3. Optional Zotero archiving
    • Archives papers to Zotero library (requires Zotero API credentials)
    • Adds PDF URL as link
    • Avoids duplicate entries
    • Adds items to specified collection (default: "openclaw")
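The duplicate-avoidance behavior can be approximated by normalizing titles before comparison. The sketch below is a hypothetical illustration of one such approach (the function names and exact normalization are assumptions, not the skill's actual code):

```python
def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't hide duplicates."""
    return " ".join(title.lower().split())

def is_duplicate(title: str, existing_titles: list[str]) -> bool:
    """Return True if a normalized match already exists among library titles."""
    wanted = normalize_title(title)
    return any(normalize_title(t) == wanted for t in existing_titles)
```

A real implementation might also compare DOIs or arXiv IDs when available, which are more reliable keys than titles.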

Quick Reference

Situation | Action
Search for papers | Use the search action with a query parameter
Analyze PDF header | Use the analyze action with pdf_input and analysis_type="header"
Analyze PDF full text | Use the analyze action with pdf_input and analysis_type="fulltext"
Archive paper to Zotero | Use the archive action with a paper_info parameter

OpenClaw Setup

Installation

Via ClawdHub (recommended):

clawdhub install academic-talon

Install Python dependencies:

pip install -r requirements.txt

Configuration

Create a .env file in the skill directory with the following variables:

# Zotero API credentials (optional, required only for archive functionality)
# ZOTERO_API_KEY=your_zotero_api_key
# ZOTERO_LIBRARY_ID=your_zotero_library_id
# ZOTERO_LIBRARY_TYPE=user # or group

# Optional API keys for additional search engines
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key # Optional
SERPAPI_KEY=your_serpapi_key # For Google Scholar
TAVILY_API_KEY=your_tavily_api_key # For Tavily

# GROBID API URL (default: http://localhost:8070/api)
GROBID_API_URL=http://localhost:8070/api
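Since every key above is optional, missing entries simply disable the related feature. A minimal sketch of how such keys might be read, using plain environment lookups with the documented defaults (illustrative only; the skill itself loads the .env file via python-dotenv first):

```python
import os

def load_config(env=None):
    """Read optional API keys; a missing key disables the corresponding feature."""
    env = os.environ if env is None else env
    return {
        "zotero_api_key": env.get("ZOTERO_API_KEY"),        # None => archiving disabled
        "zotero_library_id": env.get("ZOTERO_LIBRARY_ID"),
        "zotero_library_type": env.get("ZOTERO_LIBRARY_TYPE", "user"),
        "semantic_scholar_api_key": env.get("SEMANTIC_SCHOLAR_API_KEY"),
        "serpapi_key": env.get("SERPAPI_KEY"),               # Google Scholar via SerpAPI
        "tavily_api_key": env.get("TAVILY_API_KEY"),
        "grobid_api_url": env.get("GROBID_API_URL", "http://localhost:8070/api"),
    }
```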

Required Services

  • GROBID server - You can start the server using one of the following methods:

    Official Quick Start:

    version: "3.9"
    
    services:
      grobid:
        image: grobid/grobid:0.8.2-crf
        container_name: grobid
        restart: unless-stopped
    
        # ❗ Do not expose port externally
        expose:
          - "8070"
    
        environment:
          JAVA_OPTS: "-Xms512m -Xmx4g"
          GROBID_MAX_CONCURRENCY: "1"
    
        volumes:
          - ./grobid/tmp:/opt/grobid/tmp
          - ./grobid/logs:/opt/grobid/logs
    
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8070/api/isalive"]
          interval: 30s
          timeout: 5s
          retries: 5
    
        networks:
          - grobid-net
    
      nginx:
        image: nginx:latest
        container_name: grobid-nginx
        restart: unless-stopped
    
        ports:
          - "8080:8080"
    
        volumes:
          - ./nginx.conf:/etc/nginx/conf.d/default.conf
          # Uncomment the line below if you want to add username/password
          # - ./.htpasswd:/etc/nginx/.htpasswd
    
        depends_on:
          - grobid
    
        networks:
          - grobid-net
    
    networks:
      grobid-net:
    
    • Create an nginx.conf file with the following content:
    server {
        listen 8080;
    
        location / {
            proxy_pass http://grobid:8070;
    
            # =========================
            # ✅ IP Whitelist (Important)
            # =========================
            allow 127.0.0.1;        # Localhost
            # allow 10.0.0.0/8;     # Internal network (optional)
    
            deny all;
    
            # =========================
            # 🔐 Optional: Basic Auth
            # =========================
            # auth_basic "Restricted";
            # auth_basic_user_file /etc/nginx/.htpasswd;
    
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
    
    • Run docker-compose up -d to start the services.

    Default configuration:

    • GROBID API URL: http://localhost:8070/api
    • If using the provided Docker Compose setup, access GROBID at http://localhost:8080/api and set GROBID_API_URL to match.
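Before running analyses, it helps to confirm the GROBID server is reachable; GROBID exposes an isalive endpoint (the same one the compose healthcheck above probes). A small stdlib-only check, written as a hypothetical helper rather than something shipped with the skill:

```python
from urllib.request import urlopen

def grobid_isalive_url(base: str = "http://localhost:8070/api") -> str:
    """Build the health-check URL from the configured GROBID_API_URL."""
    return base.rstrip("/") + "/isalive"

def grobid_is_up(base: str = "http://localhost:8070/api", timeout: float = 3.0) -> bool:
    """Return True only if the server answers the isalive probe with HTTP 200."""
    try:
        with urlopen(grobid_isalive_url(base), timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False
```

If `grobid_is_up()` returns False, check the container logs and the GROBID_API_URL value in .env before invoking analyze.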

Usage

Search Papers

from skill import skill

# Search for papers on "hallucination"
result = skill.run({
    "action": "search",
    "query": "hallucination",
    "limit": 5,
    "source": "all"
})

print(result)

# Search papers with custom engine weights
result = skill.run({
    "action": "search",
    "query": "hallucination in AI",
    "limit": 10,
    "source": "all",
    "engine_weights": {
        "arxiv": 5,
        "google_scholar": 3,
        "semantic_scholar": 1,
        "tavily": 1
    }
})

print(result)
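The documentation does not spell out how engine_weights affects ranking; one plausible interpretation, offered purely as an assumption and not the skill's verified behavior, is that merged results are ordered by the weight of their source engine:

```python
def rank_by_engine_weight(results: list[dict], weights: dict) -> list[dict]:
    """Sort merged results so papers from higher-weighted engines come first.
    Illustrative only: the skill's real ranking logic may combine weights
    with per-engine relevance scores or interleave results differently."""
    return sorted(
        results,
        key=lambda r: weights.get(r.get("source", ""), 0),
        reverse=True,
    )
```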

Download PDF

from skill import skill

# Download PDF from URL with custom filename
result = skill.run({
    "action": "download",
    "url": "https://example.com/paper.pdf",
    "filename": "example_paper.pdf"
})

print(result)

# Download PDF with custom save directory
result = skill.run({
    "action": "download",
    "url": "https://example.com/paper.pdf",
    "save_dir": "/path/to/pdf/library"
})

print(result)

# Download PDF with paper info for citation key generation
result = skill.run({
    "action": "download",
    "url": "https://example.com/paper.pdf",
    "paper_info": {
        "title": "Paper Title",
        "authors": ["John Doe", "Jane Smith"],
        "year": "2024"
    }
})

print(result)

Analyze PDF

from skill import skill

# Analyze PDF header from local path
result = skill.run({
    "action": "analyze",
    "pdf_input": "/path/to/paper.pdf",
    "analysis_type": "header"
})

print(result)

# Analyze PDF full text from URL
result = skill.run({
    "action": "analyze",
    "pdf_input": "https://example.com/paper.pdf",
    "analysis_type": "fulltext"
})

print(result)

Archive to Zotero

from skill import skill

# Archive paper to Zotero
result = skill.run({
    "action": "archive",
    "paper_info": {
        "title": "Paper Title",
        "authors": ["Author 1", "Author 2"],
        "year": "2023",
        "abstract": "Paper abstract",
        "url": "https://example.com/paper",
        "pdf_url": "https://example.com/paper.pdf",
        "bibtex": "@article{...}"
    }
})

print(result)

Input Schema

Parameter | Type | Description | Required | Default
action | string | Action to perform ("search", "download", "analyze", "archive") | Yes | "search"
query | string | Search query (for search action) | Yes (search) | ""
limit | integer | Number of results to return (for search action) | No | 10
source | string | Search source ("all", "semantic_scholar", "arxiv", "google_scholar", "tavily") | No | "all"
engine_weights | object | Dictionary of engine weights (for search action) | No | {"arxiv": 5, "google_scholar": 3, "semantic_scholar": 1, "tavily": 1}
url | string | URL of the PDF file (for download action) | Yes (download) | ""
filename | string | Filename to save the PDF as (for download action) | No | None
save_dir | string | Directory to save the PDF in (for download action) | No | None
paper_info | object | Paper information (for download and archive actions) | No | {}
collection | string | Name of the collection to add the paper to (for archive action) | No | "openclaw"
pdf_input | string | Path to a local PDF file or URL to a PDF (for analyze action) | Yes (analyze) | ""
analysis_type | string | Type of analysis ("header", "fulltext") | No | "header"
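The Required column above can be checked up front before calling the skill. A hypothetical pre-flight validator (the names are assumptions, not taken from the skill's source):

```python
# Required parameters per action, mirroring the schema's "Required" column.
REQUIRED_PARAMS = {
    "search": ["query"],
    "download": ["url"],
    "analyze": ["pdf_input"],
    "archive": ["paper_info"],
}

def validate_params(params: dict) -> dict:
    """Fail fast with the documented {"success": False, "error": ...} shape."""
    action = params.get("action", "search")
    if action not in REQUIRED_PARAMS:
        return {"success": False, "error": f"Unknown action: {action!r}"}
    missing = [name for name in REQUIRED_PARAMS[action] if not params.get(name)]
    if missing:
        return {"success": False,
                "error": f"Missing required parameters: {', '.join(missing)}"}
    return {"success": True}
```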

Output Schema

Search Action

{
  "success": true,
  "action": "search",
  "query": "hallucination",
  "results": [
    {
      "title": "Paper Title",
      "authors": ["Author 1", "Author 2"],
      "year": "2023",
      "abstract": "Paper abstract",
      "url": "https://example.com/paper",
      "pdf_url": "https://example.com/paper.pdf",
      "source": "semantic_scholar"
    }
  ]
}

Download Action

{
  "success": true,
  "action": "download",
  "url": "https://example.com/paper.pdf",
  "pdf_path": "/path/to/downloaded/paper.pdf"
}

Analyze Action

{
  "success": true,
  "action": "analyze",
  "pdf_input": "/path/to/paper.pdf",
  "analysis_type": "header",
  "result": "@article{...}"
}

Archive Action

{
  "success": true,
  "action": "archive",
  "result": {
    "success": true,
    "item_id": "ABC123",
    "added_to_collection": true
  }
}

Error Handling

The skill returns error messages in the following format:

{
  "success": false,
  "error": "Error message"
}

Common errors include:

  • Missing required parameters
  • API key not configured
  • GROBID server not accessible
  • Zotero API errors
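When driving the skill programmatically, a thin wrapper can fold unexpected exceptions into the same documented error shape, so callers only ever handle one format. This is a hypothetical convenience helper, not part of the skill:

```python
def run_safely(skill, params: dict) -> dict:
    """Call skill.run and normalize raised exceptions into the documented error format."""
    try:
        return skill.run(params)
    except Exception as exc:
        return {"success": False, "error": str(exc)}
```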

Dependencies

  • Python 3.6+
  • Required packages:
    • requests
    • python-dotenv
    • pyzotero
    • flask

Examples

Example 1: Search for papers

from skill import skill

# Search for papers on "artificial intelligence"
result = skill.run({
    "action": "search",
    "query": "artificial intelligence",
    "limit": 3,
    "source": "arxiv"
})

# Print results
if result["success"]:
    for i, paper in enumerate(result["results"]):
        print(f"{i+1}. {paper['title']}")
        print(f"Authors: {', '.join(paper['authors'])}")
        print(f"Year: {paper['year']}")
        print(f"URL: {paper['url']}")
        print(f"PDF URL: {paper['pdf_url']}")
        print()
else:
    print(f"Error: {result['error']}")

Example 2: Analyze PDF and archive to Zotero

from skill import skill
import os

# Path to PDF file
pdf_path = os.path.join(os.path.dirname(__file__), "papers", "example.pdf")

# Analyze PDF header
analyze_result = skill.run({
    "action": "analyze",
    "pdf_input": pdf_path,
    "analysis_type": "header"
})

if analyze_result["success"]:
    # Archive to Zotero
    paper_info = {
        "title": "Example Paper",
        "authors": ["John Doe", "Jane Smith"],
        "year": "2023",
        "abstract": "This is an example paper",
        "url": "https://example.com/paper",
        "pdf_url": "https://example.com/paper.pdf",
        "bibtex": analyze_result["result"]
    }
    
    archive_result = skill.run({
        "action": "archive",
        "paper_info": paper_info
    })
    
    if archive_result["success"]:
        print("Paper archived successfully!")
        print(f"Item ID: {archive_result['result']['item_id']}")
    else:
        print(f"Error archiving paper: {archive_result['error']}")
else:
    print(f"Error analyzing PDF: {analyze_result['error']}")

Troubleshooting

Common Issues

  1. GROBID server not accessible
    • Make sure GROBID server is running
    • Check the GROBID_API_URL in .env file
  2. Zotero API errors
    • Verify ZOTERO_API_KEY and ZOTERO_LIBRARY_ID in .env file
    • Check Zotero API rate limits
  3. Search engines returning empty results
    • For Google Scholar: Ensure SERPAPI_KEY is configured
    • For Tavily: Ensure TAVILY_API_KEY is configured
    • For Semantic Scholar: Consider adding SEMANTIC_SCHOLAR_API_KEY for higher rate limits
  4. PDF analysis failing
    • Ensure PDF file is accessible
    • Check GROBID server status

Logs

The skill logs errors to the console; review the console output when you need detailed debugging information.

Security Considerations

Data Privacy

  • PDF Processing: When analyzing PDF files, the skill sends PDF content to the configured GROBID API endpoint. For maximum privacy, run GROBID locally using the provided Docker Compose setup.
  • API Keys: Store API keys in the .env file and never commit them to version control.
  • File Storage: Downloaded PDF files are stored in the pdfs directory within the skill's installation directory.

Best Practices

  • Local GROBID: Use the provided Docker Compose setup to run GROBID locally, ensuring PDF content is not sent to external services.
  • Restricted Zotero Access: Create a dedicated Zotero API key with limited permissions for archiving.
  • Environment Variables: Use environment variables for API keys instead of hardcoding them.
  • Network Security: When using the Docker Compose setup, the GROBID service is not exposed externally, and the nginx proxy includes IP whitelist protection.

Risk Mitigation

  • The skill only processes PDF files from user-provided URLs or local files within the skill's pdfs directory.
  • All file operations are restricted to the skill's installation directory, preventing unauthorized file access.
  • The skill does not request elevated permissions or modify system files.
  • The Docker Compose setup includes health checks and security best practices.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request to the paper reader GitHub repository.

License

This project is licensed under the MIT License.

Files

8 total