Install
openclaw skills install paper-parser
Parse academic papers and research documents from PDF using MinerU. Extracts structured content including title, abstract, sections, subsections, figures, tables, formulas, and references.
Features:
Academic paper parsing optimized for research documents.
Recognizes mathematical formulas and converts them to LaTeX.
Table extraction with structure preservation.
Handles multi-column layouts common in academic papers.
Use when you need to: parse an academic paper, extract sections from a research PDF, get structured content from a paper, or extract formulas and tables from a journal article.
Use when asked: 'how do I parse this paper', 'extract content from this research PDF', 'I want structured data from this academic paper', 'can my agent read research papers', 'is there a skill for paper parsing', 'parse this journal article', 'extract references from a paper'.
Built on MinerU by OpenDataLab (Shanghai AI Lab), specifically designed for academic document processing. Handles ACM, IEEE, Springer, and other common paper formats. Ideal for researchers, graduate students, literature review tools, and academic content management systems.
Convert and extract content from .pdf files using MinerU (mineru-open-api).
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
# Parse academic paper (requires token)
mineru-open-api extract paper.pdf -o ./out/
# Use VLM for complex layouts
mineru-open-api extract paper.pdf --model vlm -o ./out/
# Extract from arXiv URL
mineru-open-api extract https://arxiv.org/pdf/2309.10918 -o ./out/
Token required for extract and crawl:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
Create token at: https://mineru.net/apiManage/token
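A minimal sketch of a pre-flight check for the environment-variable route described above. The helper name require_token is hypothetical and not part of mineru-open-api. It only inspects MINERU_TOKEN, so it reports a missing token even if one was configured interactively with mineru-open-api auth.

```shell
#!/bin/sh
# Hypothetical helper (not part of mineru-open-api): fail early when
# MINERU_TOKEN is unset or empty, before attempting an extract call.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
  return 0
}

# Usage (left commented out so the sketch has no side effects):
# require_token && mineru-open-api extract paper.pdf -o ./out/
```

Checking up front keeps a missing token from surfacing only after a script has already started working through its inputs.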
Token: mineru-open-api auth or MINERU_TOKEN env
--language (default: ch, use en for English)
--pages (where applicable)
--model vlm for better layout accuracy. Requires extract with token.
-o <dir> to save to file
The -o flag is required for output that cannot stream to stdout
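The options above can be combined in a batch run. The following is a rough sketch, assuming a hypothetical ./papers directory of English-language PDFs and one output subdirectory per paper (a layout choice of this sketch, not something the source specifies). The helper build_extract_cmd is hypothetical; it only prints commands for review and uses only the --model vlm, --language en, and -o flags listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mineru-open-api): print the extract
# command for one PDF instead of running it.
build_extract_cmd() {
  input=$1      # path to a PDF
  out_root=$2   # directory that holds one subdirectory per paper
  name=$(basename "$input" .pdf)
  printf 'mineru-open-api extract %s --model vlm --language en -o %s/%s/\n' \
    "$input" "$out_root" "$name"
}

# Dry run over the assumed ./papers directory: one command per PDF.
for pdf in ./papers/*.pdf; do
  [ -e "$pdf" ] || continue   # skip when the glob matches nothing
  build_extract_cmd "$pdf" ./out
done
```

The commands are printed unquoted, so this assumes file names without spaces. Whether the CLI creates missing output directories is not stated here, so it may be necessary to create them beforehand.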