Paper Parse

v1.0.0

Parse academic PDF papers into markdown with figure extraction.

1 version · Updated 10h ago · License: MIT-0

Install

openclaw skills install paper-parse-figures

Paper Parse

Parse academic PDF papers into structured markdown with figure extraction using PyMuPDF.

Usage

uv run {baseDir}/scripts/parse_paper.py --pdf /path/to/paper.pdf [--output-dir ./output]
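When the parser is driven from another Python program rather than a shell, the command above can be assembled as an argument list. The following is a minimal sketch, not part of the skill itself: `build_command` and `run_parser` are hypothetical helper names, and `base_dir` stands in for the `{baseDir}` placeholder.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, output_dir=None, base_dir="."):
    """Build the `uv run` argument list shown in the Usage line.

    Hypothetical helper: base_dir plays the role of {baseDir}.
    """
    script = Path(base_dir) / "scripts" / "parse_paper.py"
    # No shell is involved, so "~" must be expanded explicitly.
    pdf = Path(pdf_path).expanduser()
    cmd = ["uv", "run", script.as_posix(), "--pdf", pdf.as_posix()]
    if output_dir is not None:
        cmd += ["--output-dir", Path(output_dir).as_posix()]
    return cmd


def run_parser(pdf_path, output_dir=None, base_dir="."):
    """Run the parser and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(pdf_path, output_dir, base_dir), check=True)
```

Passing a list instead of a single string avoids quoting problems when a PDF path contains spaces.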

Output

The tool generates:

  • {paper_name}_content.md - Full paper content in markdown
  • {paper_name}_parsed.json - Structured metadata including:
    • Paper title
    • Number of pages
    • Extracted figures with captions and paths
  • cover_title_authors.png - First-page snapshot of the title and authors region
  • figures/ - Directory containing high-resolution figure screenshots
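The metadata file can be consumed with the standard `json` module. The sketch below is illustrative only: the key names `title`, `num_pages`, `figures`, `caption`, and `path` are assumptions, since the exact schema is not given here, so check a real `{paper_name}_parsed.json` before relying on them.

```python
import json
from pathlib import Path


def load_parsed(json_path):
    """Load a {paper_name}_parsed.json file into a dict."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


def summarize(data):
    """Return printable summary lines for the parsed metadata.

    All key names used here are hypothetical; adjust them to the real schema.
    """
    title = data.get("title", "(untitled)")
    pages = data.get("num_pages", "?")
    lines = [f"{title} ({pages} pages)"]
    for fig in data.get("figures", []):
        lines.append(f"- {fig.get('path', '?')}: {fig.get('caption', '')}")
    return lines
```

Using `dict.get` with defaults keeps the summary from failing if a field is missing or named differently.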

Example

uv run scripts/parse_paper.py --pdf ~/papers/my-paper.pdf --output-dir ./parsed

Output structure:

./parsed/
├── my-paper_content.md
├── my-paper_parsed.json
└── figures/
    ├── figure_1.png
    ├── figure_2.png
    └── ...
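The naming pattern in this example can be expressed as a small path helper. This is a sketch under two assumptions: that `{paper_name}` is the PDF file name without its extension (as the example suggests), and that `./output` is the default output directory. `expected_outputs` is a hypothetical name.

```python
from pathlib import Path


def expected_outputs(pdf_path, output_dir="./output"):
    """Map a PDF path to the output locations listed above.

    Assumes {paper_name} is the PDF stem; "./output" as default is also assumed.
    """
    paper_name = Path(pdf_path).stem
    out = Path(output_dir)
    return {
        "content": out / f"{paper_name}_content.md",
        "parsed": out / f"{paper_name}_parsed.json",
        "figures_dir": out / "figures",
    }
```

A caller can use the returned paths to check that a run produced the expected files before reading them.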

Dependencies

  • PyMuPDF (fitz) - PDF parsing and rendering
  • pymupdf4llm - Markdown conversion

These are automatically managed by uv via the inline script metadata.
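Inline script metadata is a comment block at the top of the script (the PEP 723 format) that uv reads to install dependencies before running. The sketch below shows what such a header might look like and how it can be extracted. The header contents are an assumption: the real `parse_paper.py` header may pin different versions or a different `requires-python`. The regular expression follows the reference implementation in PEP 723, and `read_script_metadata` is a hypothetical helper name.

```python
import re

# Example script text with a PEP 723 header. The dependency names come from
# the list above; the exact header in parse_paper.py is an assumption.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "pymupdf",
#     "pymupdf4llm",
# ]
# ///
import sys
'''

# Pattern modeled on the PEP 723 reference implementation.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(script_text):
    """Return the TOML text of the `script` metadata block, or None if absent."""
    for match in BLOCK_RE.finditer(script_text):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Strip the leading "# " (or a bare "#") from each comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None
```

The returned text is plain TOML, so it can be parsed further with a TOML library if the individual dependency strings are needed.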

Version tags

latest: vk97e47trmgfmpnndbqvp3p8g4x82n4f0

Runtime requirements

📄 Clawdis
Bins: uv