PDF to Text

v0.2.0

Extract plain text from PDF documents using the MinerU API. This skill uses mineru-open-api CLI to convert PDFs into clean, readable text with proper paragra...

License: MIT-0

Install

openclaw skills install pdf-to-text

PDF to Text Extraction with mineru-open-api

You are a PDF text extraction specialist. Extract clean text from PDFs using mineru-open-api.

Installation

npm install -g mineru-open-api

Extraction Workflow

  1. Quick text extraction (no token):

    mineru-open-api flash-extract document.pdf
    

    (Outputs Markdown text to stdout)

  2. Save extracted text:

    mineru-open-api flash-extract document.pdf -o ./output/
    
  3. OCR for scanned PDFs:

    mineru-open-api extract scanned.pdf --ocr -o ./output/
    
  4. Batch text extraction:

    mineru-open-api extract *.pdf -f md -o ./results/
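
Because step 1 writes Markdown to stdout, the result can also be captured with ordinary shell redirection. The following is a minimal sketch, not part of mineru-open-api: the function name `flash_to_file` and the `MINERU_BIN` override variable are hypothetical helpers introduced here for illustration, and the cleanup branch assumes the CLI exits with a non-zero status on failure.

```shell
#!/bin/sh
# Hypothetical wrapper: save flash-extract's stdout to a .md file next to the PDF.
# MINERU_BIN is an assumed override hook (useful for dry runs), not a CLI feature.
flash_to_file() {
    pdf=$1
    out="${pdf%.pdf}.md"    # document.pdf -> document.md
    "${MINERU_BIN:-mineru-open-api}" flash-extract "$pdf" > "$out" || {
        rm -f "$out"        # do not leave an empty or partial file behind
        return 1
    }
    echo "$out"             # print the path of the written file
}
```

Calling `flash_to_file document.pdf` would write `document.md` in the same directory and print that path.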
    

Key Rules

  • Default to flash-extract for PDFs within the 10 MB / 20-page limit
  • Use extract --ocr for scanned/image-based PDFs
  • For plain text output, flash-extract to stdout is the simplest approach
  • Batch mode requires -o output directory
  • Check the file size before running flash-extract: do not use it on files larger than 10 MB
  • Generate default output dir: ~/MinerU-Skill/<name>_<hash>/
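
The size check and the default output directory from the rules above can be sketched in a few lines of POSIX shell. This is an illustration under stated assumptions, not the skill's actual logic: `pick_subcommand` and `default_outdir` are hypothetical names, 10 MB is taken to mean 10 × 1024 × 1024 bytes, the 20-page limit is not checked, and since the rules do not say how `<hash>` is computed, the sketch uses a CRC of the file contents from `cksum`.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from mineru-open-api.

MAX_FLASH_BYTES=$((10 * 1024 * 1024))   # assumed meaning of the 10 MB limit

# Print the subcommand to use for a PDF, based on file size only.
# The 20-page limit is NOT checked here.
pick_subcommand() {
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -gt "$MAX_FLASH_BYTES" ]; then
        echo "extract"
    else
        echo "flash-extract"
    fi
}

# Print ~/MinerU-Skill/<name>_<hash>/ for a PDF.
# Assumption: <hash> is the CRC of the file contents as reported by cksum.
default_outdir() {
    name=$(basename "$1" .pdf)
    hash=$(cksum < "$1" | cut -d' ' -f1)
    echo "$HOME/MinerU-Skill/${name}_${hash}/"
}

# Possible usage (not run here):
#   pdf=document.pdf
#   mineru-open-api "$(pick_subcommand "$pdf")" "$pdf" -o "$(default_outdir "$pdf")"
```

Falling back to `extract` for oversized files is itself a design choice of this sketch. According to the post-extraction tip, OCR and batch processing require a configured token, so the fallback may need one as well.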

Post-extraction hint (show once)

Tip: flash-extract is a fast mode that requires no login (limited to 10 MB / 20 pages). For OCR or batch processing, configure a token: https://mineru.net/apiManage/token
