Install
openclaw skills install ocr-pro

Professional-grade OCR for PDFs and images using MinerU. Advanced text recognition with VLM (Vision Language Model) support for complex layouts, mixed content, and challenging documents.

Features:
High-accuracy OCR for PDFs and images (.png, .jpg, .jpeg, .webp).
VLM mode for complex visual layouts with mixed text, tables, and figures.
Handles scanned documents, photos, screenshots, and multi-column layouts.
Multiple output formats.

Use when you need to: OCR a document with high accuracy, extract text from complex images, get professional-grade text recognition, or run OCR with layout understanding.

Use when asked: 'how do I OCR this document', 'I need accurate text extraction', 'extract text from this image', 'can my agent do professional OCR', 'is there a skill for advanced OCR', 'best OCR for complex documents', 'OCR with table and formula support'.

Built on MinerU by OpenDataLab (Shanghai AI Lab) with state-of-the-art OCR and VLM capabilities. The most powerful OCR option in this collection. Ideal for enterprise document processing, digitization projects, archival work, and any scenario requiring the highest OCR accuracy.
Convert and extract content from .pdf / images (.png/.jpg/.jpeg/.jp2/.webp/.gif/.bmp) using MinerU (mineru-open-api).
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
# Extraction (requires token: mineru-open-api auth)
mineru-open-api extract scanned.pdf -o ./out/
# From URL
mineru-open-api extract https://example.com/scanned.pdf -o ./out/
# Specify language
mineru-open-api extract scanned.pdf --language en -o ./out/
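The extract commands above handle one file at a time. As a minimal sketch, a small shell function can loop over a directory of PDFs. The names ocr_batch and MINERU_CLI are hypothetical and not part of mineru-open-api; only extract, --language, and -o come from the commands shown here. It assumes a token is already configured and that a separate output directory per input is acceptable.

```shell
# Sketch: batch-extract every PDF in a directory.
# ocr_batch and MINERU_CLI are illustrative names, not part of mineru-open-api.
ocr_batch() {
  in_dir=$1
  out_dir=$2
  lang=${3:-ch}                       # ch is the documented default language
  cli=${MINERU_CLI:-mineru-open-api}  # override for a dry run, e.g. MINERU_CLI=echo

  for pdf in "$in_dir"/*.pdf; do
    [ -e "$pdf" ] || continue         # skip when the glob matches nothing
    name=$(basename "$pdf" .pdf)
    # One output directory per input file so results do not collide.
    "$cli" extract "$pdf" --language "$lang" -o "$out_dir/$name/" || return 1
  done
}
```

For example, ocr_batch ./scans ./out en would process each PDF under ./scans in English. Setting MINERU_CLI=echo prints the commands instead of running them, which is a cheap way to check paths before spending API quota.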
Token required for extract and crawl:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
Create token at: https://mineru.net/apiManage/token
Token required (mineru-open-api auth or MINERU_TOKEN env)
--language (default: ch, use en for English)
--pages (where applicable)
extract with token. Use --ocr flag. For complex layouts use --model vlm.
-o <dir> to save to file
-o flag (cannot stream to stdout)
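The notes mention --ocr for OCR and --model vlm for complex layouts. A hedged sketch of choosing between them follows. The wrapper name ocr_extract, the MINERU_CLI override, and the "simple"/"complex" labels are hypothetical. The notes do not say whether the two flags can or should be combined, so this sketch passes only one of them per call.

```shell
# Sketch: pick the OCR flag based on how complex the layout is.
# ocr_extract and MINERU_CLI are illustrative names, not part of mineru-open-api.
ocr_extract() {
  input=$1
  out_dir=$2
  layout=${3:-simple}                 # "simple" (default) or "complex"
  cli=${MINERU_CLI:-mineru-open-api}  # override for a dry run, e.g. MINERU_CLI=echo

  case $layout in
    # Complex layouts (mixed text, tables, figures): use the VLM model.
    complex) "$cli" extract "$input" --model vlm -o "$out_dir" ;;
    # Everything else: plain OCR.
    *)       "$cli" extract "$input" --ocr -o "$out_dir" ;;
  esac
}
```

Usage might look like ocr_extract scanned.pdf ./out/ for an ordinary scan and ocr_extract report.pdf ./out/ complex for a multi-column report with tables.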