Install
openclaw skills install mineru-fast-extract

MinerU fast extract: zero-setup, instant document extraction. Convert PDFs, images, Word (DOCX), and PowerPoint (PPTX) files to Markdown with no login, token, API key, or configuration. Just install and run. It is powered by the MinerU flash-extract engine, with built-in OCR, table recognition, and formula extraction (LaTeX), and it handles scanned documents, photos of text, academic papers, contracts, invoices, resumes, and slides out of the box.

Use this skill when you need to quickly extract text or other content from a PDF, convert a document to Markdown without signing up, read a scanned PDF, turn a Word file into Markdown, parse a PowerPoint presentation, OCR an image, or get a fast document conversion with no setup.

It supports 80+ languages, including Chinese, English, Japanese, Korean, Arabic, Hindi, French, German, Spanish, Russian, and many more, and works with both local files and remote URLs. It suits developers, researchers, students, and anyone who wants instant document parsing without accounts or API tokens. Use it as a Claude Code skill, an agent tool, or a standalone CLI.

Keywords: PDF extraction, document to Markdown, login-free PDF conversion, fast document extraction, scanned-document OCR, image to text, Word to Markdown, PPT to Markdown, PDF parsing, zero-configuration document conversion.

Zero-setup, instant document parsing: no login, no token, and no configuration needed. Supports tables and formulas (LaTeX).
npm install -g mineru-open-api
Or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
mineru-open-api version
mineru-open-api flash-extract report.pdf # PDF → Markdown (instant!)
mineru-open-api flash-extract report.pdf -o ./out/ # Save to file
mineru-open-api flash-extract resume.docx # Word → Markdown
mineru-open-api flash-extract slides.pptx # PowerPoint → Markdown
mineru-open-api flash-extract photo.png # Image → Markdown (OCR)
mineru-open-api flash-extract https://example.com/doc.pdf # URL → Markdown
| Format | Supported |
|---|---|
| PDF (.pdf) | Yes |
| Images (.png, .jpg, .jpeg, .jp2, .webp, .gif, .bmp) | Yes |
| Word (.docx) | Yes |
| PowerPoint (.pptx) | Yes |
| URLs (remote files) | Yes |
mineru-open-api flash-extract <file-or-url> [flags]
| Flag | Short | Default | Description |
|---|---|---|---|
| `--output` | `-o` | (stdout) | Output path (file or directory) |
| `--language` | | `ch` | Document language |
| `--pages` | | (all) | Page range, e.g. `1-10` |
| `--timeout` | | `900` | Timeout in seconds |
--language values

Values are organized by script or language family; each value covers all languages in its group.
| Value | Included languages | Notes |
|---|---|---|
| `ch` | Chinese, English, Traditional Chinese | Chinese and English (default) |
| `ch_server` | Chinese, English, Traditional Chinese, Japanese | Traditional Chinese, handwriting |
| `en` | English | English only |
| `japan` | Chinese, English, Traditional Chinese, Japanese | Primarily Japanese |
| `korean` | Korean, English | Korean |
| `chinese_cht` | Chinese, English, Traditional Chinese, Japanese | Primarily Traditional Chinese |
| `ta` | Tamil, English | Tamil |
| `te` | Telugu, English | Telugu |
| `ka` | Kannada | Kannada |
| `el` | Greek, English | Greek |
| `th` | Thai, English | Thai |
| Value | Script/Family | Included languages |
|---|---|---|
| `latin` | Latin script | French, German, Afrikaans, Italian, Spanish, Bosnian, Portuguese, Czech, Welsh, Danish, Estonian, Irish, Croatian, Uzbek, Hungarian, Serbian (Latin), Indonesian, Occitan, Icelandic, Lithuanian, Maori, Malay, Dutch, Norwegian, Polish, Slovak, Slovenian, Albanian, Swedish, Swahili, Tagalog, Turkish, Latin, Azerbaijani, Kurdish, Latvian, Maltese, Pali, Romanian, Vietnamese, Finnish, Basque, Galician, Luxembourgish, Romansh, Catalan, Quechua |
| `arabic` | Arabic script | Arabic, Persian, Uyghur, Urdu, Pashto, Kurdish, Sindhi, Balochi, English |
| `cyrillic` | Cyrillic script | Russian, Belarusian, Ukrainian, Serbian (Cyrillic), Bulgarian, Mongolian, Abkhazian, Adyghe, Kabardian, Avar, Dargin, Ingush, Chechen, Lak, Lezgin, Tabasaran, Kazakh, Kyrgyz, Tajik, Macedonian, Tatar, Chuvash, Bashkir, Malian, Moldovan, Udmurt, Komi, Ossetian, Buryat, Kalmyk, Tuvan, Sakha, Karakalpak, English |
| `east_slavic` | East Slavic | Russian, Belarusian, Ukrainian, English |
| `devanagari` | Devanagari script | Hindi, Marathi, Nepali, Bihari, Maithili, Angika, Bhojpuri, Magahi, Santali, Newari, Konkani, Sanskrit, Haryanvi, English |
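Choosing a `--language` value by language name can be scripted. The following is a minimal sketch, not part of the CLI: `language_value` is a hypothetical helper, it maps only a small subset of the languages listed in the tables above, and sending Russian, Belarusian, and Ukrainian to `east_slavic` rather than `cyrillic` is an assumption that the narrower group is the better fit.

```shell
#!/bin/sh
# Sketch: map a language name to a --language value from the tables above.
# language_value is a hypothetical helper; only a subset of languages is covered.

language_value() {
  # Lowercase the name so "French" and "french" behave the same.
  lv_name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lv_name" in
    english) echo en ;;
    japanese) echo japan ;;
    korean) echo korean ;;
    tamil) echo ta ;;
    telugu) echo te ;;
    kannada) echo ka ;;
    greek) echo el ;;
    thai) echo th ;;
    french|german|spanish|italian|portuguese|dutch|polish|turkish|vietnamese) echo latin ;;
    arabic|persian|urdu|uyghur|pashto) echo arabic ;;
    russian|belarusian|ukrainian) echo east_slavic ;;   # assumption: prefer the narrower group
    bulgarian|kazakh|mongolian|macedonian|tajik) echo cyrillic ;;
    hindi|marathi|nepali|sanskrit) echo devanagari ;;
    *) echo ch ;;   # documented default (Chinese and English)
  esac
}
```

It could then be used as `mineru-open-api flash-extract report.pdf --language "$(language_value French)"`.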
mineru-open-api flash-extract report.pdf
mineru-open-api flash-extract report.pdf -o ./out/
mineru-open-api flash-extract report.pdf --language en
mineru-open-api flash-extract report.pdf --language latin
mineru-open-api flash-extract report.pdf --pages "1-5"
mineru-open-api flash-extract contract.docx -o ./out/
mineru-open-api flash-extract presentation.pptx -o ./out/
mineru-open-api flash-extract scan.jpg --language ch
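The single-file examples above extend naturally to a whole directory. The following is a minimal sketch under stated assumptions: `extract_dir` and the `MINERU_BIN` variable are hypothetical names introduced for illustration, the extension list is copied from the supported formats table, and matching is case-sensitive.

```shell
#!/bin/sh
# Sketch: run flash-extract once per supported file in a directory.
# extract_dir and MINERU_BIN are hypothetical names used only in this sketch.

MINERU_BIN=${MINERU_BIN:-mineru-open-api}

# Usage: extract_dir <input-dir> <output-dir> [extra flags...]
extract_dir() {
  ed_in=$1 ed_out=$2
  shift 2
  ed_failed=0
  for ed_file in "$ed_in"/*; do
    [ -f "$ed_file" ] || continue
    case "$ed_file" in
      *.pdf|*.docx|*.pptx|*.png|*.jpg|*.jpeg|*.jp2|*.webp|*.gif|*.bmp)
        # Quote the path so file names with spaces survive.
        "$MINERU_BIN" flash-extract "$ed_file" -o "$ed_out" "$@" \
          || ed_failed=$((ed_failed + 1))
        ;;
    esac
  done
  # Succeed only if every conversion succeeded.
  [ "$ed_failed" -eq 0 ]
}
```

For example, `extract_dir ./docs ./out --language en` would convert every supported file in `./docs`, passing the extra flag through to each call.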
- Without the `-o` flag: the result goes to stdout; status and progress messages go to stderr.
- With the `-o` flag: the result is saved to a file or directory; progress messages go to stderr.
- Output is Markdown (a `.md` file).
- Formulas are emitted as LaTeX (inline `$...$` and block `$$...$$`).

When using this skill on behalf of the user:

- Use `flash-extract` for any input, whether it is a local file or a URL (e.g. https://cdn-mineru.openxlab.org.cn/demo/example.pdf). Do NOT assume a URL means "web page"; `flash-extract` handles URLs to document files directly.
- Quote paths that contain spaces, e.g. `mineru-open-api flash-extract "report 01.pdf"`.

When the user does NOT specify `-o`, generate a default output directory:
~/MinerU-Skill/<name>_<hash>/
`<name>`: derived from the source, then sanitized (replace spaces and shell-unsafe characters with `_`, then collapse consecutive `_`):
- URL: last path segment (e.g. `https://arxiv.org/pdf/2509.22186` → `2509.22186`)
- Local file: filename without its extension (e.g. `report.pdf` → `report`)

`<hash>`: first 6 characters of the MD5 hash of the full original source.

echo -n "source" | md5sum | cut -c1-6 # Linux
echo -n "source" | md5 | cut -c1-6 # macOS
When the user specifies -o: use the user's path as-is.
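The naming rule above can be sketched as a few shell functions. This is an illustration rather than the skill's own implementation: `source_hash`, `source_name`, and `default_out_dir` are hypothetical helpers, treating every character outside `[A-Za-z0-9._-]` as shell-unsafe is an assumption, and so is dropping a URL's query string or fragment before taking the last path segment.

```shell
#!/bin/sh
# Sketch: build the default output directory ~/MinerU-Skill/<name>_<hash>/.
# All function names here are hypothetical, not part of mineru-open-api.

# <hash>: first 6 hex characters of the MD5 of the full original source string.
source_hash() {
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s' "$1" | md5sum | cut -c1-6   # Linux
  else
    printf '%s' "$1" | md5 | cut -c1-6      # macOS
  fi
}

# <name>: last path segment for URLs, filename without extension for local files.
source_name() {
  sn_src=$1
  case "$sn_src" in
    http://*|https://*)
      sn_src=${sn_src%%[?#]*}   # assumption: ignore query string and fragment
      sn_src=${sn_src%/}        # drop one trailing slash
      sn_name=${sn_src##*/}     # segment kept as-is, as in the 2509.22186 example
      ;;
    *)
      sn_name=${sn_src##*/}     # strip leading directories
      case "$sn_name" in
        ?*.*) sn_name=${sn_name%.*} ;;   # strip the last extension
      esac
      ;;
  esac
  # Replace characters outside [A-Za-z0-9._-] with _, then collapse runs of _.
  printf '%s' "$sn_name" | sed -e 's/[^A-Za-z0-9._-]/_/g' -e 's/__*/_/g'
}

default_out_dir() {
  printf '%s/MinerU-Skill/%s_%s/\n' "$HOME" "$(source_name "$1")" "$(source_hash "$1")"
}
```

A call such as `mineru-open-api flash-extract "report 01.pdf" -o "$(default_out_dir "report 01.pdf")"` would then write into a directory named `report_01_` followed by the six-character hash.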
When the user asks to upgrade this skill, re-install the CLI first:
npm install -g mineru-open-api@latest
| Code | Meaning | Recovery |
|---|---|---|
| 0 | Success | — |
| 1 | General API or unknown error | Check network; retry; use --verbose |
| 2 | Invalid parameters / usage error | Check command syntax and flag values |
| 4 | File too large or page limit exceeded | Try a smaller file or fewer pages |
| 5 | Extraction failed | Document may be corrupted or unsupported |
| 6 | Timeout | Increase with --timeout |
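A wrapper can act on these exit codes while keeping the Markdown on stdout separate from the progress messages on stderr. The following is a minimal sketch, assuming the codes behave as listed in the table: `run_extract` and the `MINERU_BIN` variable are hypothetical names, and the hint strings simply paraphrase the Recovery column.

```shell
#!/bin/sh
# Sketch: run flash-extract, split stdout from stderr, and print a recovery
# hint based on the exit code table.
# run_extract and MINERU_BIN are hypothetical names used only in this sketch.

MINERU_BIN=${MINERU_BIN:-mineru-open-api}

# Usage: run_extract <file-or-url> <markdown-out> <log-out> [extra flags...]
run_extract() {
  re_src=$1 re_md=$2 re_log=$3
  shift 3
  # Without -o, the Markdown goes to stdout and progress goes to stderr.
  "$MINERU_BIN" flash-extract "$re_src" "$@" >"$re_md" 2>"$re_log"
  re_status=$?
  case $re_status in
    0) ;;
    1) echo "General API or unknown error: check network, then retry" >&2 ;;
    2) echo "Usage error: check command syntax and flag values" >&2 ;;
    4) echo "File too large or page limit exceeded: try fewer pages" >&2 ;;
    5) echo "Extraction failed: document may be corrupted or unsupported" >&2 ;;
    6) echo "Timeout: increase --timeout" >&2 ;;
    *) echo "Unexpected exit code $re_status" >&2 ;;
  esac
  return "$re_status"
}
```

For example, `run_extract report.pdf report.md extract.log --pages "1-5"` leaves the Markdown in `report.md`, the progress output in `extract.log`, and returns the CLI's exit code to the caller.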
- For timeouts, raise the limit, e.g. `--timeout 1600`.
- Set `--language` to match the document language.