Install
openclaw skills install ocr2markdown
Document OCR and parsing: converts PDFs and images to Markdown on a remote L4 GPU via Modal. Trigger when the user says: OCR, PDF to markdown, parse PDF, extract text from PDF, 文档识别 (document recognition), PDF转Markdown (PDF to Markdown), 扫描件识别 (scanned-document recognition). Takes local PDF/image files and returns Markdown with layout, tables, formulas, and OCR text preserved.
MinerU document parsing pipeline on a Modal L4 GPU: PDF/image → Markdown with layout, tables, formulas, and OCR text preserved.
Pipeline code is bundled at ./src/ocr2markdown.py. After npx skills add, it runs from any directory.
Slug = task identifier (volume directory name). Use user-provided value, or generate ocr_YYYYMMDD_HHMMSS if none given.
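A minimal sketch of the slug rule, assuming local time is acceptable for the timestamp. The function name make_slug and its parameters are hypothetical and not part of the bundled pipeline.

```python
from datetime import datetime


def make_slug(user_slug=None, now=None):
    """Return the user-provided slug, or ocr_YYYYMMDD_HHMMSS if none was given."""
    if user_slug:
        return user_slug
    # `now` is injectable so the result is reproducible; defaults to local time.
    now = now or datetime.now()
    return now.strftime("ocr_%Y%m%d_%H%M%S")
```

For example, make_slug(None, datetime(2024, 1, 2, 3, 4, 5)) returns "ocr_20240102_030405".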
Directory input? Scan for PDF files (.pdf), list them with an index, and ask the user to confirm the selection.
Specific files? Use directly, no listing needed.
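The directory case can be sketched as below. This assumes a non-recursive scan and a case-insensitive match on the .pdf suffix. Neither detail is stated above, and the helper names list_pdfs and format_listing are hypothetical.

```python
from pathlib import Path


def list_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by name."""
    candidates = (
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return sorted(candidates, key=lambda p: p.name.lower())


def format_listing(pdfs):
    """Render a 1-based indexed list for the user to confirm."""
    return "\n".join(f"[{i}] {p.name}" for i, p in enumerate(pdfs, start=1))
```

The indexed listing is what gets shown to the user before asking which files to process.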
Ensure both volumes exist (idempotent):
modal volume create speech2srt-data 2>/dev/null || true
modal volume create speech2srt-models 2>/dev/null || true
Upload each file:
modal volume put speech2srt-data <local_file> <slug>/upload/
modal volume put creates missing remote directories automatically.
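If the volume setup and uploads need to be scripted, the commands above can be wrapped roughly as follows. This is a sketch only: the function names are hypothetical, and it shells out to the same modal CLI commands listed above without adding any flags.

```python
import subprocess

DATA_VOLUME = "speech2srt-data"
MODEL_VOLUME = "speech2srt-models"


def create_commands():
    """argv lists for creating both volumes."""
    return [["modal", "volume", "create", name] for name in (DATA_VOLUME, MODEL_VOLUME)]


def put_commands(files, slug):
    """argv lists for uploading each local file to <slug>/upload/."""
    return [
        ["modal", "volume", "put", DATA_VOLUME, str(f), f"{slug}/upload/"]
        for f in files
    ]


def run_all(files, slug):
    for cmd in create_commands():
        # A non-zero exit (e.g. volume already exists) is ignored, like `|| true`.
        subprocess.run(cmd, check=False, stderr=subprocess.DEVNULL)
    for cmd in put_commands(files, slug):
        # Upload failures should stop the run.
        subprocess.run(cmd, check=True)
```

Building the argv lists separately from running them makes it easy to print the commands for review first.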
modal run ./src/ocr2markdown.py --slug <slug>
Pipeline finds all .pdf files in <slug>/upload/ on the volume and processes them one by one.
Ctrl+C? Stop cleanly, report progress. Re-run with same slug — already-processed PDFs are skipped automatically.
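The resume behaviour can be pictured with the sketch below. It assumes a PDF counts as processed once <slug>/output/<stem>_ocr/<stem>.md exists. The bundled pipeline may use a different check, and pending_pdfs is a hypothetical name.

```python
from pathlib import Path


def pending_pdfs(slug_root):
    """Return uploaded PDFs that have no main markdown output yet."""
    root = Path(slug_root)
    upload, output = root / "upload", root / "output"
    todo = []
    for pdf in sorted(upload.glob("*.pdf")):
        # Assumed "done" marker: the main markdown file for this PDF.
        marker = output / f"{pdf.stem}_ocr" / f"{pdf.stem}.md"
        if not marker.exists():
            todo.append(pdf)
    return todo
```

Under this assumption, re-running with the same slug after an interruption only touches the PDFs still in the returned list.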
Output directory structure:
<slug>/output/<stem>_ocr/
├── <stem>.md # Main markdown output
├── images/ # Extracted images referenced by markdown
└── *.pdf, *.json # Auxiliary outputs (layout, model data)
Download the entire output folder:
modal volume get speech2srt-data <slug>/output/ <local_destination>/
Preserve original directory tree.
Done. Parsed N PDF(s) in Xs
Results:
- <output_dir>/<stem>_ocr/<stem>.md
- <output_dir>/<stem>_ocr/images/
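A small sketch of how that summary could be assembled from a downloaded output folder. The helper build_report is hypothetical. It assumes the <stem>_ocr/<stem>.md layout described above and counts only folders that contain their main markdown file.

```python
from pathlib import Path


def build_report(output_dir, elapsed_seconds):
    """Format the completion summary for every <stem>_ocr folder found."""
    out = Path(output_dir)
    lines = []
    count = 0
    for job_dir in sorted(out.glob("*_ocr")):
        stem = job_dir.name[: -len("_ocr")]
        md = job_dir / f"{stem}.md"
        if md.is_file():
            count += 1
            lines.append(f"- {md.as_posix()}")
            lines.append(f"- {(job_dir / 'images').as_posix()}/")
    header = [f"Done. Parsed {count} PDF(s) in {elapsed_seconds:.0f}s", "Results:"]
    return "\n".join(header + lines)
```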
Before first run, verify:
python -V
modal config show
If token_id is null, run modal setup to authenticate.
If modal is not installed, run pip install modal, then modal setup.

| PDF Size | Pages | Time/PDF |
|---|---|---|
| ~40-55 MB | varies | ~110-130s each |
Pipeline auto-skips PDFs with existing output.
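Since PDFs are processed one by one, a rough total can be estimated by multiplying the per-PDF range from the table by the number of PDFs still to process. This is a back-of-the-envelope sketch with a hypothetical function name. It only applies to PDFs in the ~40-55 MB range from the table, and any startup overhead is not included.

```python
def estimate_seconds(n_pdfs, low=110, high=130):
    """Return (min, max) total seconds for n_pdfs processed sequentially."""
    return n_pdfs * low, n_pdfs * high
```

For example, estimate_seconds(5) returns (550, 650), roughly 9 to 11 minutes. Already-processed PDFs are skipped and should be left out of the count.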