PDF to Markdown with OCR

Pass. Audited by ClawScan on Apr 16, 2026.

Overview

The skill's code and instructions are consistent with its stated purpose: running an OCR-to-Markdown pipeline on Modal with the mineru package. There are no unexplained network endpoints or credential requests. However, it uses shared Modal volumes and installs packages in the remote image, so review those operational details before using it.

This skill appears to implement exactly what it claims. It uploads local PDFs to Modal volumes, runs an OCR pipeline (mineru) on a remote GPU image, and downloads the resulting Markdown. Before installing:

(1) Make sure you trust the mineru package and the container image. The skill pip-installs mineru inside the remote image.
(2) Understand that it will create or reuse Modal volumes named speech2srt-data and speech2srt-models in your Modal account. These are account-level resources, so they may already contain data or be used by other workflows.
(3) The pipeline symlinks the runtime's ~/.cache into the models volume and removes any existing cache directory in the runtime to do so. Check for collisions with cached content you care about.
(4) The skill requires a Modal account and may consume paid GPU credits, so check your billing and available credits before running it.

If you need stronger isolation, change the volume names and review the image and pip packages it uses.
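The cache step described above is the one most likely to destroy data silently. The sketch below shows, using only the Python standard library, what "symlink the cache directory into a volume, removing what was there" amounts to. `link_cache_to_volume` is a hypothetical helper written for illustration, not the skill's actual code, and the two paths are stand-ins for the runtime's ~/.cache and wherever the models volume is mounted. It returns the names of the top-level entries it deleted, which is the information you would want to know before running the real pipeline.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_cache_to_volume(cache_dir: Path, volume_dir: Path) -> list[str]:
    """Replace cache_dir with a symlink to volume_dir (hypothetical helper).

    Anything already at cache_dir is deleted. Returns the sorted names of the
    top-level entries that were removed, so the caller can see what was lost.
    """
    volume_dir.mkdir(parents=True, exist_ok=True)
    removed: list[str] = []
    if cache_dir.is_symlink():
        # An existing link (even a dangling one) is just replaced.
        cache_dir.unlink()
    elif cache_dir.is_dir():
        # A real directory: record its contents, then delete it entirely.
        removed = sorted(p.name for p in cache_dir.iterdir())
        shutil.rmtree(cache_dir)
    elif cache_dir.exists():
        # A plain file sitting where the cache directory should be.
        cache_dir.unlink()
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(volume_dir, cache_dir, target_is_directory=True)
    return removed


if __name__ == "__main__":
    # Demo in a scratch directory; nothing outside it is touched.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "home" / ".cache"  # stands in for ~/.cache
        (cache / "huggingface").mkdir(parents=True)
        volume = root / "models"  # stands in for the volume mount point
        print(link_cache_to_volume(cache, volume))  # prints ['huggingface']
        print(cache.is_symlink())  # prints True
```

A copy-before-delete variant (moving the old cache contents into the volume instead of removing them) would avoid this loss. The trade-off is that it writes local files into a volume that, as noted above, may be shared with other workflows.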