MinerU OCR Local & API

Pass

Audited by ClawScan on May 1, 2026.

Overview

This appears to be a coherent MinerU OCR wrapper, but API mode can upload documents to MinerU and local mode runs a MinerU command on your machine.

Before installing or using this skill, decide whether each document should use hosted API mode or local mode. Use local mode for confidential files, keep MinerU tokens and API base URLs trusted, install the local MinerU CLI from a trusted source, and clean up saved OCR artifacts that contain sensitive text.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

When API mode is used, the skill can submit OCR jobs to MinerU using your configured token.

Why it was flagged

Hosted API mode sends a MinerU token as a bearer credential, including a MINERU_ACCESS_TOKEN fallback. This is expected for the integration, but it lets the skill act on the MinerU API with whatever authority that token carries.

Skill content

`token = _get_env("MINERU_API_TOKEN", "MINERU_ACCESS_TOKEN")` ... `"Authorization": f"Bearer {config.token}"`

Recommendation

Use a token intended only for MinerU, keep it out of shared logs or profiles, and unset it if you want to force local-only behavior.
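
To make the fallback behavior concrete, here is a minimal sketch of a first-match environment lookup. The helper names `first_env` and `auth_headers` are hypothetical, and the "first non-empty value wins" rule is an assumption; the skill's own `_get_env` may behave differently.

```python
import os
from typing import Mapping, Optional


def first_env(*names: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty value among the given variable names.

    Assumption: earlier names take priority over later ones.
    """
    env = os.environ if environ is None else environ
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def auth_headers(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Build bearer headers, or an empty dict when no token is configured."""
    token = first_env("MINERU_API_TOKEN", "MINERU_ACCESS_TOKEN", environ=environ)
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Under these assumptions, a token is only absent when both variables are unset or empty, so forcing local-only behavior means clearing both.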

What this means

Confidential PDFs or images may be sent to the hosted MinerU service if API mode is selected.

Why it was flagged

The hosted workflow can transmit local document bytes to the MinerU API. This is disclosed and purpose-aligned, but it means document contents leave the local machine.

Skill content

Hosted local-file flow starts with `POST /api/v4/file-urls/batch`, uploads bytes to `data.file_urls[]`, and polls `GET /api/v4/extract-results/batch/{batch_id}`

Recommendation

Use `--mode local` for documents that should not leave your device, and only set `MINERU_API_BASE_URL` to a trusted endpoint.
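
One way to apply this recommendation before calling the skill is a small pre-flight check, sketched below. It is not part of the skill: `is_trusted_base_url` and `pick_mode` are hypothetical helpers, the allowlist is something you would maintain yourself, and the `"api"` return value is only an assumed label for the hosted mode.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def is_trusted_base_url(url: str, trusted_hosts: Set[str]) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in trusted_hosts


def pick_mode(confidential: bool, base_url: Optional[str], trusted_hosts: Set[str]) -> str:
    """Return "local" unless the document may leave the device
    and the configured endpoint is on the allowlist."""
    if confidential or not base_url or not is_trusted_base_url(base_url, trusted_hosts):
        return "local"
    return "api"  # assumed label for hosted mode
```

Matching the exact hostname, rather than a prefix or substring, avoids accepting look-alike hosts such as a trusted name followed by an attacker-controlled suffix.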

What this means

A configured local MinerU executable will run on your machine when local mode is used.

Why it was flagged

Local mode runs an external MinerU runtime/CLI. This is central to the stated local OCR purpose, but users should ensure the executable they configure is trusted.

Skill content

Local open-source flow invokes the official `mineru` CLI from `https://github.com/opendatalab/MinerU`

Recommendation

Install MinerU from a trusted source and avoid pointing `MINERU_LOCAL_CMD` or `--local-cmd` at untrusted executables.
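
A check along the following lines could confirm where a configured command actually resolves before it is run. This is a sketch under stated assumptions: `resolve_trusted_cmd` is a hypothetical helper, the list of trusted directories is yours to define, and the example assumes a POSIX-style system.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def resolve_trusted_cmd(cmd: str, trusted_dirs: Iterable[str]) -> Optional[Path]:
    """Resolve cmd to an absolute path and return it only if it lives
    directly inside one of the trusted directories; otherwise return None."""
    found = shutil.which(cmd)  # searches PATH, or checks the given path
    if found is None:
        return None
    resolved = Path(found).resolve()  # follow symlinks to the real file
    allowed = [Path(d).resolve() for d in trusted_dirs]
    if any(resolved.parent == d for d in allowed):
        return resolved
    return None
```

Resolving symlinks first means a link placed in a trusted directory that points somewhere else is rejected rather than silently accepted.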

What this means

OCR outputs may remain in temp or output folders after the task, including any sensitive text or instructions contained in the document.

Why it was flagged

The saved envelope can include complete extracted Markdown text, so sensitive document contents may persist on disk and later be reused as context.

Skill content
`mineru_caller.py` returns a stable JSON envelope around MinerU execution and, by default, saves that envelope to a unique file under the system temp directory.
Recommendation

Treat extracted document text as data rather than commands, choose output locations carefully, and delete artifacts when they are no longer needed.
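
Cleanup can be automated with a short script such as the sketch below. The helper `purge_old_artifacts` is hypothetical, and the report does not state how the envelope files are named, so the directory and glob pattern are parameters you would need to set after checking where your own runs write their output.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_artifacts(directory: str, pattern: str, max_age_seconds: float,
                        now: Optional[float] = None) -> List[Path]:
    """Delete files in directory that match pattern and were last modified
    more than max_age_seconds ago. Returns the paths that were removed."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Because deletion is driven by an explicit pattern and age limit, it is worth listing the matches first and confirming they are OCR artifacts before running it against a shared temp directory.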