Image OCR

v0.4.0

OCR for photos and images using MinerU. Extract text from photographs, screenshots, camera captures, and image files with high accuracy. Features: image OCR...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (image OCR via MinerU) match the declared binary (mineru-open-api) and the commands documented in SKILL.md. Required binary and primary credential are appropriate for an OCR CLI wrapper.
Instruction Scope
SKILL.md only instructs the agent to run mineru-open-api commands, set or use MINERU_TOKEN for authenticated calls, and points to mineru.net/GitHub for tokens and source. It does not direct the agent to read unrelated files, exfiltrate data to unexpected endpoints, or access other environment variables.
Install Mechanism
Install options are standard: an npm package (mineru-open-api) and a Go 'go install' from an OpenDataLab GitHub repo. These are expected for a CLI. As with any third‑party package, installing from npm or go pulls code onto the host and should be verified (package page, repository, checksums/tags).
Credentials
Only MINERU_TOKEN is required and is the primary credential; SKILL.md documents that some commands (flash-extract) work without a token while extract requires it. Requesting a single service token is proportional to the skill's features.
Persistence & Privilege
The skill sets always:false and uses normal autonomous invocation. It does not request persistent system-wide privileges, and its instructions do not modify other skills' configs.
Assessment
This skill appears coherent, but follow standard precautions before installing. Verify the npm package and the GitHub repo (publisher identity, recent commits, stars/issues) to reduce the risk of typosquatting or malicious packages. Provide MINERU_TOKEN only if you trust the service, and grant the token the least privilege it needs. If you prefer not to supply credentials, use 'flash-extract' (no token) for small, quick OCR jobs. Review what the installed mineru-open-api binary does (source code) before running it on sensitive images, and revoke the token if you observe unexpected behavior.



Runtime requirements

🖼️ Clawdis
Bins: mineru-open-api
Env: MINERU_TOKEN
Primary env: MINERU_TOKEN

Install

Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api

SKILL.md

Image OCR

Extract text and content from images using MinerU. Supports photos, screenshots, scanned documents, and any image containing text.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
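
After either install route, it can help to confirm that the binary is actually reachable before an agent tries to call it. The following is a minimal sketch; check_mineru_cli is a hypothetical helper name introduced here for illustration and is not part of mineru-open-api.

```shell
# Sketch: confirm the CLI is on PATH before calling it.
# check_mineru_cli is a hypothetical helper, not part of mineru-open-api.
check_mineru_cli() {
  if command -v mineru-open-api >/dev/null 2>&1; then
    # Print the resolved location so the caller can see which copy will run.
    echo "found: $(command -v mineru-open-api)"
  else
    echo "mineru-open-api not found on PATH; install it with npm or go install" >&2
    return 1
  fi
}
```

If the Go route was used and the check fails, the usual cause is that the Go binary directory (GOBIN, or the bin directory under GOPATH) is not on PATH.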

Quick Start

# Quick OCR from image (no token required)
mineru-open-api flash-extract photo.png

# Save to directory
mineru-open-api flash-extract screenshot.jpg -o ./out/

# From URL
mineru-open-api flash-extract https://example.com/image.png

# Specify language (default: ch)
mineru-open-api flash-extract photo.png --language en

# Precision OCR with token (better accuracy, no size limit)
mineru-open-api extract photo.png --ocr -o ./out/

# With VLM model for complex layouts or mixed content
mineru-open-api extract photo.png --ocr --model vlm -o ./out/

Authentication

No token needed for flash-extract. Token required for extract:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
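
Because flash-extract works without a token and extract requires one, a wrapper script could pick the subcommand from the environment. This is only a sketch under that assumption: pick_ocr_subcommand is a hypothetical helper, and it inspects the MINERU_TOKEN environment variable only. Whether a token saved through the interactive auth command is visible this way is not stated in the source.

```shell
# pick_ocr_subcommand is a hypothetical helper, not part of mineru-open-api.
# It prints "extract" when MINERU_TOKEN is set and non-empty,
# and "flash-extract" otherwise.
pick_ocr_subcommand() {
  if [ -n "${MINERU_TOKEN:-}" ]; then
    echo "extract"
  else
    echo "flash-extract"
  fi
}
```

A caller might then run mineru-open-api "$(pick_ocr_subcommand)" photo.png. Note that the extract examples above also pass --ocr, so a fuller wrapper would add that flag on the extract branch.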

Capabilities

  • Supported input: .png, .jpg, .jpeg, .jp2, .webp, .gif, .bmp (local file or URL)
  • flash-extract: quick OCR, no token, max 10 MB / 20 pages, Markdown output
  • extract: token required, higher accuracy with --ocr, supports --model vlm for complex images
  • Language hint with --language (default: ch, use en for English documents)
  • Formula recognition available via extract --formula
  • Table recognition available via extract --table
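
The extension list and the flash-extract size limit above can be checked locally before any upload. The sketch below is illustrative only: can_flash_extract is a hypothetical helper, it treats "10 MB" as 10 * 1024 * 1024 bytes, and it compares extensions case-insensitively. Both of those are assumptions; the service may count size or match extensions differently.

```shell
# can_flash_extract is a hypothetical helper, not part of mineru-open-api.
# Returns 0 when a local file has a listed extension and is at most 10 MB.
# Assumption: 10 MB means 10 * 1024 * 1024 = 10485760 bytes.
can_flash_extract() {
  file=$1
  # Assumption: extensions are matched case-insensitively (.PNG same as .png).
  lower=$(printf '%s' "$file" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *.png|*.jpg|*.jpeg|*.jp2|*.webp|*.gif|*.bmp) ;;
    *) echo "unsupported extension: $file" >&2; return 1 ;;
  esac
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(($(wc -c < "$file")))
  if [ "$size" -gt 10485760 ]; then
    echo "too large for flash-extract ($size bytes): $file" >&2
    return 1
  fi
}
```

Files that fail only the size check are candidates for extract, which the list above describes as token-gated. The helper handles local paths only; URLs would need a separate branch.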

Notes

  • For scanned documents or low-quality images, use extract --ocr --model vlm for best results
  • flash-extract already applies OCR automatically on images — no extra flag needed
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
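
Since document content goes to stdout and progress messages go to stderr, the two streams can be captured separately with ordinary shell redirection. A minimal sketch follows; ocr_to_markdown and the OCR_BIN override are hypothetical names introduced for illustration, not features of the CLI.

```shell
# ocr_to_markdown is a hypothetical wrapper, not part of mineru-open-api.
# OCR_BIN is an assumed override for the binary name (handy for testing).
# Usage: ocr_to_markdown <image> <output-file> <log-file>
ocr_to_markdown() {
  image=$1
  out=$2
  log=$3
  # stdout carries the extracted document; stderr carries progress/status.
  "${OCR_BIN:-mineru-open-api}" flash-extract "$image" > "$out" 2> "$log"
}
```

For example, ocr_to_markdown photo.png photo.md photo.log would leave the extracted text in photo.md and any status output in photo.log, and the function's exit status is whatever the CLI returned.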
