MinerU Document Explorer

REQUIRED for any task involving reading or understanding PDF contents. Use when a user asks about a .pdf file — reading pages, answering questions, extractin...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
OpenClaw
Benign
medium confidence
Purpose & Capability
The name/description promise a PDF reading/explorer utility and the repository contains a complete 'doc-search' CLI, FastAPI server, OCR/embedding/reranker/backends, client, and utilities that match that purpose. Optional integrations (MinerU OCR, PageIndex, embedder, reranker, AgenticOCR) are expected for a full-featured document explorer.
Instruction Scope
SKILL.md and code instruct the agent to run a bundled setup script, optionally launch a local FastAPI server, download PDFs from arbitrary URLs, and write user-provided API keys into the bundled config.yaml. It also mandates appending short notes to references/tips.md after tasks (i.e., modifying files in the skill bundle). These behaviors are coherent with a local CLI/server tool, but they involve filesystem writes and optional network calls that you should be aware of.
Install Mechanism (flagged)
There is no formal registry install spec, but the bundled scripts/setup.sh installs the package with pip -e and the flag --break-system-packages. That installs many third-party dependencies (fastapi, uvicorn, openai, mineru-open-sdk, etc.) from requirements.txt into the active Python environment; combining --break-system-packages with an editable install can modify the system Python environment. The install is based on local sources (no downloads from arbitrary remote URLs), which is expected, but you should run it in an isolated venv or container.
Credentials
Registry metadata declared no required env vars, but SKILL.md lists several optional credentials (DOC_SEARCH_PAGEINDEX_API_KEY, DOC_SEARCH_PAGEINDEX_BASE_URL, DOC_SEARCH_EMBEDDING_BASE_URL, DOC_SEARCH_RERANKER_BASE_URL, MINERU_TOKEN) and the code reads DOC_SEARCH_* env vars and MINERU_TOKEN. These are relevant to the optional networked features. The mismatch between 'no required env vars' in registry metadata and the SKILL.md's optional credentials is worth noting but not necessarily malicious.
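A minimal Python sketch for checking which of the optional credentials listed above are present in the current environment before running the tool. It reports variable names only, never their values, so the output is safe to log. The variable names come from SKILL.md; the function name `configured_optional_vars` is hypothetical, and an empty result does not rule out keys already stored in config.yaml.

```python
import os
from typing import List, Mapping

# Optional credentials named in SKILL.md; none is required for local-only use.
OPTIONAL_VARS = (
    "DOC_SEARCH_PAGEINDEX_API_KEY",
    "DOC_SEARCH_PAGEINDEX_BASE_URL",
    "DOC_SEARCH_EMBEDDING_BASE_URL",
    "DOC_SEARCH_RERANKER_BASE_URL",
    "MINERU_TOKEN",
)


def configured_optional_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names (never the values) of optional vars that are set and non-blank."""
    return [name for name in OPTIONAL_VARS if env.get(name, "").strip()]


if __name__ == "__main__":
    names = configured_optional_vars()
    if names:
        print("Networked features may be enabled via:", ", ".join(names))
    else:
        print("No optional credentials found in the environment.")
```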
Persistence & Privilege
The skill is not always-enabled and does not request platform-wide privileges. It will write state/config files into its own SKILL_DIR (config-state.json, config.yaml copy, caches) and can run a local HTTP server by design (default host is 127.0.0.1). Those are normal for a local CLI/server tool, but if you run the server you should ensure it is bound to localhost and protected by server_api_key before exposing beyond your machine.
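Before starting the server, a quick sanity check of the two settings mentioned above (loopback binding and a non-empty server_api_key) might look like the following. This is a hypothetical sketch: it takes an already-parsed config dict and assumes top-level keys named `host` and `server_api_key`. Only `server_api_key` is named in the scan, and the real config.yaml layout may nest these differently. A missing `host` is treated as the documented default, 127.0.0.1.

```python
import ipaddress
from typing import List


def server_config_warnings(cfg: dict) -> List[str]:
    """Flag risky server settings in a parsed config dict (hypothetical key names)."""
    warnings = []
    host = str(cfg.get("host", "127.0.0.1"))  # scan: default host is 127.0.0.1
    try:
        loopback = host == "localhost" or ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname cannot be verified as loopback without resolving it.
        loopback = False
    if not loopback:
        warnings.append(f"server bound to non-loopback host {host!r}")
    if not str(cfg.get("server_api_key") or "").strip():
        warnings.append("server_api_key is empty")
    return warnings
```

An empty list means both checks passed; binding to `0.0.0.0` is reported because it listens on every interface, not just the local machine.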
Assessment
What to consider before installing or running this skill:
- Functionally, this looks like a legitimate PDF explorer CLI with an optional local server. It installs a Python package and many dependencies from the bundled source. Run the included scripts/setup.sh only inside an isolated Python virtual environment or container to avoid touching system Python packages; the script uses --break-system-packages, which can alter system site-packages.
- The skill can run a FastAPI server (default host 127.0.0.1) that accepts uploads and networked OCR/embedding requests. Before running the server, set a non-empty server_api_key in the config or provide a strong API key via an environment variable, and keep the server bound to localhost unless you intentionally want external access.
- Optional API keys (MinerU, PageIndex, embedder, reranker, extractor) enable cloud features and must be provided by you. Supply only keys you trust the code to use; if you configure them through the setup prompt, they are written into the skill's config.yaml. If you do not need cloud OCR or external models, skip those keys to keep processing local.
- The skill downloads PDFs when given http/https URLs and stores them under its cache root inside the skill directory. Treat this like any code that downloads files: avoid untrusted URLs, and inspect downloaded PDFs before reusing them.
- Review the bundled requirements.txt and code (server.py, pdf_utils.py, config.py) if you have concerns about network calls, data retention, or where cached files are stored. For minimal risk, run the tool in a disposable VM or container with no external API keys provided.


Current version: v1.0.1
latest: vk973h3230nwdcp4jmytwr0kv9583zad8


SKILL.md

MinerU Document Explorer

PDF reading toolkit built on the doc-search CLI. Search first, then read only the relevant pages; never scan an entire PDF.

⚠️ Network capabilities: This skill can optionally call external APIs (PageIndex outline generation, MinerU cloud OCR, embedding/reranker services) and run a local FastAPI server. All network features are opt-in and disabled by default.

Path conventions

SKILL_DIR  = <this file's parent directory>
SCRIPTS    = SKILL_DIR/scripts

Setup check

Read SKILL_DIR/config-state.json. If missing or setup_complete is not true:

  1. Read references/setup.md and run the installer
  2. After setup, ask the user if they want to configure PageIndex (e.g. "If you have an OpenAI-compatible API key, you can enable PageIndex to auto-generate a document outline — useful for scanned docs or manuals. Want to set it up?")
  3. If the user provides pageindex_api_key and pageindex_base_url, write them to SCRIPTS/doc-search/config.yaml. If the user skips, continue immediately; do not block.
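The setup check above can be expressed as a small Python sketch. It follows the stated rule (a missing file, or `setup_complete` not being true, means setup is needed) and additionally treats an unreadable or malformed file as needing setup, which is an assumption rather than documented behavior. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Union


def state_is_complete(state: Any) -> bool:
    # Only a literal JSON `true` counts; the string "true" or the number 1 do not.
    return isinstance(state, dict) and state.get("setup_complete") is True


def setup_needed(skill_dir: Union[str, Path]) -> bool:
    """True if SKILL_DIR/config-state.json is missing or setup_complete is not true."""
    state_file = Path(skill_dir) / "config-state.json"
    try:
        state = json.loads(state_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Assumption: an unreadable or malformed state file also means "run setup".
        return True
    return not state_is_complete(state)
```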

⚠️ MUST read reference docs before acting — no guessing

Any uncertainty about parameters, return fields, or query phrasing → MUST read the corresponding cmd file before running any command. Do not infer or guess.

  • references/cmd-init.md / cmd-outline.md / cmd-pages.md
  • references/cmd-search-keyword.md / cmd-search-semantic.md / cmd-elements.md

For complex tasks, errors, unexpected results, or unfamiliar scenarios → MUST read references/tips.md first. It contains proven workflows and hard-won pitfalls that will save you from repeating mistakes.


Command cheatsheet

All output is JSON on stdout. --timeout is a global flag before the subcommand; default is 120s.

doc-search init --doc_path "<path_or_url>"
doc-search outline --doc_id "<id>" [--max_depth N] [--root_node "<node_id>"]
doc-search pages --doc_id "<id>" --page_idxs "<p>" [--no_image] [--return_text]
doc-search search-keyword --doc_id "<id>" --page_idxs "<p>" --pattern "<regex>" [--return_text]
doc-search search-semantic --doc_id "<id>" --page_idxs "<p>" --query "<q>" [--top_k N] [--no_image] [--return_text]
doc-search --timeout 300 elements --doc_id "<id>" --page_idxs "<p>" --query "<q>"
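When driving the CLI from a script, the ordering rule that is easiest to get wrong is that `--timeout` belongs before the subcommand. The sketch below builds the argument list with that ordering and parses stdout as JSON, as the cheatsheet states. It uses only the flags shown above; the helper names and the example doc id are hypothetical, and `check=True` assumes a non-zero exit status signals failure.

```python
import json
import shlex
import subprocess
from typing import Any, List, Optional, Sequence


def doc_search_argv(subcommand: str, args: Sequence[str], timeout: Optional[int] = None) -> List[str]:
    """Build a doc-search argv; the global --timeout must precede the subcommand."""
    argv = ["doc-search"]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]  # global flag; the CLI default is 120s
    argv.append(subcommand)
    argv.extend(args)
    return argv


def run_doc_search(subcommand: str, args: Sequence[str], timeout: Optional[int] = None) -> Any:
    """Run doc-search and parse its JSON stdout (hypothetical helper)."""
    proc = subprocess.run(
        doc_search_argv(subcommand, args, timeout),
        capture_output=True,
        text=True,
        check=True,  # assumption: a non-zero exit status means the command failed
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # Print the command rather than running it; "doc-123" is a made-up doc id.
    argv = doc_search_argv(
        "elements",
        ["--doc_id", "doc-123", "--page_idxs", "4", "--query", "the actual chart image"],
        timeout=300,
    )
    print(shlex.join(argv))
```

Passing a list to `subprocess.run` avoids shell quoting problems with regex patterns such as `revenue|income`; `shlex.join` is only used to show the equivalent shell command.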

Key reminders

  • Use outline and keyword search to narrow the reading range — never scan the full document
  • --page_idxs is 0-indexed — do not confuse with printed page numbers
  • After extracting figures or tables with elements, you must open crop_path to verify the result. Phrase the query to describe the actual chart image, not its caption text. If the query fails, check page_idxs or rephrase the query.
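Because `--page_idxs` is 0-indexed, printed page numbers need converting first. A sketch, assuming two things SKILL.md does not state: that `--page_idxs` accepts a comma-separated list (check references/cmd-pages.md for the real syntax), and that you know which 0-indexed PDF page carries printed page 1 (unnumbered front matter often shifts it). The function name and the `first_printed_page_idx` parameter are hypothetical.

```python
from typing import Iterable


def to_page_idxs(printed_pages: Iterable[int], first_printed_page_idx: int = 0) -> str:
    """Convert printed (1-based) page numbers to a 0-indexed page_idxs string.

    first_printed_page_idx is the 0-indexed PDF page on which printed page 1
    appears; it is 0 only when the PDF has no unnumbered front matter.
    """
    idxs = sorted({first_printed_page_idx + page - 1 for page in printed_pages})
    if idxs and idxs[0] < 0:
        raise ValueError("a printed page maps before the first PDF page")
    # Assumption: comma-separated indexes; confirm against references/cmd-pages.md.
    return ",".join(str(i) for i in idxs)
```

For example, if printed page 1 is the third page of the PDF, pass `first_printed_page_idx=2`; duplicates are removed and indexes are sorted.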

Lessons learned (mandatory)

After completing any PDF task: pitfalls / new workflows / parameter discoveries → append to references/tips.md, 1-2 lines each, conclusions only.
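A sketch of the append step, assuming tips are stored as `- ` bullet lines in references/tips.md (the bullet style is an assumption). Each lesson is collapsed onto a single line to stay within the 1-2 line limit. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Union


def format_tip_lines(lessons: Iterable[str]) -> List[str]:
    """Collapse each lesson to one line, drop blank ones, and prefix a bullet."""
    lines = (" ".join(lesson.split()) for lesson in lessons)
    return [f"- {line}" for line in lines if line]


def append_tips(tips_path: Union[str, Path], lessons: Iterable[str]) -> int:
    """Append formatted lessons to tips.md; return the number of lines written."""
    lines = format_tip_lines(lessons)
    if not lines:
        return 0
    path = Path(tips_path)
    # Start on a fresh line if the existing file does not end with a newline.
    prefix = ""
    if path.exists() and path.stat().st_size > 0 and not path.read_bytes().endswith(b"\n"):
        prefix = "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(prefix + "\n".join(lines) + "\n")
    return len(lines)
```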

Files

43 total
