MinerU PDF Parser
Review
Audited by ClawScan on May 10, 2026.
Overview
The skill mostly does what it claims: it sends the documents you choose to MinerU for conversion. However, the way it extracts the ZIP archives it downloads is unsafe enough to warrant review before use.
Install only if you are comfortable sending the selected documents to MinerU. Use a dedicated output folder, avoid highly sensitive files unless MinerU's terms are acceptable, protect the API token, and prefer a patched version that safely validates ZIP contents before extraction.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If MinerU, or the ZIP URL it returns, were compromised or served a malformed archive, files could be written outside the selected output folder and overwrite existing local files.
A ZIP downloaded from the provider is extracted without validating that archive entries remain inside the intended output directory; the same extraction pattern appears in the other included scripts.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(extract_dir)
Use only in a contained output directory until the scripts validate ZIP entry paths, reject absolute or '..' paths, and optionally limit extracted file sizes.
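The following is a minimal sketch of the recommended check, written in Python to match the scripts. `safe_extract` and its `max_total_bytes` default are hypothetical and are not part of the skill. Python's zipfile documentation notes that extract() and extractall() already strip absolute prefixes and '..' components. An explicit check goes further: it rejects a suspicious archive outright instead of silently rewriting its paths, and it adds the size limit the recommendation mentions.

```python
import os
import zipfile


def safe_extract(zip_path, extract_dir, max_total_bytes=500 * 1024 * 1024):
    """Extract zip_path into extract_dir, refusing entries that would escape it.

    max_total_bytes is an illustrative cap on the total declared
    uncompressed size; choose a limit that fits your documents.
    """
    base = os.path.realpath(extract_dir)
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.infolist()
        total = 0
        for info in members:
            name = info.filename
            parts = name.replace("\\", "/").split("/")
            # Reject absolute paths, Windows drive prefixes, and '..' components.
            if name.startswith(("/", "\\")) or ":" in parts[0] or ".." in parts:
                raise ValueError(f"unsafe ZIP entry: {name!r}")
            # Second check: the resolved target must stay under the output folder.
            target = os.path.realpath(os.path.join(base, name))
            if os.path.commonpath([base, target]) != base:
                raise ValueError(f"ZIP entry escapes output folder: {name!r}")
            # Sum the sizes declared in the archive headers.
            total += info.file_size
            if total > max_total_bytes:
                raise ValueError("ZIP exceeds the uncompressed size limit")
        # Extract only after every entry has passed the checks.
        zf.extractall(base, members=members)
```

Because every entry is validated before anything is written, a single bad entry aborts the whole extraction. No half-extracted output is left behind.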
Anyone with the token may be able to use the linked MinerU API quota or account privileges.
The skill requires a MinerU API token. That is expected for a MinerU integration, but the token is still a credential that grants access to the account.
export MINERU_TOKEN="your-token-here"
Store the token securely, prefer the environment variable over command-line token arguments, and revoke/rotate it if exposed.
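The following is a minimal sketch of keeping the token out of command-line arguments and logs. The function names are hypothetical. The Bearer-style Authorization header is an assumption; confirm it against MinerU's API documentation.

```python
import os


def load_mineru_token(env_var="MINERU_TOKEN"):
    """Read the MinerU API token from the environment rather than from argv."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export it instead of passing it on the command line")
    return token


def auth_headers(token):
    # Assumption: MinerU accepts a Bearer token; check the official API docs.
    return {"Authorization": f"Bearer {token}"}


def redact(token):
    # Show at most the last four characters when a token must appear in logs.
    return "****" + token[-4:] if len(token) > 8 else "****"
```

Reading the token from the environment keeps it out of shell history and out of process listings, where command-line arguments are visible to other users.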
Document contents, including potentially sensitive PDFs, Word files, slides, or images, leave the local machine for third-party processing.
Selected local documents are read and uploaded to MinerU or its returned upload URL for parsing, which matches the stated purpose.
API_BASE = "https://mineru.net/api/v4" ... requests.put(upload_url, data=file_data, timeout=300)
Only process documents you are allowed to upload to MinerU and review MinerU's privacy/data-retention terms for sensitive material.
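Every selected file leaves the machine in full, so a pre-flight step that lists exactly what will be sent makes that decision explicit. The sketch below is hypothetical: `upload_manifest` is not part of the skill, and it only reads files. The actual upload would still be the requests.put call quoted above.

```python
import hashlib
from pathlib import Path


def upload_manifest(paths):
    """Describe what would be sent, without uploading anything.

    Returns one (file name, size in bytes, SHA-256 hex digest) tuple per file,
    so the list can be reviewed or logged before any upload.
    """
    manifest = []
    for p in map(Path, paths):
        digest = hashlib.sha256()
        with p.open("rb") as fh:
            # Hash in 1 MiB chunks so large PDFs are not read into memory at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        manifest.append((p.name, p.stat().st_size, digest.hexdigest()))
    return manifest
```

A caller would print the manifest and ask for confirmation before starting any upload. The recorded digests also show later exactly which file versions were shared.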
Parsed content from untrusted documents, including any prompt-like text embedded in the original, could later be picked up by note-taking, search, RAG, or agent workflows.
The skill supports writing parsed Markdown into persistent, possibly cloud-synced knowledge-base locations.
Saving parsed content to Obsidian or knowledge bases ... --output "~/Library/Mobile Documents/com~apple~CloudDocs/Obsidian/VaultName/"
Treat converted Markdown from untrusted documents as untrusted content and review it before using it in shared knowledge bases or agent memory.
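One way to keep converted Markdown identifiable as untrusted once it lands in a vault is to stamp it with front matter. This is only a sketch. `save_untrusted_markdown` and the `source` and `trust` keys are hypothetical conventions that MinerU, Obsidian, and downstream RAG or agent tools do not define. Those tools would have to be configured to honor them.

```python
from pathlib import Path


def save_untrusted_markdown(markdown, source_name, out_dir):
    """Write parsed Markdown with YAML front matter marking it as untrusted."""
    out = Path(out_dir).expanduser()
    out.mkdir(parents=True, exist_ok=True)
    # YAML single-quoted string: escape quotes by doubling them and drop newlines.
    safe_source = source_name.replace("'", "''").replace("\n", " ")
    front_matter = f"---\nsource: '{safe_source}'\ntrust: untrusted\n---\n\n"
    # Use only the stem of the source name, so '../x.pdf' still lands in out_dir.
    target = out / (Path(source_name).stem + ".md")
    target.write_text(front_matter + markdown, encoding="utf-8")
    return target
```

Because out_dir is passed through expanduser(), a path like the iCloud vault example quoted above works as written. The marker only helps if whatever ingests the vault checks it before treating the text as trusted.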
Future releases of these dependencies could change behavior or introduce supply-chain risk, although the packages themselves are expected for an API client like this.
The skill depends on common PyPI HTTP libraries with lower-bound version constraints rather than pinned, reproducible versions.
requests>=2.28.0
aiohttp>=3.8.0
Install dependencies from a trusted Python environment and consider pinning exact versions or hashes for repeatable deployments.
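Pinning versions in a file does not by itself prove that the running environment matches them. A small startup check can compare installed versions against the exact pins you chose. For hash pinning at install time, pip's --require-hashes mode covers installation itself. `check_pins` below is a hypothetical helper, not part of the skill.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return a description of every package whose installed version differs from its pin."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, pinned {wanted}")
    return problems
```

Call it with a dict that maps each package, such as requests and aiohttp, to the exact version you vetted. An empty result means every pin matches.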
