MinerU PDF Parser

Review

Audited by ClawScan on May 10, 2026.

Overview

The skill mostly does what it claims by sending chosen documents to MinerU for conversion, but it extracts the downloaded ZIP without validating entry paths, which is unsafe enough to require review before use.

Install only if you are comfortable sending the selected documents to MinerU. Use a dedicated output folder, avoid highly sensitive files unless MinerU's terms are acceptable, protect the API token, and prefer a patched version that safely validates ZIP contents before extraction.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If MinerU or a returned ZIP URL were compromised or malformed, files could be written outside the selected output folder and potentially overwrite local files.

Why it was flagged

A ZIP downloaded from the provider is extracted without validating that archive entries remain inside the intended output directory; the same extraction pattern appears in the other included scripts.

Skill content

with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(extract_dir)

Recommendation

Use only in a contained output directory until the scripts validate ZIP entry paths, reject absolute or '..' paths, and optionally limit extracted file sizes.
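
The validation described in this recommendation could look roughly like the following. This is a minimal sketch, not the skill's own code: the function name `safe_extract`, the `max_total_bytes` parameter, and its 500 MiB default are assumptions made for illustration. It rejects absolute paths and '..' components, confirms that every resolved target stays inside the output directory, and caps the declared uncompressed size before anything is written.

```python
import os
import zipfile


def safe_extract(zip_path, extract_dir, max_total_bytes=500 * 1024 * 1024):
    """Extract zip_path into extract_dir, refusing entries that would escape it.

    Hypothetical helper; the size limit is an arbitrary illustrative default.
    """
    root = os.path.realpath(extract_dir)
    total = 0
    with zipfile.ZipFile(zip_path) as zf:
        # Validate every entry first so a bad archive writes nothing at all.
        for info in zf.infolist():
            name = info.filename.replace("\\", "/")
            # Reject absolute paths and any parent-directory component.
            if name.startswith("/") or os.path.isabs(name) or ".." in name.split("/"):
                raise ValueError(f"unsafe ZIP entry path: {info.filename!r}")
            # Resolve the final location and confirm it is still under root.
            target = os.path.realpath(os.path.join(root, name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"ZIP entry escapes output dir: {info.filename!r}")
            # Sum the declared uncompressed sizes to bound disk usage.
            total += info.file_size
            if total > max_total_bytes:
                raise ValueError("archive exceeds the allowed extracted size")
        zf.extractall(root)
```

Replacing the bare `zf.extractall(extract_dir)` call with `safe_extract(zip_path, extract_dir)` would make a malformed archive fail loudly instead of being extracted.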

What this means

Anyone with the token may be able to use the linked MinerU API quota or account privileges.

Why it was flagged

The skill requires a MinerU API token, which is expected for the MinerU service integration but is still an account credential.

Skill content

export MINERU_TOKEN="your-token-here"

Recommendation

Store the token securely, prefer the environment variable over command-line token arguments, and revoke/rotate it if exposed.
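
As an illustration of preferring the environment variable, a script could read `MINERU_TOKEN` at startup and refuse to run without it, so the token never appears in shell history or process listings. The helper names `get_mineru_token` and `redact` are hypothetical and are not taken from the skill; only the variable name `MINERU_TOKEN` comes from the skill content above.

```python
import os


def get_mineru_token(environ=None):
    """Return the token from MINERU_TOKEN, or raise a clear error if unset.

    Hypothetical helper; `environ` can be injected for testing.
    """
    env = os.environ if environ is None else environ
    token = env.get("MINERU_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "MINERU_TOKEN is not set; export it rather than passing the "
            "token as a command-line argument."
        )
    return token


def redact(token):
    """Shorten a token for log output so the full value is never printed."""
    return token[:4] + "..." if len(token) > 8 else "***"
```

Logging `redact(token)` instead of the raw value keeps the credential out of log files while still letting you tell tokens apart.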

What this means

Document contents, including potentially sensitive PDFs, Word files, slides, or images, leave the local machine for third-party processing.

Why it was flagged

Selected local documents are read and uploaded to MinerU or its returned upload URL for parsing, which matches the stated purpose.

Skill content

API_BASE = "https://mineru.net/api/v4" ... requests.put(upload_url, data=file_data, timeout=300)

Recommendation

Only process documents you are allowed to upload to MinerU and review MinerU's privacy/data-retention terms for sensitive material.

What this means

Parsed content from untrusted documents could later be reused by notes, search, RAG, or agent workflows, including any prompt-like text inside the original document.

Why it was flagged

The skill supports writing parsed Markdown into persistent, possibly cloud-synced knowledge-base locations.

Skill content

Saving parsed content to Obsidian or knowledge bases ... --output "~/Library/Mobile Documents/com~apple~CloudDocs/Obsidian/VaultName/"

Recommendation

Treat converted Markdown from untrusted documents as untrusted content and review it before using it in shared knowledge bases or agent memory.

What this means

Future dependency versions could change behavior or introduce new vulnerabilities, though these packages are expected for this API client.

Why it was flagged

The skill depends on common PyPI HTTP libraries with lower-bound version constraints rather than pinned, reproducible versions.

Skill content

requests>=2.28.0
aiohttp>=3.8.0

Recommendation

Install dependencies from a trusted Python environment and consider pinning exact versions or hashes for repeatable deployments.
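
One lightweight way to act on this is to flag requirement lines that are not pinned to an exact version before deploying. The function below is a simplified sketch with a hypothetical name, `unpinned_requirements`; it only looks for `==` in each line and does not implement the full requirement-specifier grammar.

```python
def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version with '=='.

    Simplified check for illustration; it is not a full requirements parser.
    """
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        spec = line.split(";", 1)[0]  # ignore environment markers
        if "==" not in spec:
            loose.append(line)
    return loose
```

Run against the two lines shown under "Skill content", this reports both `requests>=2.28.0` and `aiohttp>=3.8.0` as unpinned.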