pdf-parser-mineru

PassAudited by VirusTotal on May 12, 2026.

Overview

Type: OpenClaw Skill Name: pdf-parser-mineru Version: 1.0.2 The OpenClaw AgentSkills skill bundle for PDF parsing using MinerU is classified as benign. All files, including `SKILL.md`, `install.sh`, and `script/pdf_parser.py`, align with the stated purpose of converting PDFs to Markdown or JSON. The `install.sh` script uses standard Python package management tools (`pip`, `uv`) to install the `mineru` dependency without any suspicious remote execution or persistence mechanisms. Crucially, the `script/pdf_parser.py` uses `subprocess.run` with a list of arguments, which safely prevents shell injection vulnerabilities from user-controlled parameters like `file_path` and `output_dir`. There are no indications of prompt injection attempts, data exfiltration, unauthorized network activity, or other malicious behaviors.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Installing the skill's dependency may change the user's Python environment and pulls code from external package repositories.

Why it was flagged

The skill installs external Python packages without version pinning. This is expected for a local MinerU-based parser, but it means users depend on the current PyPI/uv package supply chain.

Skill content
python3 -m pip install --upgrade pip ... pip3 install uv ... uv pip install -U "mineru[all]"
Recommendation

Install in a virtual environment or container, verify the MinerU package source, and consider pinning known-good versions.

What this means

The skill runs the installed MinerU binary on selected PDFs and can consume local CPU, memory, and disk resources.

Why it was flagged

The wrapper executes the local MinerU command. It does not use shell=True and this execution is central to the PDF parsing purpose, but it is still local command execution.

Skill content
result = subprocess.run(cmd, capture_output=True, text=True, timeout=3600, env=env)
Recommendation

Use a trusted MinerU installation and run large or untrusted PDFs in a controlled environment if resource use is a concern.

What this means

Choosing an existing broad output folder could cause unexpected files in that folder tree to be read or included in the result.

Why it was flagged

The tool creates the caller-provided output directory and then recursively searches it for generated Markdown or JSON files. If a broad existing directory is chosen, unrelated files could be picked up.

Skill content
os.makedirs(output_dir, exist_ok=True) ... output_files = list(Path(output_dir).rglob("*.md"))
Recommendation

Use a fresh, dedicated output directory for each conversion and avoid pointing the tool at sensitive or broad directories.

What this means

If a PDF contains malicious prompt-like text, the agent could be influenced if it treats the parsed content as instructions rather than document data.

Why it was flagged

The skill returns parsed PDF content directly to the agent. That is expected, but untrusted PDFs can contain text that looks like instructions.

Skill content
"markdown_content": md_content ... "json_content": json_content
Recommendation

Treat extracted PDF content as untrusted data, especially when processing documents from unknown sources.