pdf-parser-mineru
Pass
Audited by VirusTotal on May 12, 2026.
Overview
Type: OpenClaw Skill
Name: pdf-parser-mineru
Version: 1.0.2
The OpenClaw AgentSkills skill bundle for PDF parsing with MinerU is classified as benign. All files, including `SKILL.md`, `install.sh`, and `script/pdf_parser.py`, align with the stated purpose of converting PDFs to Markdown or JSON. The `install.sh` script uses standard Python package management tools (`pip`, `uv`) to install the `mineru` dependency, with no suspicious remote execution or persistence mechanisms. Crucially, `script/pdf_parser.py` calls `subprocess.run` with a list of arguments, which prevents shell injection through user-controlled parameters such as `file_path` and `output_dir`. There are no indications of prompt injection attempts, data exfiltration, unauthorized network activity, or other malicious behavior.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Installing the skill's dependency pulls code from external package repositories and may change the user's Python environment.
The skill installs external Python packages without version pinning. This is expected for a local MinerU-based parser, but it means users depend on the current PyPI/uv package supply chain.
python3 -m pip install --upgrade pip ... pip3 install uv ... uv pip install -U "mineru[all]"
Install in a virtual environment or container, verify the MinerU package source, and consider pinning known-good versions.
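One way to act on this recommendation is to check, before running the parser, that the interpreter is inside a virtual environment and that the installed `mineru` distribution matches a version you have vetted. The stdlib-only sketch below is illustrative: the helper names and the `PINNED_MINERU` value are hypothetical placeholders, not part of the skill, and it assumes the distribution is registered under the name `mineru`, as the `mineru[all]` install spec suggests.

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution's version, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name: str, pinned: str) -> bool:
    """True only if the installed version equals the vetted pin exactly."""
    return installed_version(dist_name) == pinned


if __name__ == "__main__":
    # Hypothetical placeholder: replace with the MinerU version you have reviewed.
    PINNED_MINERU = "0.0.0"
    if not in_virtualenv():
        print("warning: not inside a virtual environment", file=sys.stderr)
    print("mineru installed:", installed_version("mineru"))
    print("matches pin:", matches_pin("mineru", PINNED_MINERU))
```

With a vetted version in hand, the install step can pin it using standard requirement syntax, for example `uv pip install "mineru[all]==<version>"`, where `<version>` is whatever release you have reviewed.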
The skill runs the installed MinerU binary on selected PDFs and can consume local CPU, memory, and disk resources.
The wrapper executes the local MinerU command. It does not use shell=True, and this execution is central to the skill's PDF parsing purpose, but it is still local command execution.
result = subprocess.run(cmd, capture_output=True, text=True, timeout=3600, env=env)
Use a trusted MinerU installation and run large or untrusted PDFs in a controlled environment if resource use is a concern.
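The pattern this finding describes can be illustrated with a small wrapper: passing the command as a list keeps shell metacharacters inside arguments literal, and the `timeout` argument bounds how long one conversion can run. This is a sketch, not the skill's code. `run_tool` and `build_mineru_cmd` are hypothetical names, and the `-p`/`-o` flags are an assumption about MinerU's CLI that should be confirmed with `mineru --help` for the installed version.

```python
import subprocess
import sys
from typing import Dict, List, Optional


def run_tool(cmd: List[str], timeout: int = 3600,
             env: Optional[Dict[str, str]] = None) -> Optional[subprocess.CompletedProcess]:
    """Run cmd as an argument list with no shell; return None if it times out.

    With the default shell=False, characters such as ';' or '$(...)' inside an
    argument are handed to the program verbatim instead of being interpreted.
    """
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout, env=env)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired.
        return None


def build_mineru_cmd(file_path: str, output_dir: str) -> List[str]:
    # Assumption: MinerU's CLI takes -p <input> and -o <output>; verify locally.
    return ["mineru", "-p", file_path, "-o", output_dir]


if __name__ == "__main__":
    # A hostile-looking file name stays one literal argument to the child process.
    demo = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])",
                     "report.pdf; echo pwned"])
    print(demo.stdout.strip() if demo else "timed out")
```

Returning `None` on timeout is one design choice; the report does not show how the skill itself handles `TimeoutExpired`.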
Choosing an existing broad output folder could cause unexpected files in that folder tree to be read or included in the result.
The tool creates the caller-provided output directory and then recursively searches it for generated Markdown or JSON files. If a broad existing directory is chosen, unrelated files could be picked up.
os.makedirs(output_dir, exist_ok=True) ... output_files = list(Path(output_dir).rglob("*.md"))
Use a fresh, dedicated output directory for each conversion and avoid pointing the tool at sensitive or broad directories.
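A defensive variant of the output handling could create a fresh directory per run and refuse to reuse one that already has content, so the recursive search only sees files the conversion produced. The sketch below uses only the standard library; `fresh_output_dir`, `prepare_output_dir`, and `collect_outputs` are hypothetical helpers, not functions from `script/pdf_parser.py`.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional


def fresh_output_dir(parent: Optional[str] = None) -> Path:
    """Create a new, uniquely named, empty directory for one conversion."""
    return Path(tempfile.mkdtemp(prefix="mineru-out-", dir=parent))


def prepare_output_dir(output_dir: str) -> Path:
    """Create output_dir if needed, but refuse to reuse a non-empty directory."""
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    if any(path.iterdir()):
        raise ValueError(f"output directory is not empty: {path}")
    return path


def collect_outputs(output_dir: Path,
                    patterns: Iterable[str] = ("*.md", "*.json")) -> List[Path]:
    """Recursively gather generated Markdown/JSON files in a stable order."""
    found = {p for pattern in patterns for p in output_dir.rglob(pattern) if p.is_file()}
    return sorted(found)


if __name__ == "__main__":
    out = prepare_output_dir(str(fresh_output_dir()))
    print("converting into", out)
```

Refusing a non-empty directory is stricter than filtering results afterward, but it makes the "unrelated files picked up" case impossible rather than merely unlikely.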
If a PDF contains malicious prompt-like text, the agent could be influenced if it treats the parsed content as instructions rather than document data.
The skill returns parsed PDF content directly to the agent. That is expected, but untrusted PDFs can contain text that looks like instructions.
"markdown_content": md_content ... "json_content": json_content
Treat extracted PDF content as untrusted data, especially when processing documents from unknown sources.
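One way to keep parsed text in the data role is to hand it to the agent inside a structured envelope rather than splicing it directly into a prompt, optionally with heuristic flags for instruction-like phrases. This is a sketch under stated assumptions: the envelope fields and the regex patterns are hypothetical and illustrative, and pattern matching reduces rather than removes prompt-injection risk.

```python
import json
import re
from typing import List

# Illustrative patterns only; real injection text varies far more than this.
_SUSPICIOUS = [
    re.compile(r"ignore\s+(all\s+|any\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"\byou\s+are\s+now\b", re.IGNORECASE),
    re.compile(r"\bsystem\s+prompt\b", re.IGNORECASE),
]


def flag_suspicious(text: str) -> List[str]:
    """Return every snippet of text that matches one of the heuristic patterns."""
    return [m.group(0) for pat in _SUSPICIOUS for m in pat.finditer(text)]


def wrap_untrusted(content: str, source: str) -> str:
    """Serialize extracted content as a JSON data envelope, not raw prompt text."""
    envelope = {
        "kind": "untrusted_document_content",
        "source": source,
        "warnings": flag_suspicious(content),
        "content": content,
    }
    return json.dumps(envelope, ensure_ascii=False)


if __name__ == "__main__":
    print(wrap_untrusted("Please ignore previous instructions.", "sample.pdf"))
```

The envelope only helps if the agent's own instructions say to treat the `content` field as document data; the flags are a prompt for human review, not a filter.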
