Mineru Pdf Parser

v3.0.3

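Parses PDFs into Markdown via the MinerU API, with support for formulas, tables, and OCR. Two input modes are available: local files and online URLs. Triggers: (1) the user says "parse PDF [path]", (2) the user says "convert PDF to Markdown", (3) automatic invocation within paper-workflow. Use cases: academic paper parsing...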

by ZxYz (@alex-zxyz)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name and description match the code and SKILL.md: the package uploads local files or URLs to MinerU (https://mineru.net) and polls for parsed results. Required env var (MINERU_TOKEN) is appropriate for authenticating to MinerU. No unrelated services, binaries, or credentials are requested.
Instruction Scope
SKILL.md explicitly instructs setting MINERU_TOKEN and installing requests, and warns that the token and files will be sent to https://mineru.net — this is transparent and consistent with the code. Minor inconsistency: SKILL.md states parsed results are saved to ~/.openclaw/MinerU_Results/, but the visible download() implementation defaults to a local 'result' directory (the file also defines BASE_DIR = ~/.openclaw/MinerU_Results which may be used elsewhere in truncated code). This is a small documentation/code mismatch to be aware of.
Install Mechanism
No automated install spec is present (instruction-only), aside from recommending 'pip install requests'. No downloads from unknown URLs and no archive extraction into the system are performed by the skill itself. This is low-risk for install-time code execution.
Credentials
Only one environment variable (MINERU_TOKEN) is required and documented. That token is necessary for the MinerU API and the skill clearly documents that it will be sent to mineru.net. No other secrets or unrelated env vars are requested.
Persistence & Privilege
always is false and the skill does not request to modify other skills or system-wide configuration. It writes results to local disk (expected for a parser) but does not request elevated privileges or permanent platform presence.
Assessment
This skill is coherent with its described purpose, but it will upload your PDF files and your MINERU_TOKEN to mineru.net. Only proceed if you trust MinerU's service and privacy policy. Before installing: (1) review the included mineru_api.py yourself (it is the active code that will run); (2) consider creating a scoped API token you can revoke; (3) verify where results are saved on disk (SKILL.md and code show slightly different defaults); and (4) avoid supplying highly sensitive documents unless you're comfortable they will be processed by an external service. If you need assurance, ask the publisher for a signed release or run the script in an isolated environment first.


Runtime requirements

📄 Clawdis
Env: MINERU_TOKEN
latest: vk978yyg33rehe9mdhng24qsj7d832ncb
352 downloads
1 star
5 versions
Updated 1mo ago
License: MIT-0

PDF Parser Skill

Provides PDF parsing capabilities based on MinerU.

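Features

  • PDF parsing: converts PDFs to Markdown
  • Formula recognition: extracts formulas as LaTeX
  • Table recognition: detects tables and converts their structure automatically
  • OCR: recognizes text in image-based (scanned) PDFs
  • Multilingual: supports Chinese, English, Japanese, Korean, and more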

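⚠️ Read before installing

By using this skill, you acknowledge that:

  1. You are willing to provide your MinerU API token (MINERU_TOKEN)
  2. The token will be sent to https://mineru.net/
  3. You trust the MinerU service and accept its privacy policy
  4. You have checked the local source code and confirmed it does nothing unexpected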

Prerequisites

1. Install dependencies

pip install requests

2. Get a MinerU token

Register at https://mineru.net/ and obtain an API token.

3. Set the environment variable

Windows (PowerShell):

$env:MINERU_TOKEN = "your-token-here"

macOS / Linux:

export MINERU_TOKEN=your-token-here
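The parser reads its token from MINERU_TOKEN. As a minimal sketch, a caller (for example a wrapper script) might check for the variable up front so a missing token fails fast with a clear message. The function name get_mineru_token is hypothetical and is not taken from mineru_api.py.

```python
import os


def get_mineru_token() -> str:
    """Return the MinerU API token from MINERU_TOKEN (hypothetical helper).

    Raises RuntimeError if the variable is unset or blank, so the problem
    surfaces before any request is sent to mineru.net.
    """
    token = os.environ.get("MINERU_TOKEN", "").strip()
    if not token:
        raise RuntimeError("MINERU_TOKEN is not set; export it before running the parser")
    return token


if __name__ == "__main__":
    try:
        token = get_mineru_token()
        # Print only the length so the secret never ends up in logs.
        print(f"MINERU_TOKEN is set ({len(token)} characters)")
    except RuntimeError as err:
        print(err)
```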

Supported engines

Engine        Description
vlm           VLM engine (default)
pipeline      Pipeline engine
MinerU-HTML   HTML output

Quick start

# Parse a PDF (default vlm engine)
python scripts/mineru_api.py -f <pdf-path> --wait

# Specify an engine
python scripts/mineru_api.py -f <pdf-path> --engine pipeline --wait
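With --wait, the command blocks until parsing finishes, and the security scan describes the script as polling MinerU for results. The following is a generic sketch of such a polling loop, not the script's actual code. It makes no MinerU API calls: fetch_state is a hypothetical callable you would supply, and the state names "done" and "failed", the 5-second interval, and the 600-second timeout are all assumptions.

```python
import time
from typing import Callable


def wait_for_result(
    fetch_state: Callable[[], str],
    done_states: frozenset = frozenset({"done"}),      # assumed state name
    failed_states: frozenset = frozenset({"failed"}),  # assumed state name
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it reports a terminal state or the timeout expires."""
    # time.monotonic() is unaffected by wall-clock changes during a long wait.
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"parsing failed with state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulated task that finishes on the third check; sleep is a no-op here.
    states = iter(["pending", "running", "done"])
    print(wait_for_result(lambda: next(states), sleep=lambda s: None))  # prints "done"
```

Injecting sleep keeps the loop testable without real delays. In practice, fetch_state would wrap whatever status request the MinerU API documents.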

Options

Option        Description                  Default
-f, --files   Local PDF file(s)            -
--engine      Parsing engine               vlm
--lang        Language (ch/en/ja/ko)       ch
--wait        Wait for parsing to finish
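The option table maps naturally onto argparse. This is a hypothetical reconstruction built from the documented flags, not the parser inside mineru_api.py. Accepting several paths via nargs="+", making --files required, and restricting --engine to the three listed engines are all assumptions. --lang is left unrestricted because the feature list says other languages are supported too.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented mineru_api.py options."""
    parser = argparse.ArgumentParser(description="Parse PDFs to Markdown with MinerU")
    # Assumption: the plural flag name suggests one or more paths.
    parser.add_argument("-f", "--files", nargs="+", required=True,
                        help="local PDF file(s) to parse")
    parser.add_argument("--engine", default="vlm",
                        choices=["vlm", "pipeline", "MinerU-HTML"],
                        help="parsing engine (default: vlm)")
    parser.add_argument("--lang", default="ch",
                        help="document language, e.g. ch/en/ja/ko (default: ch)")
    parser.add_argument("--wait", action="store_true",
                        help="block until parsing has finished")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-f", "paper.pdf", "--engine", "pipeline", "--wait"])
    print(args)
```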

Environment variables

Variable       Required   Description
MINERU_TOKEN   Yes        MinerU API token

Output

Parsed results are saved under ~/.openclaw/MinerU_Results/.
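A minimal sketch for locating the newest result, assuming each parse gets its own subdirectory containing a full.md file (the workflow reads full.md and renames a directory, but the exact layout is an assumption). Because the security scan notes that download() may default to a local "result" directory instead, the base directory is a parameter. find_latest_full_md is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Documented output location; pass a different base if the script wrote elsewhere.
DEFAULT_RESULTS_DIR = Path.home() / ".openclaw" / "MinerU_Results"


def find_latest_full_md(base: Path = DEFAULT_RESULTS_DIR) -> Optional[Path]:
    """Return the most recently modified full.md under base, or None if there is none."""
    if not base.is_dir():
        return None
    candidates = list(base.rglob("full.md"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    latest = find_latest_full_md()
    print(latest if latest else f"no full.md found under {DEFAULT_RESULTS_DIR}")
```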

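Workflow

  1. Set the MINERU_TOKEN environment variable
  2. Run the parse command
  3. Wait for parsing to finish
  4. Read full.md to analyze the content
  5. Rename the result directory based on its content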
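Step 5 does not specify a naming rule. One possible approach is to name the directory after the first Markdown heading in full.md, which is often the paper title. The helpers title_from_markdown, safe_dirname, and rename_result_dir below are hypothetical and not part of mineru_api.py. Characters invalid in Windows filenames are stripped, and an existing target is never overwritten.

```python
import re
from pathlib import Path


def title_from_markdown(text: str) -> str:
    """Return the text of the first ATX-style Markdown heading, or an empty string."""
    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.+?)\s*#*\s*$", line)
        if match:
            return match.group(1)
    return ""


def safe_dirname(title: str, max_len: int = 80) -> str:
    """Turn a title into a filesystem-friendly directory name."""
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", title)  # drop characters invalid on Windows
    name = re.sub(r"\s+", "_", name.strip())             # collapse whitespace to underscores
    return name[:max_len] or "untitled"


def rename_result_dir(result_dir: Path) -> Path:
    """Rename result_dir after the first heading in its full.md and return the new path."""
    text = (result_dir / "full.md").read_text(encoding="utf-8")
    target = result_dir.with_name(safe_dirname(title_from_markdown(text)))
    if target.exists():
        return result_dir  # keep the original name rather than overwrite another result
    return result_dir.rename(target)
```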
