Mineru Pdf Parser
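v3.0.3
Parses PDFs into Markdown through the MinerU API, with support for formulas, tables, and OCR. Offers two parsing modes: local files and online URLs. Triggers: (1) the user says "解析 PDF [路径]" ("parse PDF [path]"), (2) the user says "将 PDF 转为 Markdown" ("convert PDF to Markdown"), (3) invoked automatically within paper-workflow. Use cases: academic paper parsing...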
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name and description match the code and SKILL.md: the package uploads local files or URLs to MinerU (https://mineru.net) and polls for parsed results. Required env var (MINERU_TOKEN) is appropriate for authenticating to MinerU. No unrelated services, binaries, or credentials are requested.
Instruction Scope
SKILL.md explicitly instructs setting MINERU_TOKEN and installing requests, and warns that the token and files will be sent to https://mineru.net — this is transparent and consistent with the code. Minor inconsistency: SKILL.md states parsed results are saved to ~/.openclaw/MinerU_Results/, but the visible download() implementation defaults to a local 'result' directory (the file also defines BASE_DIR = ~/.openclaw/MinerU_Results which may be used elsewhere in truncated code). This is a small documentation/code mismatch to be aware of.
Install Mechanism
No automated install spec is present (instruction-only), aside from recommending 'pip install requests'. No downloads from unknown URLs and no archive extraction into the system are performed by the skill itself. This is low-risk for install-time code execution.
Credentials
Only one environment variable (MINERU_TOKEN) is required and documented. That token is necessary for the MinerU API and the skill clearly documents that it will be sent to mineru.net. No other secrets or unrelated env vars are requested.
Persistence & Privilege
always is false and the skill does not request to modify other skills or system-wide configuration. It writes results to local disk (expected for a parser) but does not request elevated privileges or permanent platform presence.
Assessment
This skill is coherent with its described purpose, but it will upload your PDF files and your MINERU_TOKEN to mineru.net. Only proceed if you trust MinerU's service and privacy policy. Before installing: (1) review the included mineru_api.py yourself (it is the active code that will run); (2) consider creating a scoped API token you can revoke; (3) verify where results are saved on disk (SKILL.md and code show slightly different defaults); and (4) avoid supplying highly sensitive documents unless you're comfortable they will be processed by an external service. If you need assurance, ask the publisher for a signed release or run the script in an isolated environment first.
Runtime requirements
📄 Clawdis
Env: MINERU_TOKEN
latest
PDF Parser Skill
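Provides PDF parsing backed by MinerU.
Features
- PDF parsing: converts PDFs to Markdown
- Formula recognition: extracts formulas as LaTeX
- Table recognition: automatically detects and converts table structure
- OCR: recognizes text in image-based (scanned) PDFs
- Multilingual: supports Chinese, English, Japanese, Korean, and others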
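⚠️ Read before installing
By using this skill you confirm that:
- You are willing to provide your MinerU API token (MINERU_TOKEN)
- The token will be sent to https://mineru.net/
- You trust the MinerU service and accept its privacy policy
- You have reviewed the local source code and confirmed it has no unexpected behavior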
Prerequisites
1. Install dependencies
pip install requests
2. Get a MinerU token
Visit https://mineru.net/ to register and obtain an API token.
3. Set the environment variable
Windows (PowerShell):
$env:MINERU_TOKEN = "your-token-here"
macOS / Linux:
export MINERU_TOKEN=your-token-here
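Before running a parse, it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch, not code taken from mineru_api.py: the helper names `get_mineru_token` and `mask` and the `env` parameter are hypothetical and exist only for illustration.

```python
import os


def get_mineru_token(env=os.environ):
    """Return the MinerU token from the environment, or raise if it is missing.

    `env` is a hypothetical parameter (defaults to os.environ) that makes the
    check easy to exercise without touching the real environment.
    """
    token = env.get("MINERU_TOKEN", "").strip()
    if not token:
        raise RuntimeError("MINERU_TOKEN is not set; export it before parsing.")
    return token


def mask(token):
    """Hide all but the last four characters so the token can be logged safely."""
    return "*" * max(len(token) - 4, 0) + token[-4:]


if __name__ == "__main__":
    print("Using token:", mask(get_mineru_token()))
```

Masking the value avoids leaking the full token into terminal history or logs while still showing which token is in use.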
Supported engines
| Engine | Description |
|---|---|
| vlm | VLM engine (default) |
| pipeline | Pipeline engine |
| MinerU-HTML | HTML output |
Quick start
# Parse a PDF (default vlm engine)
python scripts/mineru_api.py -f <pdf-path> --wait
# Specify the engine
python scripts/mineru_api.py -f <pdf-path> --engine pipeline --wait
Options
| Option | Description | Default |
|---|---|---|
| -f, --files | Local PDF file | - |
| --engine | Parsing engine | vlm |
| --lang | Language (ch/en/ja/ko) | ch |
| --wait | Wait for parsing to finish | No |
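To make the defaults in the table concrete, here is an illustrative `argparse` sketch that mirrors it. This is not the parser from mineru_api.py, which may define its options differently. In particular, accepting several files through `nargs="+"` and restricting values with `choices` are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser whose options and defaults mirror the options table."""
    parser = argparse.ArgumentParser(
        description="Parse PDFs with MinerU (illustrative sketch)"
    )
    # Assumption: the plural flag name suggests one or more files are accepted.
    parser.add_argument("-f", "--files", nargs="+", help="local PDF file(s)")
    # Assumption: valid values are limited to the engines listed in this document.
    parser.add_argument("--engine", default="vlm",
                        choices=["vlm", "pipeline", "MinerU-HTML"],
                        help="parsing engine")
    parser.add_argument("--lang", default="ch",
                        choices=["ch", "en", "ja", "ko"],
                        help="document language")
    # A plain flag: absent means "do not wait", matching the default in the table.
    parser.add_argument("--wait", action="store_true",
                        help="wait for parsing to finish")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, `-f paper.pdf --wait` yields engine `vlm` and language `ch`, the same defaults the table lists.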
Environment variables
| Variable | Required | Description |
|---|---|---|
| MINERU_TOKEN | Yes | MinerU API token |
Output
Parsed results are saved under the ~/.openclaw/MinerU_Results/ directory.
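Workflow
- Set the MINERU_TOKEN environment variable
- Run the parse command
- Wait for parsing to finish
- Read full.md and analyze its content
- Rename the directory based on the content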
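The last two workflow steps could be sketched roughly as follows. This assumes each parse produces a directory that contains full.md directly, which the document does not confirm. The helper names `slug_from_markdown` and `rename_result_dir`, and the choice of the first Markdown heading as the new name, are hypothetical.

```python
import re
from pathlib import Path


def slug_from_markdown(text, max_len=60):
    """Derive a filesystem-friendly name from the first heading (or first line)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fallback = lines[0] if lines else "untitled"
    heading = next((ln for ln in lines if ln.startswith("#")), fallback)
    title = heading.lstrip("#").strip()
    # Collapse every run of non-word characters into a single underscore.
    slug = re.sub(r"[^\w\-]+", "_", title).strip("_")
    return slug[:max_len] or "untitled"


def rename_result_dir(result_dir):
    """Rename a result directory after the title found in its full.md.

    Assumption: full.md sits directly inside result_dir.
    """
    result_dir = Path(result_dir)
    text = (result_dir / "full.md").read_text(encoding="utf-8")
    target = result_dir.with_name(slug_from_markdown(text))
    if target.exists():
        # Never overwrite an existing directory; keep the original name instead.
        return result_dir
    return result_dir.rename(target)
```

Because `\w` is Unicode-aware in Python 3, titles in Chinese, Japanese, or Korean survive the slug step instead of being replaced by underscores.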
