PDF to DOCX
v0.4.0
Convert PDF documents to Word (.docx) format using MinerU. Transforms PDF files into editable Word documents preserving layout, text, tables, and formatting....
MIT-0
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
The skill name and description match the declared dependencies: it requires the mineru-open-api CLI and a MINERU_TOKEN, both of which are used directly by the SKILL.md commands.
Instruction Scope
SKILL.md only instructs the agent to run mineru-open-api commands, authenticate with MINERU_TOKEN, and read local PDF files or URLs. There are no instructions to access unrelated files, other env vars, or external endpoints beyond MinerU.
Install Mechanism
Install methods are standard: npm i -g mineru-open-api or go install from the GitHub repo. These are expected for a CLI tool. (As always, installing third-party packages carries inherent supply-chain risk; see user guidance.)
Credentials
Only one credential (MINERU_TOKEN) is required and it's used for authenticating to the MinerU service — proportional to the described functionality.
Persistence & Privilege
The skill is not always-enabled and does not request system-wide configuration changes or access to other skills' credentials. It behaves like a normal user-invokable CLI integration.
Assessment
This skill appears coherent, but consider these practical precautions before installing:
1) MINERU_TOKEN grants MinerU access to perform conversions; do not supply it if you don't trust MinerU or the token's scope.
2) PDFs you convert are uploaded to the service (implicit in using an external API); avoid sending sensitive or confidential documents unless you have reviewed MinerU's privacy/security policy.
3) Prefer installing from the official GitHub repo or a vetted npm package; inspect the mineru-open-api package source if you can.
4) If you only need occasional conversions, create a token with minimal scope and revoke it when finished.
5) The agent will read any local file you ask it to convert, so avoid giving it broad instructions that could cause it to scan filesystem locations you didn't intend to share.
Like a lobster shell, security has layers: review code before you run it.
latest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
Runtime requirements
📄 Clawdis
Bins: mineru-open-api
Env: MINERU_TOKEN
Primary env: MINERU_TOKEN
Install
Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api
SKILL.md
PDF to DOCX
Convert PDF files to editable Word (.docx) format using MinerU.
⚠️ Token required. flash-extract does not support DOCX output. You must configure a token via mineru-open-api auth before using this skill.
⚠️ Output to file required. DOCX is a binary format and cannot be streamed to stdout; you must always specify -o <directory>.
Install
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Authentication
Token required — create one at https://mineru.net/apiManage/token:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
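When the token is supplied through the environment variable, a script can fail early if it is missing instead of waiting for the CLI to reject the request. The following is a minimal sketch; require_token is a hypothetical helper name, and the check cannot see a token that was stored by mineru-open-api auth, so it only applies to the environment-variable route.

```shell
# require_token: return non-zero if MINERU_TOKEN is unset or empty.
# Hypothetical helper; it inspects only the environment variable and
# does not detect a token saved by `mineru-open-api auth`.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}
```

A script could then call require_token || exit 1 before its first extract command.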
Quick Start
# Convert PDF to DOCX (token required, -o is mandatory)
mineru-open-api extract report.pdf -f docx -o ./out/
# From URL
mineru-open-api extract https://example.com/report.pdf -f docx -o ./out/
# With language hint
mineru-open-api extract report.pdf -f docx --language en -o ./out/
# With VLM model for better layout accuracy (complex PDFs)
mineru-open-api extract report.pdf -f docx --model vlm -o ./out/
# Batch convert multiple PDFs
mineru-open-api extract *.pdf -f docx -o ./out/
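The batch command above hands every PDF to a single invocation. If you would rather see which individual file failed, one option is to loop and convert files one at a time. This is a sketch under assumptions: convert_each is a hypothetical helper, and it assumes the CLI exits with a non-zero status when a conversion fails, which the skill text does not state.

```shell
# convert_each OUT_DIR FILE...: convert PDFs one at a time so that a
# single failure does not stop the remaining files.
# Assumption: mineru-open-api exits non-zero when a conversion fails.
convert_each() {
  out_dir=$1
  shift
  failed=0
  for pdf in "$@"; do
    # Skip anything that is not a regular file, including the literal
    # pattern left behind when a glob such as *.pdf matches nothing.
    [ -f "$pdf" ] || continue
    if ! mineru-open-api extract "$pdf" -f docx -o "$out_dir"; then
      echo "conversion failed: $pdf" >&2
      failed=1
    fi
  done
  # Non-zero if at least one file failed.
  return "$failed"
}
```

For example, convert_each ./out/ *.pdf would take the place of the single batch call, at the cost of one CLI invocation per file.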
Capabilities
- Supported input: .pdf (local file or URL)
- Output format: Word (.docx) via -f docx
- Token required (mineru-open-api auth or MINERU_TOKEN env)
- -o <dir> is mandatory; DOCX cannot stream to stdout
- Language hint with --language (default: ch, use en for English)
- Page range with --pages (e.g. 1-10)
- Batch mode supported: extract *.pdf -f docx -o ./out/
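The Quick Start does not show --pages in use. The sketch below wraps the flags listed above in a hypothetical convert_pages helper for an English-language PDF; combining --pages and --language in one call is an assumption, since the skill text only lists them separately.

```shell
# convert_pages PDF PAGES OUT_DIR: convert only a page range of an
# English-language PDF to DOCX.
# Assumption: --pages and --language can be combined in a single call.
convert_pages() {
  mineru-open-api extract "$1" -f docx --language en --pages "$2" -o "$3"
}
```

A call such as convert_pages report.pdf 1-10 ./out/ would request the first ten pages only.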
Notes
- flash-extract does NOT support DOCX output; always use extract with token
- DOCX output cannot be streamed to stdout; -o flag is required
- Use --model vlm for PDFs with complex layouts, tables, or mixed content
- Use --model pipeline if you need guaranteed fidelity with no hallucination risk
- Output directory will be created if it does not exist
- All progress/status messages go to stderr
- MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
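Because progress and status messages go to stderr, they can be kept out of the terminal and saved for later inspection with ordinary shell redirection. A minimal sketch, where convert_logged and the log path are hypothetical names chosen for illustration:

```shell
# convert_logged PDF OUT_DIR LOG: run one conversion and append the
# CLI's progress/status output (stderr) to LOG.
convert_logged() {
  pdf=$1 out_dir=$2 log=$3
  mineru-open-api extract "$pdf" -f docx -o "$out_dir" 2>>"$log"
}
```

For instance, convert_logged report.pdf ./out/ convert.log leaves the terminal quiet and keeps the messages in convert.log; the function returns whatever exit status the CLI returned.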
