PDF to DOCX

v0.4.0

Convert PDF documents to Word (.docx) format using MinerU. Transforms PDF files into editable Word documents preserving layout, text, tables, and formatting....

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The skill name and description match the declared dependencies: it requires the mineru-open-api CLI and a MINERU_TOKEN, both of which are used directly by the SKILL.md commands.
Instruction Scope
SKILL.md only instructs the agent to run mineru-open-api commands, authenticate with MINERU_TOKEN, and read local PDF files or URLs. There are no instructions to access unrelated files, other env vars, or external endpoints beyond MinerU.
Install Mechanism
Install methods are standard: npm i -g mineru-open-api or go install from the GitHub repo. These are expected for a CLI tool. (As always, installing third-party packages carries inherent supply-chain risk; see user guidance.)
Credentials
Only one credential (MINERU_TOKEN) is required and it's used for authenticating to the MinerU service — proportional to the described functionality.
Persistence & Privilege
The skill is not always-enabled and does not request system-wide configuration changes or access to other skills' credentials. It behaves like a normal user-invokable CLI integration.
Assessment
This skill appears coherent, but consider these practical precautions before installing:
  1) MINERU_TOKEN authorizes conversions through MinerU on your behalf; do not supply it if you don't trust MinerU or the token's scope.
  2) PDFs you convert are uploaded to the service (implicit in using an external API); avoid sending sensitive or confidential documents unless you have reviewed MinerU's privacy and security policy.
  3) Prefer installing from the official GitHub repo or a vetted npm package; inspect the mineru-open-api package source if you can.
  4) If you only need occasional conversions, create a token with minimal scope and revoke it when finished.
  5) The agent will read any local file you ask it to convert, so avoid broad instructions that could cause it to scan filesystem locations you did not intend to share.


latest: vk973nk9jnrf5zeqjt764sjp9e9844y5c


Runtime requirements

📄 Clawdis
Bins: mineru-open-api
Env: MINERU_TOKEN
Primary env: MINERU_TOKEN

Install

Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

SKILL.md

PDF to DOCX

Convert PDF files to editable Word (.docx) format using MinerU.

⚠️ Token required. flash-extract does not support DOCX output, so this skill always uses extract, which needs a token. Configure one via mineru-open-api auth or the MINERU_TOKEN environment variable before using this skill.

⚠️ Output to file required. DOCX is a binary format and cannot be streamed to stdout — you must always specify -o <directory>.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Authentication

Token required — create one at https://mineru.net/apiManage/token:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
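
Before running a conversion, it can help to confirm that the CLI is installed and a token is available. The following is a minimal sketch, not part of the skill itself: the function name check_mineru_env and the MINERU_BIN override variable are hypothetical names introduced here for illustration. It only covers the environment-variable route; a token stored by the interactive mineru-open-api auth command would not be detected.

```shell
# Hypothetical helper; MINERU_BIN and check_mineru_env are illustrative names.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

check_mineru_env() {
    # The CLI must be installed and reachable on PATH.
    if ! command -v "$MINERU_BIN" >/dev/null 2>&1; then
        echo "error: $MINERU_BIN not found on PATH" >&2
        return 1
    fi
    # Only the environment-variable route is checked in this sketch.
    if [ -z "${MINERU_TOKEN:-}" ]; then
        echo "error: MINERU_TOKEN is not set" >&2
        return 2
    fi
    return 0
}
```

A script could call check_mineru_env first and stop early on a non-zero status, rather than discovering a missing token midway through a batch.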

Quick Start

# Convert PDF to DOCX (token required, -o is mandatory)
mineru-open-api extract report.pdf -f docx -o ./out/

# From URL
mineru-open-api extract https://example.com/report.pdf -f docx -o ./out/

# With language hint
mineru-open-api extract report.pdf -f docx --language en -o ./out/

# With VLM model for better layout accuracy (complex PDFs)
mineru-open-api extract report.pdf -f docx --model vlm -o ./out/

# Batch convert multiple PDFs
mineru-open-api extract *.pdf -f docx -o ./out/
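
Because -o is mandatory and only PDF input is supported, the commands above can be wrapped in a small function that refuses to run without both. This is a sketch under assumptions: pdf_to_docx and MINERU_BIN are hypothetical names, the exit codes 64 and 65 are arbitrary choices, and the extension check is a simple suffix match (a URL with a query string after .pdf would be rejected).

```shell
# Hypothetical wrapper; pdf_to_docx and MINERU_BIN are illustrative names.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

pdf_to_docx() {
    # Usage: pdf_to_docx <input.pdf|url> <output-dir> [language]
    if [ "$#" -lt 2 ]; then
        echo "usage: pdf_to_docx <input.pdf|url> <output-dir> [language]" >&2
        return 64
    fi
    input=$1
    outdir=$2
    lang=${3:-}
    # Accept only names ending in .pdf (local path or URL).
    case "$input" in
        *.pdf|*.PDF) ;;
        *) echo "error: expected a .pdf file or URL: $input" >&2; return 65 ;;
    esac
    # Pass --language only when the caller supplied one.
    if [ -n "$lang" ]; then
        "$MINERU_BIN" extract "$input" -f docx --language "$lang" -o "$outdir"
    else
        "$MINERU_BIN" extract "$input" -f docx -o "$outdir"
    fi
}
```

For example, pdf_to_docx report.pdf ./out/ en runs the same command as the language-hint example above.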

Capabilities

  • Supported input: .pdf (local file or URL)
  • Output format: Word (.docx) via -f docx
  • Token required (mineru-open-api auth or MINERU_TOKEN env)
  • -o <dir> is mandatory — DOCX cannot stream to stdout
  • Language hint with --language (default: ch, use en for English)
  • Page range with --pages (e.g. 1-10)
  • Batch mode supported: extract *.pdf -f docx -o ./out/
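
As an alternative to passing a glob directly, a loop that converts one file at a time makes it easier to see which input failed. The sketch below assumes the CLI exits with a non-zero status when a conversion fails, which the skill description does not state; convert_each and MINERU_BIN are hypothetical names.

```shell
# Hypothetical helper; convert_each and MINERU_BIN are illustrative names.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

convert_each() {
    # Usage: convert_each <output-dir> <file.pdf>...
    outdir=$1
    shift
    failed=0
    for pdf in "$@"; do
        # Assumption: a failed conversion yields a non-zero exit status.
        if "$MINERU_BIN" extract "$pdf" -f docx -o "$outdir"; then
            echo "ok: $pdf" >&2
        else
            echo "failed: $pdf" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file converted.
    [ "$failed" -eq 0 ]
}
```

Calling convert_each ./out/ *.pdf processes the same files as the batch command, at the cost of one CLI invocation per file.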

Notes

  • flash-extract does NOT support DOCX output — always use extract with token
  • DOCX output cannot be streamed to stdout; -o flag is required
  • Use --model vlm for PDFs with complex layouts, tables, or mixed content
  • Use --model pipeline if you need guaranteed fidelity with no hallucination risk
  • Output directory will be created if it does not exist
  • All progress/status messages go to stderr
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
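
Since progress and status messages go to stderr, they can be captured in a log file with ordinary shell redirection. This is a minimal sketch; run_logged and MINERU_BIN are hypothetical names, and the log file path is whatever the caller chooses.

```shell
# Hypothetical helper; run_logged and MINERU_BIN are illustrative names.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

run_logged() {
    # Usage: run_logged <logfile> <input.pdf|url> <output-dir>
    logfile=$1
    input=$2
    outdir=$3
    # Append stderr (progress/status) to the log; stdout is left untouched.
    "$MINERU_BIN" extract "$input" -f docx -o "$outdir" 2>>"$logfile"
}
```

Appending with 2>> rather than overwriting with 2> keeps the messages from earlier runs, which is useful when several conversions share one log.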
