Doc Extract
v0.4.0
Extract text and content from Word documents (.doc, .docx) to Markdown using MinerU. A straightforward tool for reading and extracting Word file content. Fea...
MIT-0
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description match the declared requirements: the skill needs the mineru-open-api CLI and an optional MINERU_TOKEN for full extraction of .doc files, which is coherent with a document-extraction utility.
Instruction Scope
SKILL.md instructs the agent to invoke mineru-open-api commands on local files or URLs and to set MINERU_TOKEN for authenticated operations; it does not request unrelated files, credentials, or system access.
Install Mechanism
Install options are standard package installs (npm or go install) for a named package that produces the expected binary; no arbitrary URL downloads or extract steps are present.
Credentials
Only MINERU_TOKEN is requested, and the README justifies it: flash-extract on .docx is tokenless, while full .doc extraction requires authentication. No unrelated secrets or multiple credentials are requested.
Persistence & Privilege
Skill does not request always:true, does not modify other skills, and has normal autonomous-invocation defaults. It does not request elevated or persistent system privileges.
Scan Findings in Context
[no-findings] expected: No code files present; the regex-based scanner had nothing to analyze. This is expected for an instruction-only skill that delegates work to an external CLI.
Assessment
This skill appears to do what it claims: it invokes the MinerU CLI to extract Word content. Before installing, verify that the mineru-open-api npm/go package and the homepage (https://mineru.net) are legitimate and up-to-date. Provide MINERU_TOKEN only if you need full .doc extraction; avoid using a high-privilege or shared token. Remember that the CLI will read any local files you point it at; do not process sensitive documents unless you trust the installed package and the MinerU service.
Like a lobster shell, security has layers: review code before you run it.
latest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
Runtime requirements
📄 Clawdis
Bins: mineru-open-api
Env: MINERU_TOKEN
Primary env: MINERU_TOKEN
Install
Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api
SKILL.md
Doc Extract
Extract text and content from Word (.doc/.docx) files to Markdown using MinerU.
Install
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Quick Start
# Quick extraction from .docx (no token required)
mineru-open-api flash-extract report.docx
# Save to directory
mineru-open-api flash-extract report.docx -o ./out/
# Extract .doc file (requires token)
mineru-open-api extract report.doc -o ./out/
# Extract with language hint
mineru-open-api extract report.docx --language en -o ./out/
Authentication
No token is needed for flash-extract on .docx files. A token is required for .doc files and for the extract command:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
Create token at: https://mineru.net/apiManage/token
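A script that calls extract may want to fail early when no token is available. The following is a minimal sketch, not part of the CLI: require_token is a hypothetical helper, and it only checks the MINERU_TOKEN environment variable. It assumes the token was exported rather than configured through mineru-open-api auth, whose storage location is not described here.

```shell
#!/bin/sh
# Hypothetical helper (not provided by mineru-open-api):
# fail early if MINERU_TOKEN is missing from the environment.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}

# Usage idea: require_token && mineru-open-api extract report.doc -o ./out/
```

A token set up interactively may not appear in the environment, so treat a failure here as a prompt to check, not as proof that authentication is missing.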
Capabilities
- Supported input: .doc, .docx (local file or URL)
- .docx: supports flash-extract (no token, max 10 MB / 20 pages) and extract
- .doc: requires extract with token
- Language hint with --language (default: ch, use en for English)
- Page range with --pages (e.g. 1-10)
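The choice between the two subcommands listed above can be sketched as a small shell function. This is an illustration only: choose_subcommand is a hypothetical helper, it reads "10 MB" as 10 × 1024 × 1024 bytes (an assumption), it matches lowercase extensions only, and it cannot check the 20-page limit.

```shell
#!/bin/sh
# Hypothetical helper (not part of mineru-open-api): pick a subcommand
# from the file extension and, for .docx, the file size.
choose_subcommand() {
  file=$1
  limit=$((10 * 1024 * 1024))   # assumption: "10 MB" means 10 MiB
  case "$file" in
    *.docx)
      # Use extract when an existing file exceeds the flash-extract size limit.
      if [ -f "$file" ] && [ "$(($(wc -c < "$file")))" -gt "$limit" ]; then
        echo extract
      else
        echo flash-extract
      fi
      ;;
    *.doc)
      echo extract   # .doc always needs extract (and a token)
      ;;
    *)
      echo "unsupported file type: $file" >&2
      return 1
      ;;
  esac
}

# Usage idea:
#   mineru-open-api "$(choose_subcommand report.docx)" report.docx -o ./out/
```

Both subcommands accept -o in the Quick Start examples, so the same trailing arguments work for either result.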
Notes
- .doc requires extract with token; .docx works with flash-extract for quick extraction
- Output goes to stdout by default; use -o <dir> to save to a file or directory
- All progress/status messages go to stderr; document content goes to stdout
- MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
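Because content and status messages use separate streams, ordinary shell redirection can split them. A minimal sketch, assuming the behavior described in the notes above: extract_to_md and the MINERU_BIN override are hypothetical names introduced here for illustration, not features of the CLI.

```shell
#!/bin/sh
# Hypothetical wrapper: write the Markdown (stdout) to one file and
# progress/status messages (stderr) to another.
# MINERU_BIN is an illustrative override; it defaults to the real CLI name.
extract_to_md() {
  input=$1
  output=$2
  logfile=$3
  "${MINERU_BIN:-mineru-open-api}" flash-extract "$input" > "$output" 2> "$logfile"
}

# Usage idea: extract_to_md report.docx report.md extract.log
```

Keeping the log in its own file means the Markdown output stays clean and can be piped or committed as-is.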
