Doc To Text

v0.4.0

Extract plain readable text from Word documents (.doc, .docx) using MinerU. Outputs Markdown (the closest plain-text format supported) for easy reading and p...

0· 207· 6 versions· 0 current· 0 all-time· Updated 10h ago· MIT-0

Install

openclaw skills install doc-to-text

Doc To Text

Extract plain readable text from Word (.doc/.docx) documents using MinerU. MinerU outputs Markdown, which is the closest format to plain text it supports.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract text from .docx to stdout (no token required)
mineru-open-api flash-extract report.docx

# Save to file
mineru-open-api flash-extract report.docx -o ./out/

# Extract .doc (requires token)
mineru-open-api extract report.doc -o ./out/

# JSON output contains plain text fields (requires token)
mineru-open-api extract report.docx -f json -o ./out/

Authentication

No token needed for flash-extract on .docx. Token required for .doc and extract:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token

Capabilities

  • Supported input: .doc, .docx (local file or URL)
  • .docx: supports flash-extract (no token, Markdown output to stdout)
  • .doc: requires extract with token
  • For truly plain text: use extract -f json and read the text fields from the JSON output
  • Language hint with --language (default: ch, use en for English)

Notes

  • MinerU does not have a -f text option; Markdown is the closest to plain text
  • .doc requires extract with token; .docx works with flash-extract
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU

Version tags

latestvk975kcc6a3aesmf15a6xs39ez18444gj

Runtime requirements

📄 Clawdis
Binsmineru-open-api
EnvMINERU_TOKEN
Primary envMINERU_TOKEN

Install

Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api