Install
openclaw skills install doc-extract

Extract text and content from Word documents (.doc, .docx) to Markdown using MinerU. A straightforward tool for reading and extracting Word file content.

Features:
Fast text extraction from .docx with no token required (flash-extract).
Full extraction for both .doc and .docx with a token.
Preserves basic formatting and structure.
Page range selection for large documents.

Use when you need to: extract text from a Word file, read content from .doc or .docx, pull text out of a Word document, or get the content of a Word file as Markdown.

Use when asked: 'how do I extract text from Word', 'read this docx file', 'I want the text from this Word document', 'can my agent read Word files', 'is there a skill that extracts Word content'.

Built on MinerU by OpenDataLab (Shanghai AI Lab), an open-source document intelligence engine. Handles multilingual content. Works with local files and URLs. Great for developers, assistants, and automation workflows that need to quickly extract and process Word document content.
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
# Quick extraction from .docx (no token required)
mineru-open-api flash-extract report.docx
# Save to directory
mineru-open-api flash-extract report.docx -o ./out/
# Extract .doc file (requires token)
mineru-open-api extract report.doc -o ./out/
# Extract with language hint
mineru-open-api extract report.docx --language en -o ./out/
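To convert a whole folder rather than a single file, the flash-extract call above can be wrapped in a loop. The following is a minimal sketch, not part of the tool itself: the function name extract_all_docx and the MINERU_BIN override are hypothetical, introduced only so the loop can be dry-run with echo. It uses only the flash-extract subcommand and -o option shown above, so each file is still subject to the flash-extract limits.

```shell
# Hypothetical override: set MINERU_BIN=echo to print the commands
# instead of running them.
MINERU_BIN="${MINERU_BIN:-mineru-open-api}"

# Run flash-extract on every .docx file in a directory.
# Usage: extract_all_docx <source-dir> <output-dir>
extract_all_docx() {
  local src_dir="$1" out_dir="$2" f
  mkdir -p "$out_dir"
  for f in "$src_dir"/*.docx; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    "$MINERU_BIN" flash-extract "$f" -o "$out_dir/"
  done
}
```

For example, extract_all_docx ./reports ./out would process every .docx file in ./reports and write the results to ./out/.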
No token is needed for flash-extract on .docx files. A token is required for .doc files and for the extract command:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
Create token at: https://mineru.net/apiManage/token
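In scripts, it can help to fail early when no token is available before calling extract. The guard below is a small sketch that covers only the MINERU_TOKEN environment variable route. The function name require_token is hypothetical, and the check cannot see a token stored by the interactive auth command, so treat it as an assumption-laden convenience rather than a complete check.

```shell
# Return non-zero (with a hint on stderr) when MINERU_TOKEN is unset or empty.
# Note: this does not detect a token saved via `mineru-open-api auth`.
require_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}
```

A typical use would be: require_token && mineru-open-api extract report.doc -o ./out/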
.docx: supports flash-extract (no token, max 10 MB / 20 pages) and extract
.doc: requires extract with token
--language (default: ch, use en for English)
--pages (e.g. 1-10)
.doc requires extract with token; .docx works with flash-extract for quick extraction
-o <dir> to save to a file or directory
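The format rules above can be turned into a small helper that picks the subcommand for a given file. This is a sketch under stated assumptions: the name choose_subcommand and the FLASH_MAX_BYTES variable are hypothetical, "10 MB" is read as 10 * 1024 * 1024 bytes, extension matching is case-sensitive, and the 20-page limit is not checked because it cannot be determined from the file size.

```shell
# Assumed byte value of the 10 MB flash-extract limit.
FLASH_MAX_BYTES=$((10 * 1024 * 1024))

# Print "flash-extract" or "extract" for a Word file; fail for anything else.
choose_subcommand() {
  local file="$1" size
  if [ ! -f "$file" ]; then
    echo "not a file: $file" >&2
    return 1
  fi
  case "$file" in
    *.docx)
      # Small .docx files can use the token-free path.
      size=$(wc -c < "$file" | tr -d ' ')
      if [ "$size" -le "$FLASH_MAX_BYTES" ]; then
        echo "flash-extract"
      else
        echo "extract"
      fi
      ;;
    *.doc)
      # .doc always needs extract (and therefore a token).
      echo "extract"
      ;;
    *)
      echo "unsupported file type: $file" >&2
      return 1
      ;;
  esac
}
```

It could then be used as: mineru-open-api "$(choose_subcommand report.docx)" report.docx -o ./out/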