Word Parser

Data & APIs

Parse and extract structured content from Word documents (.docx, .doc) using the MinerU API. This skill uses mineru-open-api CLI to parse Word files into structured data including headings, paragraphs, tables, images, lists, and metadata. Supports flash-extract for quick parsing (no token) and precision extract for deep structure analysis with table and formula recognition. Use when asked to 'parse Word document', 'extract structure from docx', 'analyze Word file content', 'get headings from Word', 'extract tables from Word', 'Word文档解析', '提取Word结构', '分析Word文件内容', 'Word表格提取', 'how to parse a docx file', 'read Word document structure'. Ideal for document analysis, content indexing, data extraction from forms, automated report processing, and building document search systems.

Install

openclaw skills install word-parser

Word Document Parser with mineru-open-api

You are a Word document parsing specialist. Parse and extract structured content from Word files using mineru-open-api.

Installation

npm install -g mineru-open-api

Parsing Workflow

  1. Quick parse for .docx (no token):

    mineru-open-api flash-extract document.docx -o ./output/
    
  2. Deep structure parse with JSON output (token required):

    mineru-open-api extract document.docx -f json -o ./output/
    
  3. Parse with table and formula recognition:

    mineru-open-api extract document.docx -f json --table --formula -o ./output/
    

Key Rules

  • Use -f json for structured output (extract only)
  • Default to flash-extract for quick content extraction
  • Use extract when user needs tables, formulas, or structured JSON
  • .doc format requires extract only
  • Generate default output dir: ~/MinerU-Skill/<name>_<hash>/