Install
openclaw skills install docx-toolkit-zhouli
Extract text, tables, and images from .docx and legacy .doc files. Handles large documents, CJK text, and complex table structures. Includes deduplication and filtering for extracted images.
A complete toolkit for processing Microsoft Word documents (.docx and legacy .doc formats).
python3 {baseDir}/scripts/extract_text.py input.docx output.txt
Extracts all paragraphs and tables with structure preserved. Tables are formatted as pipe-delimited rows for easy parsing.
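To illustrate the output shape described above, here is a minimal sketch that reads paragraphs and tables in document order and joins table cells with pipes. It is not the code of extract_text.py: it deliberately uses only the standard library (zipfile and ElementTree) instead of python-docx, and the names extract_lines and paragraph_text are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_text(p):
    """Concatenate every text run (w:t) inside one paragraph element."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def extract_lines(docx_file):
    """Return body paragraphs and table rows in document order.

    Table rows come back as pipe-delimited strings, one per row.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")
    lines = []
    for child in body:
        if child.tag == W + "p":
            lines.append(paragraph_text(child))
        elif child.tag == W + "tbl":
            for row in child.findall(W + "tr"):
                cells = [
                    " ".join(paragraph_text(p) for p in cell.iter(W + "p")).strip()
                    for cell in row.findall(W + "tc")
                ]
                lines.append(" | ".join(cells))
    return lines
```

Walking the children of the body element keeps paragraphs and tables interleaved as they appear in the document. Headers, footers, and footnotes live in other parts of the archive and are ignored by this sketch.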
python3 {baseDir}/scripts/extract_doc_text.py input.doc output.txt
Handles legacy OLE2 .doc format using olefile. Extracts Unicode text from the WordDocument stream.
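As a rough illustration of reading the WordDocument stream with olefile, the sketch below pulls out runs of UTF-16LE text. It is a crude heuristic, not the logic of extract_doc_text.py: it assumes the text is stored as even-aligned UTF-16LE, ignores the file's internal piece table, and will miss text that Word stored in an 8-bit encoding. The function names and the min_chars threshold are hypothetical.

```python
import re

# Characters treated as "real text": ASCII printables, tab/newlines,
# CJK punctuation and kana (U+3000-U+30FF), CJK Unified Ideographs.
_TEXT_RUN = re.compile(r"[\t\r\n\x20-\x7e\u3000-\u30ff\u4e00-\u9fff]+")


def utf16_text_runs(data, min_chars=4):
    """Return readable runs found when data is decoded as UTF-16LE."""
    even = data[: len(data) - len(data) % 2]  # drop a trailing odd byte
    decoded = even.decode("utf-16-le", errors="replace")
    runs = []
    for match in _TEXT_RUN.finditer(decoded):
        # Word marks paragraph ends with a carriage return.
        run = match.group().replace("\r", "\n").strip()
        if len(run) >= min_chars:
            runs.append(run)
    return runs


def extract_doc_text(path):
    """Read the WordDocument stream of a legacy .doc and return its text runs."""
    import olefile  # third-party; imported here so the helper above works without it

    ole = olefile.OleFileIO(path)
    try:
        if not ole.exists("WordDocument"):
            raise ValueError("no WordDocument stream found")
        data = ole.openstream("WordDocument").read()
    finally:
        ole.close()
    return "\n".join(utf16_text_runs(data))
```

Because binary structures in the stream can decode to plausible-looking characters, the result may contain stray fragments. Raising min_chars trades recall for cleaner output.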
python3 {baseDir}/scripts/extract_images.py input.docx output_dir/
Extracts all embedded images, with deduplication and filtering applied to the results.
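The description mentions deduplication and filtering but does not say how they work. One plausible approach, sketched below under stated assumptions, is to read the files stored under word/media/ in the .docx archive, skip very small files, and drop byte-identical copies by content hash. The SHA-256 hash, the min_bytes threshold, and the function name are assumptions for illustration, not confirmed behavior of extract_images.py.

```python
import hashlib
import zipfile
from pathlib import Path


def extract_images(docx_file, output_dir, min_bytes=1024):
    """Copy unique embedded images out of a .docx and return the saved paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    seen = set()  # content hashes of images already written
    saved = []
    with zipfile.ZipFile(docx_file) as zf:
        for name in zf.namelist():
            # Embedded media is stored under word/media/ in the archive.
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            data = zf.read(name)
            if len(data) < min_bytes:
                continue  # filter: likely an icon, bullet, or spacer (assumed rule)
            digest = hashlib.sha256(data).hexdigest()
            if digest in seen:
                continue  # dedup: same bytes already saved under another name
            seen.add(digest)
            target = out / Path(name).name
            target.write_bytes(data)
            saved.append(target)
    return saved
```

Hashing the raw bytes only catches exact duplicates. Images that were re-encoded or rescaled would need a perceptual comparison, which this sketch does not attempt.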
python3 {baseDir}/scripts/resize_images.py input_dir/ output_dir/ [--max-width 1024]
Batch resize/compress images for API processing (saves 50-70% on vision API costs).
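The resize step can be sketched as follows, assuming that --max-width caps the width and that height scales proportionally. This is an illustration rather than the contents of resize_images.py: target_size and resize_dir are hypothetical names, and the set of file extensions handled is an assumption.

```python
from pathlib import Path


def target_size(width, height, max_width=1024):
    """Return (width, height) capped at max_width, preserving aspect ratio."""
    if width <= max_width:
        return width, height  # never upscale
    scale = max_width / width
    return max_width, max(1, round(height * scale))


def resize_dir(input_dir, output_dir, max_width=1024):
    """Write a width-capped copy of every PNG/JPEG in input_dir to output_dir."""
    from PIL import Image  # Pillow; imported here so target_size works without it

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(input_dir).iterdir()):
        if src.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        with Image.open(src) as img:
            resized = img.resize(target_size(img.width, img.height, max_width))
            resized.save(out / src.name)
```

For example, target_size(2048, 1000) gives (1024, 500), while an 800-pixel-wide image is left at its original dimensions.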
python-docx: for .docx processing
olefile: for legacy .doc processing
Pillow: for image resizing (optional, only needed for the resize script)
Install:
pip3 install python-docx olefile Pillow