word-document-organizer

智能整理Word文档，自动格式化、生成目录、统一样式，支持学术/商务/极简模板

Install

openclaw skills install word-document-organizer

Word 文档整理助手

智能识别文档结构，自动应用标准化排版，生成专业目录，让文档整理一键完成。

触发条件

当用户提出以下请求时激活此技能：

"整理word文档"
"格式化文档"
"生成文档目录"
"统一文档样式"
"排版优化"
"修复word格式"
"规范文档格式"
"word排版"

参数定义

document_path（必需）

类型: string
描述: Word文档的完整路径（支持.docx格式）
示例: "C:/Users/Desktop/报告.docx", "/home/user/docs/论文.docx"

operations（可选，默认: all）

类型: array[string]
描述: 要执行的操作列表
可选值:
- format: 格式化段落（行距、间距、字体统一）
- toc: 生成文档目录
- styles: 应用标准样式模板
- cleanup: 清理空段落和冗余格式
- all: 执行所有操作（默认）

style_template（可选，默认: academic）

类型: string
描述: 样式模板类型
可选值:
- academic: 学术模板（宋体、层级分明，适合论文/报告）
- business: 商务模板（微软雅黑、现代简洁，适合商业文档）
- minimal: 极简模板（Arial、紧凑排版，适合笔记/草稿）
- default: 默认模板（通用设置）

output_path（可选）

类型: string
描述: 输出文件路径（默认覆盖原文件，建议指定新路径保留原文件）
示例: "C:/Users/Desktop/报告_整理版.docx"

执行流程

步骤1: 环境检查与备份

#!/bin/bash
# 检查文件存在性
if [ ! -f "${document_path}" ]; then
    echo "错误：文件不存在 ${document_path}"
    echo "请检查路径是否正确，或文件是否被移动/删除"
    exit 1
fi

# 检查文件扩展名
if [[ ! "${document_path}" =~ \.(docx|doc)$ ]]; then
    echo "错误：仅支持 .docx 或 .doc 格式"
    exit 1
fi

# 创建时间戳备份
backup_path="${document_path}.backup.$(date +%Y%m%d_%H%M%S)"
cp "${document_path}" "${backup_path}"
echo "已创建备份: ${backup_path}"

步骤2: 安装Python依赖

#!/bin/bash
# 检查python-docx是否已安装
python3 -c "import docx" 2>/dev/null || pip3 install python-docx -q
echo "依赖检查完成"

步骤3: 执行文档整理（核心Python脚本）

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from docx import Document
from docx.shared import Pt, Inches, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
import sys
import re
import os

# 获取参数
doc_path = "${document_path}"
output_path = "${output_path}" if "${output_path}" else doc_path
template = "${style_template}"
operations = [op.strip() for op in "${operations}".split(",")] if "${operations}" else ["all"]

print(f"处理文件: {doc_path}")
print(f"执行操作: {operations}")
print(f"使用模板: {template}")

# 加载文档
try:
    doc = Document(doc_path)
except Exception as e:
    print(f"无法打开文档: {e}")
    print("提示：请确保文档未被Microsoft Word占用（关闭Word后重试）")
    sys.exit(1)

# 模板配置
TEMPLATES = {
    "academic": {
        "h1_size": 18, "h2_size": 16, "h3_size": 14, "body_size": 12,
        "h1_font": "黑体", "h2_font": "黑体", "body_font": "宋体",
        "line_spacing": 1.5, "space_after": 6
    },
    "business": {
        "h1_size": 16, "h2_size": 14, "h3_size": 12, "body_size": 11,
        "h1_font": "微软雅黑", "h2_font": "微软雅黑", "body_font": "微软雅黑",
        "line_spacing": 1.5, "space_after": 6
    },
    "minimal": {
        "h1_size": 14, "h2_size": 12, "h3_size": 11, "body_size": 10.5,
        "h1_font": "Arial", "h2_font": "Arial", "body_font": "Arial",
        "line_spacing": 1.15, "space_after": 3
    },
    "default": {
        "h1_size": 16, "h2_size": 14, "h3_size": 12, "body_size": 12,
        "h1_font": "宋体", "h2_font": "宋体", "body_font": "宋体",
        "line_spacing": 1.5, "space_after": 6
    }
}

config = TEMPLATES.get(template, TEMPLATES["academic"])
changes_log = []

# 操作1: 格式化
if "format" in operations or "all" in operations:
    print("正在格式化文档...")
    count = 0
    for para in doc.paragraphs:
        para.paragraph_format.line_spacing = config["line_spacing"]
        para.paragraph_format.space_after = Pt(config["space_after"])
        para.paragraph_format.space_before = Pt(0)

        if not para.style.name.startswith('Heading'):
            for run in para.runs:
                run.font.name = config["body_font"]
                run._element.rPr.rFonts.set(qn('w:eastAsia'), config["body_font"])
                run.font.size = Pt(config["body_size"])
        count += 1

    changes_log.append(f"格式化 {count} 个段落")

# 操作2: 应用样式
if "styles" in operations or "all" in operations:
    print("正在应用样式模板...")
    title_count = 0

    for para in doc.paragraphs:
        text = para.text.strip()
        if not text:
            continue

        if re.match(r'^[第][一二三四五六七八九十\d]+[章\s]|^[\d]+\s*[、.．\s]|^[（(][一二三四五六七八九十]+[)）]', text):
            para.style = doc.styles['Heading 1']
            for run in para.runs:
                run.font.name = config["h1_font"]
                run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h1_font"])
                run.font.size = Pt(config["h1_size"])
                run.font.bold = True
                run.font.color.rgb = RGBColor(0, 0, 0)
            title_count += 1

        elif re.match(r'^\d+\.\d+[\s.、]|^[（(][\d一二三四五六七八九十]+[)）]', text):
            para.style = doc.styles['Heading 2']
            for run in para.runs:
                run.font.name = config["h2_font"]
                run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h2_font"])
                run.font.size = Pt(config["h2_size"])
                run.font.bold = True
            title_count += 1

        elif re.match(r'^\d+\.\d+\.\d+|^[（(]\d+[)）]', text):
            para.style = doc.styles['Heading 3']
            for run in para.runs:
                run.font.size = Pt(config["h3_size"])
                run.font.bold = True
            title_count += 1

    changes_log.append(f"识别并格式化 {title_count} 个标题")

# 操作3: 生成目录
if "toc" in operations or "all" in operations:
    print("正在生成目录...")

    toc_entries = []
    for para in doc.paragraphs:
        if para.style.name.startswith('Heading'):
            level = int(para.style.name[-1]) if para.style.name[-1].isdigit() else 1
            toc_entries.append((level, para.text.strip()))

    if toc_entries:
        first_para = doc.paragraphs[0]
        toc_title = first_para.insert_paragraph_before("目录")
        toc_title.alignment = WD_ALIGN_PARAGRAPH.CENTER
        for run in toc_title.runs:
            run.font.size = Pt(config["h1_size"])
            run.font.bold = True
            run.font.name = config["h1_font"]
            run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h1_font"])

        for level, text in toc_entries[:50]:
            indent = "    " * (level - 1)
            entry_para = first_para.insert_paragraph_before(f"{indent}{text}")
            entry_para.paragraph_format.left_indent = Inches(0.3 * (level - 1))
            for run in entry_para.runs:
                run.font.size = Pt(config["body_size"])

        separator = first_para.insert_paragraph_before("—" * 30)
        separator.alignment = WD_ALIGN_PARAGRAPH.CENTER

        changes_log.append(f"生成目录，包含 {len(toc_entries)} 个条目")
    else:
        changes_log.append("未检测到标题结构，跳过目录生成")

# 操作4: 清理
if "cleanup" in operations or "all" in operations:
    print("正在清理冗余内容...")
    removed = 0
    prev_empty = False

    for i in range(len(doc.paragraphs) - 1, -1, -1):
        para = doc.paragraphs[i]
        is_empty = not para.text.strip()

        if is_empty and prev_empty:
            p_element = para._element
            p_element.getparent().remove(p_element)
            removed += 1
        prev_empty = is_empty

    changes_log.append(f"删除 {removed} 个冗余空段落")

# 保存文档
try:
    doc.save(output_path)
    print(f"文档已保存: {output_path}")
except Exception as e:
    print(f"保存失败: {e}")
    sys.exit(1)

print("
" + "="*50)
print("整理报告")
print("="*50)
for log in changes_log:
    print(log)
print("="*50)
print("文档整理完成！")

步骤4: 验证输出

#!/bin/bash
if [ -f "${output_path}" ]; then
    file_size=$(ls -lh "${output_path}" | awk '{print $5}')
    echo "输出文件: ${output_path} (${file_size})"
    echo "提示：原文件已备份，如有问题可恢复"
else
    echo "错误：输出文件未生成"
    exit 1
fi

使用示例

示例1: 全面整理（推荐）

整理文档 C:/Users/Desktop/毕业论文.docx，使用学术模板，执行所有操作，输出到 C:/Users/Desktop/毕业论文_整理版.docx

示例2: 仅格式化和生成目录

格式化 C:/Docs/报告.docx，操作包括format,toc，模板用business

错误处理

错误场景	处理方式
文件不存在	提示检查路径，退出码1
文件格式不支持	提示仅支持docx/doc，退出码1
文件被Word占用	提示关闭Word后重试，退出码1
权限不足	提示以管理员身份运行，退出码1

安全说明

本地执行: 所有操作在本地完成，不上传任何文件到云端
自动备份: 每次执行自动创建带时间戳的备份
原子操作: 失败时不会损坏原文件（先处理再保存）
权限最小化: 仅需文件读写权限，无需网络或系统管理权限