word-document-organizer

智能整理Word文档,自动格式化、生成目录、统一样式,支持学术/商务/极简模板

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 357 · 5 current installs · 6 all-time installs
byJasper.W@okgptai
MIT-0
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
medium confidence
Purpose & Capability
The name/description (organize Word documents, format, generate TOC, apply templates) matches the instructions: the SKILL.md describes checking a document path, backing up the file, installing python-docx, and running a Python script that edits docx contents. Required capabilities (filesystem access to the provided document path and ability to run python/pip) are consistent with the stated purpose.
Instruction Scope
Instructions stay within the documented purpose (open the provided file path, create a backup, modify the document, save). No steps reference unrelated files, secrets, or external endpoints. Two implementation issues: SKILL.md allows both .docx and .doc in checks, but the code uses python-docx which only supports .docx — .doc files are likely unsupported and will fail. Also the script will create a backup near the original file and, by default, overwrite the original unless an output_path is provided; users should be warned to provide an output_path to avoid data loss.
Install Mechanism
There is no packaged installer; the script runs pip3 install python-docx if needed. This is a common, proportionate install for the stated task. Note: pip installs execute package install-time code and may modify the Python environment; running inside a virtualenv is safer but not required by the skill.
Credentials
The skill requests no environment variables, credentials, or config paths. It only needs read/write access to the document path you supply and to be able to run python/pip — this is proportional to a document-organizing tool.
Persistence & Privilege
The skill does not request always: true and has no install artifacts beyond running pip. It does not modify other skills or system-wide agent configuration. Its runtime actions are limited to local file operations and installing a Python package if absent.
Assessment
This skill is instruction-only and edits local files: only provide paths to documents you trust and keep a separate copy. It will run a pip3 install for python-docx (which can execute code at install time) and will create a timestamped backup next to the original file but will otherwise overwrite the original unless you specify output_path — so specify an output_path or test on a copy first. Note the doc/.docx inconsistency: .doc (old binary Word) is likely unsupported by python-docx, so use .docx files. If you want extra safety, run the commands in a disposable environment (virtualenv or container) and inspect the SKILL.md before use.

Like a lobster shell, security has layers — review code before you run it.

Current versionv1.0.1
Download zip
automationvk977pmzmjzp5zrzj99jk7av1s182n4asdocumentvk977pmzmjzp5zrzj99jk7av1s182n4aslatestvk971gjh2fj3shwxnf36b85by0n82nebxmicrosoft-officevk977pmzmjzp5zrzj99jk7av1s182n4asproductivityvk977pmzmjzp5zrzj99jk7av1s182n4aswordvk977pmzmjzp5zrzj99jk7av1s182n4as

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

Word 文档整理助手

智能识别文档结构,自动应用标准化排版,生成专业目录,让文档整理一键完成。

触发条件

当用户提出以下请求时激活此技能:

  • "整理word文档"
  • "格式化文档"
  • "生成文档目录"
  • "统一文档样式"
  • "排版优化"
  • "修复word格式"
  • "规范文档格式"
  • "word排版"

参数定义

document_path(必需)

  • 类型: string
  • 描述: Word文档的完整路径(支持.docx格式)
  • 示例: "C:/Users/Desktop/报告.docx", "/home/user/docs/论文.docx"

operations(可选,默认: all)

  • 类型: array[string]
  • 描述: 要执行的操作列表
  • 可选值:
    • format: 格式化段落(行距、间距、字体统一)
    • toc: 生成文档目录
    • styles: 应用标准样式模板
    • cleanup: 清理空段落和冗余格式
    • all: 执行所有操作(默认)

style_template(可选,默认: academic)

  • 类型: string
  • 描述: 样式模板类型
  • 可选值:
    • academic: 学术模板(宋体、层级分明,适合论文/报告)
    • business: 商务模板(微软雅黑、现代简洁,适合商业文档)
    • minimal: 极简模板(Arial、紧凑排版,适合笔记/草稿)
    • default: 默认模板(通用设置)

output_path(可选)

  • 类型: string
  • 描述: 输出文件路径(默认覆盖原文件,建议指定新路径保留原文件)
  • 示例: "C:/Users/Desktop/报告_整理版.docx"

执行流程

步骤1: 环境检查与备份

#!/bin/bash
# 检查文件存在性
if [ ! -f "${document_path}" ]; then
    echo "错误:文件不存在 ${document_path}"
    echo "请检查路径是否正确,或文件是否被移动/删除"
    exit 1
fi

# 检查文件扩展名
if [[ ! "${document_path}" =~ \.(docx|doc)$ ]]; then
    echo "错误:仅支持 .docx 或 .doc 格式"
    exit 1
fi

# 创建时间戳备份
backup_path="${document_path}.backup.$(date +%Y%m%d_%H%M%S)"
cp "${document_path}" "${backup_path}"
echo "已创建备份: ${backup_path}"

步骤2: 安装Python依赖

#!/bin/bash
# 检查python-docx是否已安装
python3 -c "import docx" 2>/dev/null || pip3 install python-docx -q
echo "依赖检查完成"

步骤3: 执行文档整理(核心Python脚本)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from docx import Document
from docx.shared import Pt, Inches, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
import sys
import re
import os

# 获取参数
doc_path = "${document_path}"
output_path = "${output_path}" if "${output_path}" else doc_path
template = "${style_template}"
operations = [op.strip() for op in "${operations}".split(",")] if "${operations}" else ["all"]

print(f"处理文件: {doc_path}")
print(f"执行操作: {operations}")
print(f"使用模板: {template}")

# 加载文档
try:
    doc = Document(doc_path)
except Exception as e:
    print(f"无法打开文档: {e}")
    print("提示:请确保文档未被Microsoft Word占用(关闭Word后重试)")
    sys.exit(1)

# 模板配置
TEMPLATES = {
    "academic": {
        "h1_size": 18, "h2_size": 16, "h3_size": 14, "body_size": 12,
        "h1_font": "黑体", "h2_font": "黑体", "body_font": "宋体",
        "line_spacing": 1.5, "space_after": 6
    },
    "business": {
        "h1_size": 16, "h2_size": 14, "h3_size": 12, "body_size": 11,
        "h1_font": "微软雅黑", "h2_font": "微软雅黑", "body_font": "微软雅黑",
        "line_spacing": 1.5, "space_after": 6
    },
    "minimal": {
        "h1_size": 14, "h2_size": 12, "h3_size": 11, "body_size": 10.5,
        "h1_font": "Arial", "h2_font": "Arial", "body_font": "Arial",
        "line_spacing": 1.15, "space_after": 3
    },
    "default": {
        "h1_size": 16, "h2_size": 14, "h3_size": 12, "body_size": 12,
        "h1_font": "宋体", "h2_font": "宋体", "body_font": "宋体",
        "line_spacing": 1.5, "space_after": 6
    }
}

config = TEMPLATES.get(template, TEMPLATES["academic"])
changes_log = []

# 操作1: 格式化
if "format" in operations or "all" in operations:
    print("正在格式化文档...")
    count = 0
    for para in doc.paragraphs:
        para.paragraph_format.line_spacing = config["line_spacing"]
        para.paragraph_format.space_after = Pt(config["space_after"])
        para.paragraph_format.space_before = Pt(0)

        if not para.style.name.startswith('Heading'):
            for run in para.runs:
                run.font.name = config["body_font"]
                run._element.rPr.rFonts.set(qn('w:eastAsia'), config["body_font"])
                run.font.size = Pt(config["body_size"])
        count += 1

    changes_log.append(f"格式化 {count} 个段落")

# 操作2: 应用样式
if "styles" in operations or "all" in operations:
    print("正在应用样式模板...")
    title_count = 0

    for para in doc.paragraphs:
        text = para.text.strip()
        if not text:
            continue

        if re.match(r'^[第][一二三四五六七八九十\d]+[章\s]|^[\d]+\s*[、..\s]|^[((][一二三四五六七八九十]+[))]', text):
            para.style = doc.styles['Heading 1']
            for run in para.runs:
                run.font.name = config["h1_font"]
                run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h1_font"])
                run.font.size = Pt(config["h1_size"])
                run.font.bold = True
                run.font.color.rgb = RGBColor(0, 0, 0)
            title_count += 1

        elif re.match(r'^\d+\.\d+[\s.、]|^[((][\d一二三四五六七八九十]+[))]', text):
            para.style = doc.styles['Heading 2']
            for run in para.runs:
                run.font.name = config["h2_font"]
                run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h2_font"])
                run.font.size = Pt(config["h2_size"])
                run.font.bold = True
            title_count += 1

        elif re.match(r'^\d+\.\d+\.\d+|^[((]\d+[))]', text):
            para.style = doc.styles['Heading 3']
            for run in para.runs:
                run.font.size = Pt(config["h3_size"])
                run.font.bold = True
            title_count += 1

    changes_log.append(f"识别并格式化 {title_count} 个标题")

# 操作3: 生成目录
if "toc" in operations or "all" in operations:
    print("正在生成目录...")

    toc_entries = []
    for para in doc.paragraphs:
        if para.style.name.startswith('Heading'):
            level = int(para.style.name[-1]) if para.style.name[-1].isdigit() else 1
            toc_entries.append((level, para.text.strip()))

    if toc_entries:
        first_para = doc.paragraphs[0]
        toc_title = first_para.insert_paragraph_before("目录")
        toc_title.alignment = WD_ALIGN_PARAGRAPH.CENTER
        for run in toc_title.runs:
            run.font.size = Pt(config["h1_size"])
            run.font.bold = True
            run.font.name = config["h1_font"]
            run._element.rPr.rFonts.set(qn('w:eastAsia'), config["h1_font"])

        for level, text in toc_entries[:50]:
            indent = "    " * (level - 1)
            entry_para = first_para.insert_paragraph_before(f"{indent}{text}")
            entry_para.paragraph_format.left_indent = Inches(0.3 * (level - 1))
            for run in entry_para.runs:
                run.font.size = Pt(config["body_size"])

        separator = first_para.insert_paragraph_before("—" * 30)
        separator.alignment = WD_ALIGN_PARAGRAPH.CENTER

        changes_log.append(f"生成目录,包含 {len(toc_entries)} 个条目")
    else:
        changes_log.append("未检测到标题结构,跳过目录生成")

# 操作4: 清理
if "cleanup" in operations or "all" in operations:
    print("正在清理冗余内容...")
    removed = 0
    prev_empty = False

    for i in range(len(doc.paragraphs) - 1, -1, -1):
        para = doc.paragraphs[i]
        is_empty = not para.text.strip()

        if is_empty and prev_empty:
            p_element = para._element
            p_element.getparent().remove(p_element)
            removed += 1
        prev_empty = is_empty

    changes_log.append(f"删除 {removed} 个冗余空段落")

# 保存文档
try:
    doc.save(output_path)
    print(f"文档已保存: {output_path}")
except Exception as e:
    print(f"保存失败: {e}")
    sys.exit(1)

print("
" + "="*50)
print("整理报告")
print("="*50)
for log in changes_log:
    print(log)
print("="*50)
print("文档整理完成!")

步骤4: 验证输出

#!/bin/bash
if [ -f "${output_path}" ]; then
    file_size=$(ls -lh "${output_path}" | awk '{print $5}')
    echo "输出文件: ${output_path} (${file_size})"
    echo "提示:原文件已备份,如有问题可恢复"
else
    echo "错误:输出文件未生成"
    exit 1
fi

使用示例

示例1: 全面整理(推荐)

整理文档 C:/Users/Desktop/毕业论文.docx,使用学术模板,执行所有操作,输出到 C:/Users/Desktop/毕业论文_整理版.docx

示例2: 仅格式化和生成目录

格式化 C:/Docs/报告.docx,操作包括format,toc,模板用business

错误处理

错误场景处理方式
文件不存在提示检查路径,退出码1
文件格式不支持提示仅支持docx/doc,退出码1
文件被Word占用提示关闭Word后重试,退出码1
权限不足提示以管理员身份运行,退出码1

安全说明

  • 本地执行: 所有操作在本地完成,不上传任何文件到云端
  • 自动备份: 每次执行自动创建带时间戳的备份
  • 原子操作: 失败时不会损坏原文件(先处理再保存)
  • 权限最小化: 仅需文件读写权限,无需网络或系统管理权限

Files

1 total
Select a file
Select a file to preview.

Comments

Loading comments…