Install
openclaw skills install openclaw-office-toolkit
A skill for working with Office documents: Word (.docx), Excel (.xlsx/.csv), PowerPoint (.pptx), and PDF. Use it when the user asks to read, create, or edit any of these formats. It is built on the python-docx, openpyxl, python-pptx, and pypdf libraries.
Requires: python-docx, openpyxl, python-pptx, pypdf, pandoc, and LibreOffice (used for verification).
pip install --break-system-packages python-docx openpyxl python-pptx pypdf
sudo apt install libreoffice-writer libreoffice-calc libreoffice-impress pandoc
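| Task | Library / command |
|---|---|
| Read Word | python-docx or pandoc -t markdown |
| Create/edit Word | python-docx |
| Read Excel | openpyxl or pandas |
| Create/edit Excel | openpyxl |
| Read PowerPoint | python-pptx |
| Create/edit PowerPoint | python-pptx |
| Read PDF | pypdf, or pdftotext (poppler-utils) as a fallback; pandoc cannot read PDF input |
| Verify PDF formatting | LibreOffice (soffice) |
| Convert PDF to images | pdftoppm (poppler-utils) |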
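The table above can be read as a lookup from file extension to library. A minimal sketch of that dispatch follows; the helper name `pick_library` and the mapping are illustrative assumptions that simply mirror the table, not part of the toolkit itself.

```python
from pathlib import Path

# Mapping mirrors the task table; the helper below is hypothetical.
LIBRARY_BY_SUFFIX = {
    '.docx': 'python-docx',
    '.xlsx': 'openpyxl',
    '.pptx': 'python-pptx',
    '.pdf': 'pypdf',
}


def pick_library(path):
    """Return the library suggested for a file, based on its extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix not in LIBRARY_BY_SUFFIX:
        raise ValueError(f'unsupported file type: {path}')
    return LIBRARY_BY_SUFFIX[suffix]


print(pick_library('report.DOCX'))
```

Matching on the lowercased suffix keeps files such as `REPORT.DOCX` from being rejected; anything outside the four formats raises `ValueError` rather than guessing.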
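# Read the plain text of a Word document, paragraph by paragraph
from docx import Document

doc = Document('file.docx')
for para in doc.paragraphs:
    print(para.text)

# Extract with formatting (and tracked changes) by converting to Markdown with pandoc
import subprocess

result = subprocess.run(
    ['pandoc', '--track-changes=all', 'file.docx', '-t', 'markdown'],
    capture_output=True, text=True)
print(result.stdout)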
# Create a Word document
from docx import Document
from docx.shared import Pt

doc = Document()

# Title (heading level 0)
doc.add_heading('Document Title', 0)

# Paragraph with run-level formatting
p = doc.add_paragraph('Body text')
p.runs[0].bold = True  # bold
p.runs[0].font.size = Pt(12)
p.runs[0].font.name = 'Arial'

# Quotation
doc.add_paragraph('Quoted text', style='Intense Quote')

# Table
table = doc.add_table(rows=2, cols=3)
table.style = 'Light Grid Accent 1'
table.rows[0].cells[0].text = 'Header 1'
table.rows[0].cells[1].text = 'Header 2'

doc.save('output.docx')
# Read an Excel workbook row by row
import openpyxl

wb = openpyxl.load_workbook('file.xlsx')
ws = wb.active
for row in ws.iter_rows(values_only=True):
    print(row)
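The skill lists .csv alongside .xlsx, but openpyxl does not open CSV files. A minimal sketch of reading one with the standard-library `csv` module follows; the helper name `read_csv_rows` and the `utf-8-sig` default (which drops the byte-order mark that spreadsheet exports often add) are assumptions, not part of the toolkit.

```python
import csv


def read_csv_rows(path, encoding='utf-8-sig'):
    """Return all rows of a CSV file as lists of strings."""
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(path, newline='', encoding=encoding) as f:
        return list(csv.reader(f))
```

Calling `read_csv_rows('file.csv')` yields rows in the same shape as `ws.iter_rows(values_only=True)`, except that every cell is a string, so numeric columns need an explicit conversion.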
# Create an Excel workbook
import openpyxl
from openpyxl.styles import Font, PatternFill, Alignment

wb = openpyxl.Workbook()
ws = wb.active
ws.title = 'Data'

# Write cells
ws['A1'] = 'Name'
ws['B1'] = 'Age'
ws['A2'] = 'Zhang San'
ws['B2'] = 25

# Format the header cell
header_fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')
ws['A1'].fill = header_fill
ws['A1'].font = Font(color='FFFFFF', bold=True)
ws['A1'].alignment = Alignment(horizontal='center')

# Save
wb.save('output.xlsx')
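# Read the text of every shape on every slide
from pptx import Presentation

prs = Presentation('file.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:  # skip pictures, charts, and other shapes without text
            print(shape.text_frame.text)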
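# Create a presentation with a title slide
from pptx import Presentation
from pptx.util import Inches

prs = Presentation()
prs.slide_width = Inches(10)
prs.slide_height = Inches(7.5)

# Use the Title Slide layout (index 0 in the default template); the blank
# layout (index 6) has no title or subtitle placeholder to fill in
slide = prs.slides.add_slide(prs.slide_layouts[0])
title = slide.shapes.title
subtitle = slide.placeholders[1]
title.text = 'Presentation Title'
subtitle.text = 'Subtitle'

prs.save('output.pptx')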
# Read a PDF: page count and the text of each page
from pypdf import PdfReader

reader = PdfReader('file.pdf')
print(f'Pages: {len(reader.pages)}')
for page in reader.pages:
    print(page.extract_text())
# Merge several PDFs into one
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ['doc1.pdf', 'doc2.pdf']:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)
with open('merged.pdf', 'wb') as f:
    writer.write(f)
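# Split a PDF into one file per page
from pypdf import PdfReader, PdfWriter

reader = PdfReader('input.pdf')
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f'page_{i+1}.pdf', 'wb') as f:
        writer.write(f)

# Rotate the first page; the change is saved once the page is added to a PdfWriter and written
page = reader.pages[0]
page.rotate(90)  # 90 degrees clockwise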
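The task table lists pdftoppm for PDF-to-image conversion but gives no example. A minimal sketch of building that command follows; the helper name `build_pdftoppm_command` and the 150 DPI default are assumptions, while `-png` and `-r` are standard pdftoppm options.

```python
def build_pdftoppm_command(pdf_path, output_prefix, dpi=150):
    """Build a pdftoppm command that renders each page of a PDF to a PNG."""
    # -png selects PNG output; -r sets the resolution in DPI
    return ['pdftoppm', '-png', '-r', str(dpi), str(pdf_path), str(output_prefix)]


cmd = build_pdftoppm_command('file.pdf', 'page')
print(' '.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would write one numbered PNG per page, each named from the given prefix. Building the command as a list avoids shell quoting problems with paths that contain spaces.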
| Dependency | Status |
|---|---|
| python-docx | ✅ Installed |
| openpyxl | ✅ Installed |
| python-pptx | ✅ Installed |
| pypdf | ✅ Installed |
| pandoc | ✅ Installed |
| LibreOffice | ✅ Installed |
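The status table reflects one particular machine. A minimal sketch for re-checking the same dependencies elsewhere follows; the helper name `check_dependencies` and the two lookup tables are assumptions. It only tests whether each module can be found and whether each command is on the PATH, without importing or running anything.

```python
import importlib.util
import shutil

# Import names differ from the pip package names (python-docx installs "docx").
PYTHON_MODULES = {
    'python-docx': 'docx',
    'openpyxl': 'openpyxl',
    'python-pptx': 'pptx',
    'pypdf': 'pypdf',
}
# LibreOffice may be exposed as "soffice" or "libreoffice".
COMMANDS = {
    'pandoc': ['pandoc'],
    'LibreOffice': ['soffice', 'libreoffice'],
}


def check_dependencies():
    """Return {dependency name: True/False} for the current environment."""
    status = {}
    for package, module in PYTHON_MODULES.items():
        status[package] = importlib.util.find_spec(module) is not None
    for name, candidates in COMMANDS.items():
        status[name] = any(shutil.which(c) is not None for c in candidates)
    return status


for name, ok in check_dependencies().items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```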
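For PDF text, use pypdf. pandoc does not accept PDF as input, so when text extraction is poor, try another extractor such as pdftotext (poppler-utils).
The LibreOffice command is soffice or libreoffice (installed).
Check that a file exists with pathlib.Path(path).exists().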
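The file check and the LibreOffice command can be combined before a verification run. The sketch below assumes that verification means a headless conversion to PDF; the helper name `build_pdf_convert_command` and its defaults are hypothetical, while `--headless`, `--convert-to`, and `--outdir` are standard soffice options.

```python
from pathlib import Path


def build_pdf_convert_command(path, outdir='.', binary='soffice'):
    """Build the LibreOffice command that converts `path` to PDF in `outdir`."""
    source = Path(path)
    # Fail early with a clear error instead of letting LibreOffice fail later
    if not source.exists():
        raise FileNotFoundError(f'input file not found: {source}')
    return [binary, '--headless', '--convert-to', 'pdf',
            '--outdir', str(outdir), str(source)]
```

Running the result with `subprocess.run(cmd, check=True)` would produce a PDF in `outdir` that can then be inspected with pypdf. Pass `binary='libreoffice'` on systems where `soffice` is not on the PATH.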