Install
```shell
openclaw skills install gi-excel-pdf-process
```

Process Excel and PDF files: extract data, parse tables, generate reports. Use when working with .xlsx, .xls, .csv, or .pdf files, or when the user mentions spreadsheets, PDF extraction, or report generation.
Typical scenarios include data import/export, report generation, and document parsing for .xlsx, .xls, .csv, and .pdf files.

Executable scripts: `scripts/excel_extract.py` (Excel → CSV) and `scripts/pdf_extract.py` (PDF text and table extraction). Dependencies are listed in `scripts/requirements.txt`.
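The bundled `scripts/excel_extract.py` is not reproduced here. To give a rough idea of what an Excel → CSV conversion can look like with pandas, here is a minimal sketch. The function names `sheets_to_csv` and `excel_to_csv` and the `<stem>_<sheet>.csv` naming scheme are hypothetical, not the script's actual interface.

```python
from pathlib import Path

import pandas as pd


def sheets_to_csv(sheets, out_dir, stem):
    """Write each DataFrame in {sheet name: DataFrame} to <out_dir>/<stem>_<sheet>.csv.

    Returns the list of written paths. Hypothetical helper, not part of the skill.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, frame in sheets.items():
        path = out / f"{stem}_{name}.csv"
        # utf-8-sig writes a BOM so Excel detects UTF-8 when the CSV is reopened
        frame.to_csv(path, index=False, encoding="utf-8-sig")
        paths.append(path)
    return paths


def excel_to_csv(xlsx_path, out_dir):
    """Convert every sheet of an Excel workbook to its own CSV file."""
    # sheet_name=None loads all sheets as a dict {sheet name: DataFrame}
    sheets = pd.read_excel(xlsx_path, sheet_name=None)
    return sheets_to_csv(sheets, out_dir, Path(xlsx_path).stem)
```

Splitting the writing step from the reading step keeps the CSV logic testable without an actual workbook on disk.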
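```python
import pandas as pd

# Read the first sheet (sheet_name=0, which is also the default)
df = pd.read_excel("file.xlsx", sheet_name=0)

# Read a specific sheet by name
df = pd.read_excel("file.xlsx", sheet_name="Sheet1")

# Read all sheets into a dict of {sheet name: DataFrame}
sheets = pd.read_excel("file.xlsx", sheet_name=None)

# Read a CSV file
df = pd.read_csv("file.csv", encoding="utf-8")
```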
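When a workbook splits one table across several sheets (for example, one sheet per month), reading with `sheet_name=None` returns a dict of DataFrames that can be stacked into one. A minimal sketch, assuming every sheet has the same columns; the helper name `stack_sheets` and the added `sheet` column are illustrative choices.

```python
import pandas as pd


def stack_sheets(sheets):
    """Concatenate {sheet name: DataFrame} into one DataFrame with a 'sheet' column."""
    frames = []
    for name, frame in sheets.items():
        # Record which sheet each row came from before stacking
        frames.append(frame.assign(sheet=name))
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each sheet's index
    return pd.concat(frames, ignore_index=True)


# Usage with a real file (requires openpyxl for .xlsx):
# combined = stack_sheets(pd.read_excel("file.xlsx", sheet_name=None))
```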
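```python
import pandas as pd

# df, df1, df2 are existing DataFrames

# Write a single sheet
df.to_excel("output.xlsx", index=False)

# Write multiple sheets to one workbook
with pd.ExcelWriter("output.xlsx") as writer:
    df1.to_excel(writer, sheet_name="Summary", index=False)
    df2.to_excel(writer, sheet_name="Details", index=False)
```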
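A summary sheet plus a detail sheet is a common report layout. A minimal sketch of building both from one DataFrame, assuming a hypothetical input with `region` and `amount` columns; `build_report` and `write_report` are illustrative names, and writing `.xlsx` requires openpyxl.

```python
import pandas as pd


def build_report(df, group_col="region", value_col="amount"):
    """Return (summary, details): totals per group, and all rows sorted by group."""
    summary = (
        df.groupby(group_col, as_index=False)[value_col]
        .sum()
        .sort_values(value_col, ascending=False, ignore_index=True)
    )
    details = df.sort_values(group_col, ignore_index=True)
    return summary, details


def write_report(df, path="report.xlsx"):
    """Write the summary and detail sheets to one workbook."""
    summary, details = build_report(df)
    with pd.ExcelWriter(path) as writer:
        summary.to_excel(writer, sheet_name="Summary", index=False)
        details.to_excel(writer, sheet_name="Details", index=False)
```

Keeping the aggregation separate from the file writing means the numbers can be checked without producing a workbook.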
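| Operation | Code |
|---|---|
| Filter rows | `df[df['column_name'] > 0]` |
| Remove duplicates | `df.drop_duplicates(subset=['column_name'])` |
| Combine | `pd.concat([df1, df2])` or `pd.merge(df1, df2, on='key')` |
| Pivot table | `df.pivot_table(values='val', index='row', columns='col', aggfunc='sum')` |

```shell
pip install pandas openpyxl  # openpyxl is needed for .xlsx
```

Reading legacy `.xls` files additionally requires `xlrd`.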
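The operations listed above can be chained on small in-memory data to see what each one does. The column names `order_id`, `customer`, `month`, `amount`, and `name` are made up for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "customer": ["c1", "c2", "c2", "c1", "c3"],
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "amount": [100, 0, 0, 50, 70],
})
customers = pd.DataFrame({"customer": ["c1", "c2", "c3"], "name": ["Ann", "Bo", "Cy"]})

# Remove the duplicated order 2, keeping the first occurrence
orders = orders.drop_duplicates(subset=["order_id"])

# Filter: keep only orders with a positive amount
paid = orders[orders["amount"] > 0]

# Merge: attach customer names via the shared "customer" key (inner join by default)
paid = pd.merge(paid, customers, on="customer")

# Pivot: total amount per customer (rows) and month (columns); missing cells become 0
pivot = paid.pivot_table(values="amount", index="name", columns="month",
                         aggfunc="sum", fill_value=0)
print(pivot)
```

Customer `c2` disappears after the filter because its only order has an amount of 0, so it has no row in the pivot table.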
```python
import pdfplumber

# Extract text from every page
with pdfplumber.open("file.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        # Skip pages with no text layer (e.g. scanned images)
        if text:
            print(text)
```
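To track where text came from (for example, to report which pages mention a keyword), the page loop can collect `(page number, text)` pairs instead of printing. A minimal sketch; `page_texts` and `find_pages` are hypothetical helpers that accept any objects with an `extract_text()` method, so the logic can be tried without a real PDF.

```python
def page_texts(pages):
    """Return [(1-based page number, text)] for pages that have extractable text."""
    result = []
    for number, page in enumerate(pages, start=1):
        # extract_text() may return an empty string or None, depending on the version
        text = page.extract_text() or ""
        if text.strip():
            result.append((number, text))
    return result


def find_pages(pages, keyword):
    """Return the page numbers whose text contains keyword."""
    return [number for number, text in page_texts(pages) if keyword in text]


# Usage with pdfplumber:
# import pdfplumber
# with pdfplumber.open("file.pdf") as pdf:
#     print(find_pages(pdf.pages, "Total"))
```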
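```python
import pdfplumber

# Extract tables from the first page
with pdfplumber.open("file.pdf") as pdf:
    page = pdf.pages[0]
    tables = page.extract_tables()
    for table in tables:
        # Each table is a 2D list: rows of cell strings (None for empty cells)
        for row in table:
            print(row)
```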
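Because `extract_tables()` returns plain nested lists, converting each table to a DataFrame makes it easy to clean or export with the pandas calls above. A minimal sketch, assuming the first row of each table is the header; `table_to_dataframe` and the `col_<i>` fallback names are hypothetical.

```python
import pandas as pd


def table_to_dataframe(table):
    """Convert a pdfplumber table (list of rows, cells may be None) to a DataFrame.

    Assumes the first row holds the column headers.
    """
    header, *rows = table
    # Give empty header cells positional names; flatten wrapped header text
    columns = [cell.replace("\n", " ") if cell else f"col_{i}" for i, cell in enumerate(header)]
    df = pd.DataFrame(rows, columns=columns)
    # Cells can contain line breaks where text wrapped inside the PDF cell
    return df.replace(r"\n", " ", regex=True)


# Usage with pdfplumber:
# import pdfplumber
# with pdfplumber.open("file.pdf") as pdf:
#     frames = [table_to_dataframe(t) for t in pdf.pages[0].extract_tables()]
```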
```shell
pip install pdfplumber
```

For OCR on scanned PDFs: `pip install pdf2image pytesseract`, and install the Tesseract engine itself (pdf2image also needs Poppler).
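Scanned pages have no text layer, so `extract_text()` returns no usable text for them. One common approach is to try pdfplumber first and fall back to OCR only for empty pages. A minimal sketch, assuming pdfplumber, pdf2image (with Poppler), and pytesseract (with Tesseract) are installed; `pdf_text_with_ocr` is a hypothetical helper, and `lang="chi_sim"` requires the Simplified Chinese Tesseract language data.

```python
def pdf_text_with_ocr(path, dpi=300, lang="eng"):
    """Return one text string per page, using OCR for pages without a text layer.

    Imports are inside the function so this module loads even if the OCR
    dependencies are not installed.
    """
    import pdfplumber
    import pytesseract
    from pdf2image import convert_from_path

    texts = []
    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""
            if not text.strip():
                # Render only this page to an image (pdf2image page numbers are 1-based)
                images = convert_from_path(path, dpi=dpi, first_page=number, last_page=number)
                text = pytesseract.image_to_string(images[0], lang=lang)
            texts.append(text)
    return texts


# Usage for a mixed Chinese/English scan:
# pages = pdf_text_with_ocr("scan.pdf", lang="chi_sim+eng")
```

Rendering one page at a time avoids converting the whole document to images when only a few pages are scanned.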
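| Task | Approach |
|---|---|
| Generate an Excel report | `df.to_excel()` |
| Generate a PDF report | `reportlab`, or generate an Excel file first and convert it to PDF |
| CSV encoding | Common encodings are `utf-8` and `gbk`; try `utf-8` first |
| Missing values | `df.fillna(0)` or `df.dropna()`, as appropriate |
| Dates | `pd.to_datetime(df['date_col'])` to normalize formats |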
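The encoding, missing-value, and date tips can be combined into one loading step. A minimal sketch: `read_csv_any_encoding` tries UTF-8 first and falls back to GBK; the helper names and the `date`/`amount` column names are hypothetical.

```python
import pandas as pd


def read_csv_any_encoding(path, encodings=("utf-8", "gbk")):
    """Try each encoding in order and return the first DataFrame that decodes."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as err:
            # Wrong guess: remember the error and try the next encoding
            last_error = err
    raise last_error


def clean(df, date_col="date", fill_cols=("amount",)):
    """Fill missing numbers with 0, parse dates, and drop rows whose date is invalid."""
    df = df.copy()
    df[list(fill_cols)] = df[list(fill_cols)].fillna(0)
    # errors="coerce" turns unparseable dates into NaT instead of raising
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    return df.dropna(subset=[date_col])
```

The order matters: GBK bytes usually fail UTF-8 decoding with an error, while decoding UTF-8 bytes as GBK can silently produce garbled text, so UTF-8 is the safer first guess.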