Document Pro
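Document-processing skill: lets the AI read, parse, and extract key information from PDF, DOCX, PPT, and other documents. Triggered when the user asks to analyze a document, extract content, or summarize a report.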
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 7 · 4.4k · 46 current installs · 46 all-time installs
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description map to reading/parsing PDFs, DOCX, PPTX, XLSX and the SKILL.md shows use of appropriate Python libraries (pdfplumber, PyPDF2, python-docx, python-pptx, openpyxl). The requested capabilities align with the described purpose.
Instruction Scope
Instructions are limited to opening and extracting from document files (e.g., document.pdf, document.docx). They do not instruct reading unrelated system files or environment variables. Note: OCR is mentioned as a limitation but no OCR tool (e.g., tesseract/pytesseract or an OCR service) is specified — this is an operational gap, not necessarily malicious.
Install Mechanism
No install spec / no downloads. The skill is instruction-only, so nothing will be written to disk by an installer. Runtime requires Python libraries but installation is left to the environment.
Credentials
The skill declares no environment variables, credentials, or config paths. The lack of requested secrets is appropriate for a document-processing skill.
Persistence & Privilege
always:false and default invocation settings are used. The skill does not request persistent or elevated privileges or modify other skills; autonomous invocation is allowed (platform default).
Assessment
This skill appears internally consistent for document processing, but consider the following before installing:
1) Provenance: source and homepage are unknown; prefer skills with a clear publisher or repo.
2) Runtime dependencies: it references Python libraries (pdfplumber, PyPDF2, python-docx, python-pptx, openpyxl) and mentions OCR without specifying tools; you must ensure the runtime environment has those packages (and an OCR engine if you need scanned-PDF support).
3) Data sensitivity: the skill reads document contents; avoid sending highly sensitive documents unless you trust the environment and the skill's origin.
4) Operational gaps: the SKILL.md is instructional only (no code files); the agent may attempt to run similar Python code if permitted, so verify execution policies and available tooling.
If you need stronger assurance, ask the publisher for a source repository or an install script and a list of runtime dependencies and OCR tooling.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.0
SKILL.md
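Document Pro - Document Processing Skill
Overview
Gives the AI strong document-processing capabilities:
- PDF reading and extraction
- Word document parsing
- PowerPoint extraction
- Excel data extraction
- Document format conversion
Trigger scenarios
- The user sends a document and asks to "analyze" or "summarize" it
- The user asks to "extract the document content"
- The user asks to "convert to PDF"
- The user asks about specific information in a document
- The user asks to "extract the key points from a report/paper"
Supported formats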
| Format | Read | Write | Tools |
|---|---|---|---|
| PDF | ✅ | ✅ | pdfplumber, PyPDF2 |
| DOCX | ✅ | ✅ | python-docx |
| PPTX | ✅ | ❌ | python-pptx |
| XLSX | ✅ | ✅ | openpyxl |
| TXT | ✅ | ✅ | built-in |
| Markdown | ✅ | ✅ | built-in |
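Tool usage
PDF processing
# Extract text
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        # extract_text() may return None or "" for pages with no text layer
        text = page.extract_text() or ""
        print(text)
# Extract tables
with pdfplumber.open("document.pdf") as pdf:
    # extract_tables() returns a list of tables, each a list of rows
    tables = pdf.pages[0].extract_tables()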
Word documents
from docx import Document
doc = Document("document.docx")
# Print the text of each paragraph
for para in doc.paragraphs:
    print(para.text)
# Extract tables
for table in doc.tables:
    for row in table.rows:
        print([cell.text for cell in row.cells])
PowerPoint
from pptx import Presentation
prs = Presentation("presentation.pptx")
for slide in prs.slides:
    for shape in slide.shapes:
        # Only shapes with a text frame carry text (pictures, for example, do not)
        if shape.has_text_frame:
            print(shape.text_frame.text)
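Workflow
1. Identify the document type → choose the right tool
2. Read the content → extract text, tables, and images
3. Analyze the information → understand the structure, extract the key points
4. Summarize and present → summarize for the user in Chinese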
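Step 1 of the workflow can be sketched as a lookup on the file extension. This is a minimal illustration, not part of the original skill: the `TOOLS_BY_SUFFIX` mapping and the `pick_tool` helper are hypothetical names, and the mapping simply mirrors the supported-formats table. It checks the extension only and does not inspect file contents.

```python
from pathlib import Path

# Hypothetical mapping that mirrors the supported-formats table.
TOOLS_BY_SUFFIX = {
    ".pdf": "pdfplumber",
    ".docx": "python-docx",
    ".pptx": "python-pptx",
    ".xlsx": "openpyxl",
    ".txt": "built-in",
    ".md": "built-in",
}


def pick_tool(filename: str) -> str:
    """Return the library name for a file, based on its extension only."""
    suffix = Path(filename).suffix.lower()
    try:
        return TOOLS_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type: {filename!r}") from None
```

Raising on unknown extensions keeps an unsupported file from being silently passed to the wrong parser.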
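Advanced features
Document summarization
- Extract the document's main points
- Generate a short summary
- List the key takeaways
Table processing
- Identify table structure
- Extract table data
- Convert to CSV/Excel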
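The CSV conversion step could look like the sketch below, which uses only the standard library. It assumes a table is a list of rows, each a list of cell values, which is the shape produced by the table-extraction snippets earlier; `table_to_csv` is a hypothetical helper name. Cells that come back as `None` (which can happen for empty cells) are written as empty strings.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize a table (list of rows, each a list of cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Replace missing cells with "" so every row keeps its column count.
        writer.writerow(["" if cell is None else str(cell) for cell in row])
    return buffer.getvalue()
```

Using `csv.writer` rather than joining on commas means cells that themselves contain commas or quotes are escaped correctly.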
Keyword extraction
- Find important nouns and terms
- Identify topics
- Extract key information
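Output format
When presenting a document to the user, include:
- Document type and page count
- Summary of the main content
- Key points (3-5 items)
- Suggested next steps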
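One way to keep the four output elements together is a small container with a render method. This is only a sketch: the `DocumentReport` class, its field names, and the exact text layout are assumptions made for illustration, since the skill does not prescribe a concrete structure.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentReport:
    """Hypothetical container for the four elements listed above."""

    doc_type: str
    page_count: int
    summary: str
    key_points: list = field(default_factory=list)
    next_steps: list = field(default_factory=list)

    def render(self) -> str:
        """Format the report as plain text, one element per section."""
        lines = [
            f"Document type: {self.doc_type} ({self.page_count} pages)",
            f"Summary: {self.summary}",
            "Key points:",
        ]
        lines += [f"- {point}" for point in self.key_points]
        lines.append("Suggested next steps:")
        lines += [f"- {step}" for step in self.next_steps]
        return "\n".join(lines)
```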
Limitations
- Scanned PDFs require OCR
- Complex formatting may be lost
- Images and charts cannot be fully understood
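The skill does not say how to recognize a scanned PDF. A rough heuristic, sketched below, is to check whether most pages yield almost no extractable text. The `likely_needs_ocr` helper, the `min_chars` threshold of 20, and the "more than half the pages" rule are all assumptions to tune for your own documents. The input is the list of per-page strings collected from `page.extract_text()`.

```python
def likely_needs_ocr(page_texts, min_chars=20):
    """Guess whether a PDF is scanned, given the text extracted per page.

    page_texts: one entry per page (None or "" when no text was found).
    min_chars: hypothetical threshold below which a page counts as empty.
    """
    if not page_texts:
        return False  # no pages, nothing to OCR
    empty_pages = sum(
        1 for text in page_texts if len((text or "").strip()) < min_chars
    )
    # Flag the document when most pages carry almost no extractable text.
    return empty_pages / len(page_texts) > 0.5
```

This only signals that OCR is probably needed; the OCR step itself is outside what the skill specifies.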
