Document Pro

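Document processing skill: lets the AI read, parse, and extract key information from PDF, DOCX, PPT, and other documents. Triggered when the user asks to analyze a document, extract content, or summarize a report.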

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 7 · 4.4k · 46 current installs · 46 all-time installs

by @Jackeven02

Security Scan

VirusTotal: Benign

OpenClaw: Benign (high confidence)

✓

Purpose & Capability

Name/description map to reading/parsing PDFs, DOCX, PPTX, XLSX and the SKILL.md shows use of appropriate Python libraries (pdfplumber, PyPDF2, python-docx, python-pptx, openpyxl). The requested capabilities align with the described purpose.

ℹ

Instruction Scope

Instructions are limited to opening and extracting from document files (e.g., document.pdf, document.docx). They do not instruct reading unrelated system files or environment variables. Note: OCR is mentioned as a limitation but no OCR tool (e.g., tesseract/pytesseract or an OCR service) is specified — this is an operational gap, not necessarily malicious.

✓

Install Mechanism

No install spec / no downloads. The skill is instruction-only, so nothing will be written to disk by an installer. Runtime requires Python libraries but installation is left to the environment.

✓

Credentials

The skill declares no environment variables, credentials, or config paths. The lack of requested secrets is appropriate for a document-processing skill.

✓

Persistence & Privilege

always:false and default invocation settings are used. The skill does not request persistent or elevated privileges or modify other skills; autonomous invocation is allowed (platform default).

Assessment

This skill appears internally consistent for document processing, but consider the following before installing:

1) Provenance: source and homepage are unknown; prefer skills with a clear publisher or repo.
2) Runtime dependencies: it references Python libraries (pdfplumber, PyPDF2, python-docx, python-pptx, openpyxl) and mentions OCR without specifying tools; you must ensure the runtime environment has those packages (and an OCR engine if you need scanned-PDF support).
3) Data sensitivity: the skill reads document contents; avoid sending highly sensitive documents unless you trust the environment and the skill's origin.
4) Operational gaps: the SKILL.md is instructional only (no code files); the agent may attempt to run similar Python code if permitted, so verify execution policies and available tooling.

If you need stronger assurance, ask the publisher for a source repository or an install script and a list of runtime dependencies and OCR tooling.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

latest: vk978h88wnp6vkf6bhpyeqegm4181rdb8

License

MIT-0

Free to use, modify, and redistribute. No attribution required.

Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

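Document Pro - Document Processing Skill

Overview

Gives the AI the following document-processing capabilities:

PDF reading and extraction
Word document parsing
PowerPoint extraction
Excel data extraction
Document format conversion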

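Trigger scenarios

The user sends a document and asks to "analyze" or "summarize" it
The user asks to "extract the document's content"
The user asks to "convert to PDF"
The user asks about specific information in a document
The user asks to "extract the key points from a report/paper"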

Supported formats

Format	Read	Write	Tools
PDF	✅	✅	pdfplumber, PyPDF2
DOCX	✅	✅	python-docx
PPTX	✅	❌	python-pptx
XLSX	✅	✅	openpyxl
TXT	✅	✅	built-in
Markdown	✅	✅	built-in

Tool usage

PDF processing

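# Extract text
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        # extract_text() can return None or "" for pages without a text layer
        text = page.extract_text() or ""
        print(text)

# Extract tables
with pdfplumber.open("document.pdf") as pdf:
    # extract_tables() returns every table on the page, each as a list of rows
    tables = pdf.pages[0].extract_tables()
    for table in tables:
        print(table)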

Word documents

from docx import Document

doc = Document("document.docx")
for para in doc.paragraphs:
    print(para.text)

# Extract tables
for table in doc.tables:
    for row in table.rows:
        print([cell.text for cell in row.cells])

PowerPoint

from pptx import Presentation

prs = Presentation("presentation.pptx")
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            print(shape.text)
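The supported-formats table lists openpyxl for XLSX, but the tool usage section gives no example for it. The following is a minimal sketch, assuming the first row of each worksheet is a header row. The helper names `rows_to_records` and `read_xlsx` are hypothetical and not part of the skill.

```python
from __future__ import annotations

from typing import Any, Iterable


def rows_to_records(rows: Iterable[tuple]) -> list[dict[str, Any]]:
    """Turn rows (first row = header) into dicts, skipping fully empty rows."""
    iterator = iter(rows)
    header = next(iterator, None)
    if header is None:
        return []
    # Unnamed header cells get a positional placeholder name.
    keys = [str(h) if h is not None else f"column_{i}" for i, h in enumerate(header)]
    records = []
    for row in iterator:
        if all(cell is None for cell in row):
            continue
        records.append(dict(zip(keys, row)))
    return records


def read_xlsx(path: str) -> dict[str, list[dict[str, Any]]]:
    """Read every worksheet of an .xlsx file into records keyed by sheet title."""
    from openpyxl import load_workbook  # third-party; imported only when needed

    # read_only streams rows; data_only returns cached values instead of formulas.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        return {
            sheet.title: rows_to_records(sheet.iter_rows(values_only=True))
            for sheet in workbook.worksheets
        }
    finally:
        workbook.close()
```

Calling `read_xlsx("workbook.xlsx")` (a placeholder file name) would return one list of records per sheet. Sheets without a header row would need different handling.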

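Workflow

1. Identify the document type → choose the right tool
2. Read the content → extract text, tables, and images
3. Analyze the information → understand the structure and extract key points
4. Summarize and present → give the user a summary in Chinese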
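Step 1 of the workflow can be sketched as a lookup from file extension to the libraries in the supported-formats table. This is only an illustration: the skill does not say how the type is detected, so matching on the extension, the `.md` suffix for Markdown, and the names `TOOLS_BY_EXTENSION` and `choose_tools` are all assumptions.

```python
from __future__ import annotations

from pathlib import Path

# Extension -> libraries, copied from the supported-formats table.
# ".md" as the Markdown extension is an assumption.
TOOLS_BY_EXTENSION = {
    ".pdf": ("pdfplumber", "PyPDF2"),
    ".docx": ("python-docx",),
    ".pptx": ("python-pptx",),
    ".xlsx": ("openpyxl",),
    ".txt": ("built-in",),
    ".md": ("built-in",),
}


def choose_tools(path: str) -> tuple[str, ...]:
    """Return the libraries for a file, judged by its extension alone."""
    extension = Path(path).suffix.lower()
    try:
        return TOOLS_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(
            f"unsupported document type: {extension or '(none)'}"
        ) from None
```

An extension check is cheap but trusts the file name. A renamed or mislabeled file would only be caught when the chosen library fails to open it.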

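Advanced features

Document summarization

Extract the document's main points
Generate a short summary
List the key takeaways

Table processing

Identify table structure
Extract table data
Convert to CSV/Excel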
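The CSV conversion can be sketched with the standard library alone. The example assumes a table shaped like the output of pdfplumber's `extract_tables()`: a list of rows, each a list of cell strings, where an empty cell may be `None`. The function name `table_to_csv` is hypothetical.

```python
import csv
import io


def table_to_csv(table: list) -> str:
    """Serialize a table (list of rows, each a list of cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        # Extracted cells can be None (empty cell); write those as "".
        writer.writerow(["" if cell is None else str(cell) for cell in row])
    return buffer.getvalue()
```

To save the result, write the returned string to a file opened with `encoding="utf-8"`. The `csv` module handles quoting of cells that contain commas or quotes.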

Keyword extraction

Find important nouns and terms
Identify topics
Extract key information
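The skill does not say how keywords are found, and in practice the AI model itself would likely do this. As a rough baseline only, a word-frequency count can be sketched as below. The stop-word list and the name `top_keywords` are hypothetical, and the regular expression matches ASCII words only, so Chinese text would need a word segmenter first.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a usable one would be much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in",
    "is", "are", "for", "on", "with",
}


def top_keywords(text: str, limit: int = 5) -> list:
    """Return the most frequent non-stop-words, most frequent first."""
    # Lowercase, then keep words of two or more letters (ASCII only).
    words = re.findall(r"[a-z][a-z'-]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]
```

Frequency alone does not distinguish nouns or terms from other common words, so this is a starting point rather than a substitute for real term extraction.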

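Output format

When presenting a document to the user, include:

Document type and page count
Summary of the main content
Key points (3-5 items)
Suggested next steps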
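The four parts of the output can be held in a small structure and rendered as text. This is a sketch under assumptions: the class name `DocumentReport`, its field names, and the exact layout are hypothetical, and the skill specifies only which items to include.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentReport:
    doc_type: str
    page_count: int
    summary: str
    key_points: list = field(default_factory=list)
    next_steps: list = field(default_factory=list)

    def render(self) -> str:
        """Render the report as plain text; enforces the 3-5 key point range."""
        if not 3 <= len(self.key_points) <= 5:
            raise ValueError("expected 3 to 5 key points")
        lines = [
            f"Document type: {self.doc_type} ({self.page_count} pages)",
            f"Summary: {self.summary}",
            "Key points:",
        ]
        lines += [f"  {i}. {point}" for i, point in enumerate(self.key_points, 1)]
        lines.append("Suggested next steps:")
        lines += [f"  - {step}" for step in self.next_steps]
        return "\n".join(lines)
```

Raising on too few or too many key points is one possible design choice. A more lenient version could truncate to five instead.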

Limitations

Scanned PDFs require OCR
Complex formatting may be lost
Images and charts cannot be fully interpreted
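The first limitation means it helps to notice when a PDF has no usable text layer before reporting empty content. The heuristic below is a sketch, not something the skill defines: it treats a document as likely scanned when more than half of its pages yield almost no extracted text. The name `looks_scanned`, the 20-character threshold, and the one-half ratio are all assumptions. The skill names no OCR tool, so the sketch only detects the problem.

```python
def looks_scanned(page_texts, min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is a scan: True when most pages have almost no text."""
    texts = list(page_texts)
    if not texts:
        return False
    # A page counts as sparse when its text is None, empty, or very short.
    sparse = sum(
        1 for text in texts if len((text or "").strip()) < min_chars_per_page
    )
    return sparse / len(texts) > 0.5
```

With pdfplumber, the input could be `[page.extract_text() for page in pdf.pages]`. A `True` result suggests telling the user that OCR is needed rather than presenting an empty summary.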

Files

1 total
