pdf-processor

v1.0.0

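All-in-one PDF processing skill. Supports text/image/table extraction from PDFs, format conversion (PDF↔Word/Excel), merging and splitting, OCR, batch processing, watermarking, encryption and decryption, compression, and more. Use cases: (1) extract text from a PDF for data analysis (2) convert a PDF to Word/Excel for easier editing (3) merge or split PDF ...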


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for pengsc1994/free-pdf-processor.

Install the skill "pdf-processor" (pengsc1994/free-pdf-processor) from ClawHub.
Skill page: https://clawhub.ai/pengsc1994/free-pdf-processor
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install free-pdf-processor

ClawHub CLI


npx clawhub@latest install free-pdf-processor

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The name/description describe PDF extraction, conversion, OCR, watermarking, encryption, compression, merging/splitting and the repository contains scripts that implement those features. Declared dependencies (pymupdf, pdfplumber, python-docx, openpyxl, Pillow, optional pytesseract) align with the functionality; there are no unrelated credentials, binaries, or config paths requested.
Instruction Scope
SKILL.md only instructs installing Python deps and running the included scripts. The runtime instructions do not direct the agent to read unrelated system files, environment variables, or to send data to external endpoints. Note: SKILL.md uses a Windows-like example path (D:\PDF.skill\...) which is just an example and not a request for system-wide access.
Install Mechanism
There is no automated install spec (instruction-only). Dependencies are installed via pip from requirements.txt (PyPI). The listed packages are common, well-known libraries for PDF/image/office processing. No downloads from arbitrary URLs or archive extraction are present.
Credentials
The skill requests no environment variables or credentials. The code performs only local file I/O and local dependency checks (e.g., searching for Tesseract executable paths); it does not attempt to read unrelated environment secrets or external config.
Persistence & Privilege
The skill is not set to always:true and does not modify other skills or global agent settings. It has no mechanism to persistently install itself into a platform or exfiltrate configuration.
Assessment
This package appears internally consistent with a local PDF utility. Before installing or running it: (1) review the scripts yourself or run them in a sandbox/VM—these are local Python scripts and will read and write files you pass to them; (2) pip will install packages from PyPI—only proceed if you trust those packages or inspect requirements.txt (they are common PDF libraries here); (3) OCR requires the Tesseract engine (you must install it separately); (4) note minor privacy issue: encrypt_pdf prints the password to stdout, so avoid exposing sensitive passwords to shared consoles or logs; (5) some scripts have small implementation issues (e.g., missing imports in places) but those are functional bugs, not signs of malicious intent. If you process sensitive documents, run the tools on an isolated machine and verify outputs before sharing.


latest: vk973zkq494vr5kv3a2jn7cpna185nnms
37 downloads
1 star
1 version
Updated 1d ago
v1.0.0
MIT-0

PDF Processing Skill

Quick Start

Install dependencies

cd D:\PDF.skill\pdf-processor
pip install -r requirements.txt

Core Features

| Feature | Command | Description |
|---|---|---|
| Extract text | python scripts/extract_text.py <pdf_path> | Extract the text content of a PDF |
| Extract images | python scripts/extract_images.py <pdf_path> <output_dir> | Extract the images in a PDF |
| Extract tables | python scripts/extract_tables.py <pdf_path> | Extract the tables in a PDF |
| PDF to Word | python scripts/pdf_to_word.py <pdf_path> <output_path> | Convert to an editable Word document |
| PDF to Excel | python scripts/pdf_to_excel.py <pdf_path> <output_path> | Extract tables to Excel |
| Merge PDFs | python scripts/merge_pdfs.py <output_path> <file1> <file2> ... | Merge multiple PDFs |
| Split PDF | python scripts/split_pdf.py <pdf_path> <output_dir> | Split a PDF by page |
| Add watermark | python scripts/add_watermark.py <pdf_path> <output_path> <text> | Add a text watermark |
| OCR | python scripts/ocr_pdf.py <pdf_path> <output_path> | Run OCR on scanned documents |
| Encrypt PDF | python scripts/encrypt_pdf.py <input> <output> <password> | AES-256 encryption |
| Decrypt PDF | python scripts/decrypt_pdf.py <input> <output> <password> | Decrypt a PDF |
| Compress PDF | python scripts/compress_pdf.py <input> <output> | Compress a PDF file |
| Batch processing | python scripts/batch_process.py <input_dir> <output_dir> --operation <op> | Process files in bulk |
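
The commands above all share one shape: a script under scripts/ followed by positional arguments. The following is a minimal Python sketch of calling them from another program, assuming the working directory is the skill's root folder. `build_command` and `run_script` are hypothetical helper names for illustration and are not part of the skill.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script: str, *args: str, scripts_dir: str = "scripts") -> list[str]:
    """Build the argv list for one of the skill's scripts.

    `script` is a file name such as "merge_pdfs.py". Using sys.executable
    keeps the call on the Python interpreter that is currently running.
    """
    return [sys.executable, str(Path(scripts_dir) / script), *args]


def run_script(script: str, *args: str) -> subprocess.CompletedProcess:
    """Run a script and capture its output (hypothetical helper)."""
    return subprocess.run(build_command(script, *args), capture_output=True, text=True)


# Mirrors: python scripts/merge_pdfs.py <output_path> <file1> <file2> ...
cmd = build_command("merge_pdfs.py", "merged.pdf", "a.pdf", "b.pdf")
print(cmd[1:])
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with file names that contain spaces.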

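Feature Details

extract_text.py

Extracts the text content of a PDF. Supports:

  • Plain-text extraction
  • Preserving paragraph structure
  • Extracting metadata (title, author, creation time)

python scripts/extract_text.py input.pdf -o output.txt --metadata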

extract_tables.py

Extracts table data from a PDF:

  • Automatically detects table borders
  • Supports merged cells
  • Outputs an Excel file

pdf_to_word.py

Converts a PDF to Word:

  • Preserves the original formatting
  • Carries images over into the Word document
  • Converts tables to Word tables

pdf_to_excel.py

Converts a PDF to Excel:

  • Extracts tables into separate sheets
  • Preserves text content

add_watermark.py

Watermarking:

  • Supports text watermarks
  • Configurable opacity, rotation angle, and font size
  • Supports batch watermarking

ocr_pdf.py

OCR (requires Tesseract to be installed):

  • Uses Tesseract for Chinese text recognition
  • Supports recognition of mixed languages
  • Preserves the original PDF formatting
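
Because OCR depends on a separately installed Tesseract engine, it can help to check for it before running ocr_pdf.py. The sketch below uses only the standard library and the real `tesseract --list-langs` command. The helper names are hypothetical, and the assumption that Chinese recognition needs the `chi_sim` language pack comes from Tesseract's usual naming, not from the skill's documentation.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the path of the `tesseract` executable on PATH, or None."""
    return shutil.which("tesseract")


def parse_langs(listing: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs` into language codes."""
    lines = [ln.strip() for ln in listing.splitlines() if ln.strip()]
    # The first line is a header such as 'List of available languages in ...'.
    return {ln for ln in lines if not ln.startswith("List of")}


def installed_langs() -> set[str]:
    """Return installed language codes, or an empty set if Tesseract is missing."""
    exe = find_tesseract()
    if exe is None:
        return set()
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some Tesseract versions print the listing to stderr, so read both streams.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    if not langs:
        print("Tesseract not found on PATH; install it before running ocr_pdf.py")
    elif "chi_sim" not in langs:  # assumption: Simplified Chinese pack name
        print("Tesseract found, but the chi_sim language pack is missing")
    else:
        print("Tesseract and chi_sim are available")
```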

encrypt_pdf.py / decrypt_pdf.py

Encryption and decryption:

  • AES-256 encryption
  • Supports user passwords and owner passwords
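
The security assessment on this page notes that encrypt_pdf.py prints the password to stdout. One way to keep it out of shared logs is to prompt for the password interactively and mask it in the captured output. This is a minimal sketch with hypothetical helper names (`redact`, `encrypt`), assuming the script is invoked exactly as documented, with the password as the third positional argument.

```python
import getpass
import subprocess
import sys


def redact(text: str, secret: str) -> str:
    """Replace every occurrence of `secret` in `text` with asterisks."""
    if not secret:
        return text
    return text.replace(secret, "***")


def encrypt(input_pdf: str, output_pdf: str) -> str:
    """Run encrypt_pdf.py and return its output with the password masked."""
    # getpass reads the password without echoing it to the terminal.
    password = getpass.getpass("PDF password: ")
    result = subprocess.run(
        [sys.executable, "scripts/encrypt_pdf.py", input_pdf, output_pdf, password],
        capture_output=True,
        text=True,
    )
    return redact(result.stdout + result.stderr, password)
```

The password is still passed as a command-line argument, so other users on the same machine may be able to see it in the process list while the script runs. The masking only protects the captured output.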

compress_pdf.py

Compression:

  • Removes unused objects
  • Compresses images
  • Five compression levels to choose from

batch_process.py

Batch processing:

  • Supports all single-file operations
  • Automatically processes every PDF in a directory
  • Generates a processing report
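
The pattern described above (walk a directory, apply one operation per PDF, record the outcome) can be sketched in a few lines. This is not the code of batch_process.py: `batch_apply` is a hypothetical function, and the report format is an assumption chosen for illustration. A plain file copy stands in for a real per-file operation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def batch_apply(input_dir: str, output_dir: str,
                operation: Callable[[Path, Path], object]) -> list[dict]:
    """Apply `operation(src, dst)` to every *.pdf in `input_dir`; return a report."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    report = []
    for src in sorted(Path(input_dir).glob("*.pdf")):
        dst = out / src.name
        try:
            operation(src, dst)
            report.append({"file": src.name, "status": "ok", "error": None})
        except Exception as exc:
            # Record the failure and continue so one bad file does not stop the batch.
            report.append({"file": src.name, "status": "failed", "error": str(exc)})
    return report


# Usage: copy every PDF in a temporary folder; non-PDF files are skipped.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "in"
    src_dir.mkdir()
    (src_dir / "a.pdf").write_bytes(b"%PDF-1.4 demo")
    (src_dir / "notes.txt").write_text("ignored")
    demo_report = batch_apply(str(src_dir), str(Path(tmp) / "out"), shutil.copyfile)
print(demo_report)
```

Catching exceptions per file is a design choice: a single corrupt PDF ends up as a "failed" row in the report rather than aborting the whole run.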

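Usage Examples

Extract text from a PDF

User: Help me extract the text of this contract
AI: Uses the extract_text.py script to extract the text

PDF to Word

User: Convert this PDF to a Word document
AI: Uses pdf_to_word.py to convert it

Batch watermarking

User: Add an "Internal Use Only" watermark to every PDF in this folder
AI: Uses batch_process.py to process them in bulk

Encrypt a PDF

User: This file needs to be encrypted
AI: Uses encrypt_pdf.py to apply AES-256 encryption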

Installing Dependencies

Base dependencies

pip install pymupdf pdfplumber python-docx openpyxl pillow

OCR support (optional)

# Install Tesseract OCR
# Windows: https://github.com/UB-Mannheim/tesseract/wiki
# macOS: brew install tesseract
# Linux: sudo apt install tesseract-ocr

pip install pytesseract
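
After installation, a quick import check can confirm which packages are actually available. The sketch below uses only the standard library. The mapping from pip package names to import names reflects how these libraries are commonly imported (for example, pymupdf as `fitz` and pillow as `PIL`); `missing_packages` is a hypothetical helper, not something shipped with the skill.

```python
import importlib.util

# pip package name -> top-level module name it is commonly imported as.
REQUIRED = {
    "pymupdf": "fitz",
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "pillow": "PIL",
}
OPTIONAL = {"pytesseract": "pytesseract"}


def missing_packages(packages: dict[str, str]) -> list[str]:
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing base dependencies: pip install " + " ".join(missing))
    if missing_packages(OPTIONAL):
        print("pytesseract is not installed; OCR will be unavailable")
```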

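Notes

  • Encrypting or decrypting a PDF requires a password
  • OCR requires the Tesseract engine to be installed
  • Large files may take a long time to process
  • Conversion quality depends on the quality of the original PDF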
