image-to-pdf

将多张图片自动旋转合并为单个PDF，支持根据Excel清单重命名及扫描PDF的OCR文字提取。

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 1 · 186 · 2 current installs · 2 all-time installs

by@meowlegemy-sudo

MIT-0

Security Scan

VirusTotal

Suspicious

View report →

OpenClaw

Suspicious

medium confidence

Purpose & Capability

The skill's stated purpose (merge images, OCR, rename from Excel, deliver results) aligns with the code snippets, but several capabilities that would be required in practice are not declared: e.g., sending outputs to Telegram is shown but no Telegram bot token/API credentials or network target are required by the registry metadata. The SKILL.md references concrete paths (~/.openclaw/media/inbound/...) and variables (inbound_dir, output_dir, 凭证号, 用户ID) that are not specified in the skill manifest.

Instruction Scope

SKILL.md instructs the agent to read local files (Excel, inbound directory), convert PDFs with pdftocairo, run tesseract OCR, rename/copy files, and automatically send results via a message(...) call. Those actions are within the skill's functional scope, but the instructions access user filesystem locations and perform network-delivery steps without describing required config or consent flow. The skill also assumes specific filename conventions and variable bindings that are ambiguous.

ℹ

Install Mechanism

No install spec is present in the registry (instruction-only), which is low-risk. The SKILL.md recommends installing Python packages and system binaries (Pillow, pypdf, openpyxl, poppler, tesseract), which is expected for the functionality. The absence of a formal install spec means these steps would be manual; it's not inherently dangerous but the skill should declare required binaries/config in metadata.

Credentials

Registry lists no required env vars or config paths, yet the instructions require access to local directories and a messaging/send API. Sending to Telegram (message(action="send", ... target="用户ID")) implies a bot token or service credential that is not requested. That mismatch is disproportionate and raises the risk of ad-hoc credential handling or hardcoding. Filename/Excel path access (~/.openclaw/media/inbound/...) also should be declared as required config.

ℹ

Persistence & Privilege

always:false (normal). The skill's instructions describe automatic sending of results and waiting/delay logic; combined with autonomous invocation (disable-model-invocation:false) this could send files without an explicit runtime confirmation unless the agent's governance enforces prompts. This is not an inherent privilege escalation, but it increases potential impact if credentials or delivery targets are misconfigured.

What to consider before installing

Before installing or enabling: (1) require the skill author to declare and document the exact configuration: inbound_dir, output_dir, how 凭证号 is determined, and the Excel path pattern; (2) insist the skill metadata list any required credentials (e.g., TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID or equivalent) instead of leaving messaging implied; (3) verify where message(...) is implemented and ensure tokens are never hardcoded or uploaded to public repos; (4) confirm the agent will ask user confirmation before automatically sending files; (5) review and sanitize filename handling to avoid path traversal and special-character issues; (6) if you will run the suggested install commands, install them yourself from official sources (pip, Homebrew) and run in a controlled environment; (7) if the author cannot clarify the missing env/config declarations, treat the skill as unsafe to use with sensitive documents.

Like a lobster shell, security has layers — review code before you run it.

Current versionv1.0.0

Download zip

latestvk974q28yfg3jwnzp53wz6kd88h82d6vq

License

MIT-0

Free to use, modify, and redistribute. No attribution required.

Termshttps://spdx.org/licenses/MIT-0.html

SKILL.md

Image to PDF Merger Skill

将多张图片合并成一个 PDF 文档，适合处理会计凭证、扫描文件等。支持自动旋转调整方向。

依赖

# Python
pip3 install Pillow pypdf openpyxl

# 系统工具
brew install poppler tesseract tesseract-lang

触发条件

用户说"合并 PDF"、"把图片合成 PDF"、"生成 PDF"
用户发送多张图片并要求合并
用户发送 PDF 并要求添加摘要
用户发送扫描版 PDF 并要求"提取文字"、"总结"

使用方法

1. 收集图片/PDF

让用户一次性发送所有需要处理的 PDF，不要分开多次发。

2. 执行合并（图片转 PDF）

适用于图片文件（.jpg）：

import os
from pypdf import PdfWriter
from PIL import Image

# 获取所有新上传的图片
images = sorted(glob.glob(f"{inbound_dir}*.jpg"), key=os.path.getmtime, reverse=True)[:20]

merger = PdfWriter()
for img_path in images:
    img = Image.open(img_path)
    # 自动旋转：横版图片旋转90度
    width, height = img.size
    if width > height:
        img = img.rotate(90, expand=True)
    pdf_path = img_path.replace('.jpg', '.pdf')
    img.save(pdf_path, 'PDF', resolution=100.0)
    merger.append(pdf_path)

output_path = f"{output_dir}{凭证号}.pdf"
merger.write(output_path)

3. PDF 重命名（带完整摘要 + 船名归属）

根据 Excel 清单自动重命名 PDF：

Excel 结构：

F列 = Index（凭证序号）
H列 = 摘要（Description）
M列 = 船名/归属

关键：使用字母索引法，避免列号混淆！

import openpyxl
import re
import shutil
import os

# Excel 路径
excel_path = "~/.openclaw/media/inbound/Vouching_list---*.xlsx"

# 加载 Excel（使用字母索引）
wb = openpyxl.load_workbook(excel_path)
ws = wb.active

# 需要处理的文件列表 (序号, 原始文件名)
files = [
    (1, "1._1月79号---xxx.pdf"),
    (2, "2_1月109号---xxx.pdf"),
    # ...
]

for idx, filename in files:
    # Index N -> Excel Row N+1（因为 Row 1 是表头）
    excel_row = idx + 1
    
    # 使用字母索引直接访问列
    desc = ws[f'H{excel_row}'].value    # H列 = 摘要
    ship = ws[f'M{excel_row}'].value    # M列 = 船名归属
    
    # 从文件名提取凭证号（如 "1月79号"）
    date_match = re.search(r'(\d+月\d+号)', filename)
    date_part = date_match.group(1) if date_match else ""
    
    # 命名格式：序号_凭证号-【船名】摘要.pdf
    # 例如：01_1月79号-【祥瀚8】V2501运费收入.pdf
    new_name = f"{idx:02d}_{date_part}-【{ship}】{desc}.pdf"
    
    # 复制并重命名
    src = os.path.join(inbound_dir, filename)
    dst = os.path.join(output_dir, new_name)
    shutil.copy(src, dst)

文件名格式规范：

序号_凭证号-【船名/公司】摘要.pdf

示例：
01_1月79号-【祥瀚8】V2501运费收入.pdf
02_1月109号-【River】坞修2411001.pdf
03_1月135号-【Alice】2503航次收入.pdf
04_1月169号-【Anna】DBS融资款付船款.pdf

4. 发送到 Telegram

message(
    action="send",
    filePath=output_path,
    message=f"{序号}. {凭证号} - 【{船名}】{摘要}",
    target="用户ID"
)

关键要点

一次性收集 - 让用户一次发完所有文件
使用字母索引 - 用 ws[f'H{row}'] 而非 row[7]
正确映射 - Index N → Excel Row N+1
完整命名 - 序号_凭证号-【船名】摘要
自动发送 - 处理完成后自动推送给用户

错误处理

如果 Excel 找不到对应 Index，跳过或用原文件名
如果 PDF 是图片合并的（无文字层），使用 OCR 功能提取文字
文件名中的特殊字符（如 /）需要清理

功能三：扫描版 PDF OCR 文字提取

当用户发送扫描版 PDF（无文字层）时，使用 tesseract 提取文字：

步骤 1：PDF 转图片

# 安装依赖
brew install poppler tesseract tesseract-lang

# PDF 转图片（300 DPI）
pdftocairo -png -r 300 input.pdf page

步骤 2：OCR 识别

# 识别中文+英文
tesseract page-01.png stdout -l chi_sim+eng > output.txt

# 批量识别
for p in page-*.png; do
  tesseract "$p" stdout -l chi_sim+eng >> combined.txt
done

步骤 3：提取关键信息

import re

# 读取 OCR 结果
with open('combined.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# 提取关键信息（根据文档类型调整）
patterns = {
    '贷款人': r'([A-Z]+ BANK LTD\.?.*?)\n',
    '借款人': r'([A-Z]+ SHIPPING LIMITED.*?)\n',
    '贷款金额': r'USD([\d,]+)',
    '利率': r'([\d.]+)%',
    '期限': r'(\d+)\s*years?',
}

results = {}
for key, pattern in patterns.items():
    match = re.search(pattern, text, re.IGNORECASE)
    if match:
        results[key] = match.group(1) if match.lastindex else match.group(0)

print(results)

完整流程脚本

#!/bin/bash
# pdf-ocr.sh - 从扫描版 PDF 提取文字

PDF_PATH="$1"
OUTPUT_DIR="/tmp/pdf_ocr_$(date +%s)"

mkdir -p "$OUTPUT_DIR"

# 1. PDF 转图片
echo "Converting PDF to images..."
pdftocairo -png -r 300 "$PDF_PATH" "$OUTPUT_DIR/page"

# 2. OCR 识别
echo "Running OCR..."
> "$OUTPUT_DIR/combined.txt"
for img in "$OUTPUT_DIR"/page-*.png; do
    echo "Processing: $img"
    tesseract "$img" stdout -l chi_sim+eng 2>/dev/null >> "$OUTPUT_DIR/combined.txt"
done

# 3. 提取关键信息
echo "Extracting key information..."
python3 << 'EOF'
import re

with open('$OUTPUT_DIR/combined.txt', 'r', encoding='utf-8', errors='ignore') as f:
    text = f.read()

patterns = {
    'lender': r'([A-Z]+ BANK LTD\.?.*?)\n',
    'borrower': r'([A-Z]+ SHIPPING LIMITED.*?)\n',
    'amount': r'USD([\d,]+)',
    'date': r'(\d{1,2}\s+\w+\s+\d{4})',
}

for key, pattern in patterns.items():
    match = re.search(pattern, text, re.IGNORECASE)
    if match:
        print(f"{key}: {match.group(1)}")
EOF

echo "Output saved to: $OUTPUT_DIR/combined.txt"

改进：处理延时

为避免图片传输不完整，添加等待机制：

import time

# 等待所有图片传输完成
time.sleep(5)  # 等待5秒

# 获取最新上传的图片（按修改时间排序）
# 这样可以确保所有已传输的图片都被处理

为什么需要延时

用户一次性发送多张图片
Telegram/网络传输需要时间
如果立刻处理，后传输的图片可能还没到

延时建议

5秒：适合少量图片（1-10张）
10秒：适合大量图片（10张以上）

替代方案

如果不想用固定延时，也可以：

等待用户确认"还有吗？"
用心跳时检查inbound目录是否有新文件

Files

2 total

Select a file

Select a file to preview.

Comments

Loading comments…