Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

PDF All-in-One

v1.0.2

All-in-one PDF processing tool. Merge, split, extract, convert PDFs. Supports text extraction, table recognition, PDF-to-image conversion, OCR. Triggers: PDF...

by wurang @sonicrang

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for sonicrang/pdf-all-in-one.

Install the skill "PDF All-in-One" (sonicrang/pdf-all-in-one) from ClawHub.
Skill page: https://clawhub.ai/sonicrang/pdf-all-in-one
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install pdf-all-in-one

ClawHub CLI


npx clawhub@latest install pdf-all-in-one

Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)

Purpose & Capability
The name/description (PDF processing, merge/split/convert/ocr/fill) matches the included Python scripts and docs: extract_form_structure, converters, form-fillers, bounding-box checks, etc. The reference docs list a broad set of libraries (pypdf, pdfplumber, pdf2image, pytesseract, pypdfium2, reportlab) which are appropriate for the stated purpose. However, the registry metadata declares no required binaries or system dependencies even though SKILL.md and scripts instruct the user to install system tools (poppler/pdftoppm/pdftotext, ImageMagick 'magick', qpdf/pdftk, tesseract) — that mismatch is an engineering/provenance inconsistency worth noting.
Instruction Scope
SKILL.md and the scripts operate on local PDF files and local output directories (workspace: <current_workspace>/pdf-all-in-one-workspace/). The instructions reference only local file operations, CLI tools, and Python libraries. The runtime instructions do not direct data to external network endpoints or request unrelated system files or credentials. They do instruct installing and using system utilities (poppler, ImageMagick) and editing/creating PDFs and images — expected for this functionality.
Install Mechanism
There is no install spec (instruction-only), which is low-risk from an automatic install perspective. The package does include multiple Python scripts bundled with the skill; running them requires installing third-party Python packages and system utilities manually. The instructions point to installing via pip and OS package managers (apt/yum/brew) and using ImageMagick; those are common but not declared in registry metadata.
Credentials
The skill declares no required environment variables, no credentials, and no config paths. The scripts do not read environment variables or network credentials. They operate solely on user-supplied PDF files and JSON form descriptions, so there is no disproportionate secret access requested.
Persistence & Privilege
The skill is not marked always:true and does not request persistent system-wide privileges. It is user-invocable and allows normal autonomous invocation (platform default). The scripts write output files into the declared workspace and do not modify other skills or global agent configuration.
What to consider before installing
This skill's code and docs are consistent with a PDF processing tool and do not attempt network exfiltration or require secrets, which is a good sign. Before using it:

1) Verify the skill's source/author (homepage is missing and LICENSE claims Anthropic while the registry owner differs). If provenance matters, treat this as untrusted until you confirm origin.
2) Install and run in an isolated environment (container or disposable VM), because the scripts will execute locally and may require sudo to install system tools (poppler, ImageMagick, tesseract, qpdf/pdftk).
3) Review the bundled scripts (they are included) yourself; pay attention to the monkeypatch in fill_fillable_fields.py (it patches a pypdf internal method, which is unusual but local).
4) Only run the scripts on non-sensitive PDFs or copies until you're comfortable with behavior and output.

If you need higher assurance, ask the publisher for a canonical homepage/repository or a signed release.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

📕 Clawdis
latest vk978v0hkpc6m63dkn2h2ja28kx833w6n
216 downloads
0 stars
1 version
Updated 2h ago
v1.0.2
MIT-0

PDF All-in-One Processing Guide

Overview

This guide covers common PDF processing operations: merging, splitting, rotating, text and table extraction, OCR, PDF creation, and conversion to images. For advanced features, see REFERENCE.md.

Workspace Directory: <current_workspace>/pdf-all-in-one-workspace/

Quick Start

from pypdf import PdfReader

# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text, separating pages with a newline
text = ""
for page in reader.pages:
    text += (page.extract_text() or "") + "\n"

PDF to Image Conversion

Convert PDF Pages to PNG/JPG

from pdf2image import convert_from_path
import os

# Configuration
pdf_path = "input.pdf"
output_dir = "pdf-all-in-one-workspace/output_images"
os.makedirs(output_dir, exist_ok=True)

# Convert PDF to images
images = convert_from_path(pdf_path, dpi=150)

# Save each page as PNG
for i, image in enumerate(images):
    output_path = f"{output_dir}/page_{i+1}.png"
    image.save(output_path, "PNG")
    print(f"Saved: {output_path}")

print(f"Total pages converted: {len(images)}")
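
The dpi argument controls the pixel size of each rendered page. PDF page dimensions are measured in points (72 points per inch), so the pixel size is roughly points / 72 × dpi. A minimal sketch of that arithmetic, assuming a US Letter page; `rendered_size` is a hypothetical helper for illustration and is not part of pdf2image, and the actual renderer may differ by a pixel due to rounding:

```python
def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple:
    """Approximate pixel dimensions of a page rendered at the given DPI.

    PDF user space is measured in points, where 72 points = 1 inch.
    """
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792, 150))  # (1275, 1650)
print(rendered_size(612, 792, 200))  # (1700, 2200)
```

Doubling the DPI quadruples the pixel count, so higher values increase memory use and conversion time accordingly.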

Convert with Specific Page Range

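import os
from pdf2image import convert_from_path

# Make sure the output directory exists before saving
os.makedirs("pdf-all-in-one-workspace", exist_ok=True)

# Convert only pages 1-5
images = convert_from_path("document.pdf", 
                          dpi=200, 
                          first_page=1, 
                          last_page=5)

# Save each page as a high-quality JPEG
for i, image in enumerate(images):
    image.save(f"pdf-all-in-one-workspace/page_{i+1}.jpg", "JPEG", quality=95)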

Prerequisites

# Install Python library
pip install pdf2image

# Install system dependency (poppler)
# Ubuntu/Debian:
sudo apt-get install poppler-utils

# CentOS/RHEL:
sudo yum install poppler-utils

# macOS:
brew install poppler
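
Because the system tools are installed separately from the Python packages, it can help to confirm they are on PATH before running a conversion. A minimal sketch using only the standard library; the tool list is an assumption based on the utilities named in this guide, and `missing_tools` is a hypothetical helper:

```python
import shutil


def missing_tools(tools: list) -> list:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed list; trim it to the tools you actually use.
REQUIRED = ["pdftoppm", "pdftotext", "pdfimages", "qpdf", "tesseract"]

missing = missing_tools(REQUIRED)
if missing:
    print("Missing system tools:", ", ".join(missing))
else:
    print("All required system tools found.")
```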

Python Libraries

pypdf - Basic Operations

Merge PDFs

from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Split PDF

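import os
from pypdf import PdfReader, PdfWriter

os.makedirs("pdf-all-in-one-workspace", exist_ok=True)

reader = PdfReader("input.pdf")
# Write each page to its own single-page PDF
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"pdf-all-in-one-workspace/page_{i+1}.pdf", "wb") as output:
        writer.write(output)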
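
Splitting into one file per page can produce many small files. If fixed-size chunks are preferred, the page ranges can be computed first and then fed to the same PdfWriter loop. The sketch below covers only the range arithmetic; `chunk_ranges` is a hypothetical helper and the chunk size is an assumption:

```python
def chunk_ranges(total_pages: int, chunk_size: int) -> list:
    """Split page indices 0..total_pages-1 into consecutive chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        range(start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


# A 7-page document in chunks of 3 pages (0-based page indices).
for pages in chunk_ranges(7, 3):
    print(list(pages))
# [0, 1, 2]
# [3, 4, 5]
# [6]
```

Each range can index reader.pages directly: create one PdfWriter per range and call writer.add_page(reader.pages[i]) for every i in it.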

Extract Metadata

from pypdf import PdfReader

reader = PdfReader("document.pdf")
meta = reader.metadata  # None when the PDF has no document information
if meta:
    print(f"Title: {meta.title}")
    print(f"Author: {meta.author}")
    print(f"Subject: {meta.subject}")
    print(f"Creator: {meta.creator}")

Rotate Pages

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # Rotate 90 degrees clockwise
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)
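
page.rotate expects a multiple of 90 degrees. When the angle comes from user input, it may be worth validating it and reducing it to a single turn before calling rotate. A small sketch; `normalize_rotation` is a hypothetical helper, not a pypdf function:

```python
def normalize_rotation(angle: int) -> int:
    """Map a clockwise rotation to 0, 90, 180, or 270 degrees."""
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return angle % 360


print(normalize_rotation(450))  # 90
print(normalize_rotation(-90))  # 270
print(normalize_rotation(360))  # 0
```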

pdfplumber - Text and Table Extraction

Extract Text with Layout

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)

Extract Tables

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Table {j+1} on page {i+1}:")
            for row in table:
                print(row)

Advanced Table Extraction

import pandas as pd
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combine all tables into one sheet (writing .xlsx requires the openpyxl package)
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    combined_df.to_excel("pdf-all-in-one-workspace/extracted_tables.xlsx", index=False)
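
Using table[0] as the column names assumes every header cell is filled in and distinct. Extracted tables often contain empty (None) or repeated header cells, which can make pd.concat fail or merge columns unexpectedly. A minimal sketch of a header clean-up step; `clean_header` and its naming scheme are hypothetical, not part of pdfplumber or pandas:

```python
def clean_header(row: list) -> list:
    """Fill empty header cells and add a suffix to repeated names."""
    seen = {}
    names = []
    for index, cell in enumerate(row):
        # Empty or whitespace-only cells get a positional placeholder name.
        name = (cell or "").strip() or f"column_{index + 1}"
        count = seen.get(name, 0)
        seen[name] = count + 1
        names.append(name if count == 0 else f"{name}_{count + 1}")
    return names


print(clean_header(["Item", None, "Price", "Price", "  "]))
# ['Item', 'column_2', 'Price', 'Price_2', 'column_5']
```

It would slot into the loop above as pd.DataFrame(table[1:], columns=clean_header(table[0])).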

reportlab - Create PDFs

Basic PDF Creation

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("pdf-all-in-one-workspace/hello.pdf", pagesize=letter)
width, height = letter

c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This is a PDF created with reportlab")
c.line(100, height - 140, 400, height - 140)
c.save()

Subscripts and Superscripts

from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet

styles = getSampleStyleSheet()
chemical = Paragraph("H<sub>2</sub>O", styles['Normal'])
squared = Paragraph("x<super>2</super> + y<super>2</super>", styles['Normal'])

Command-Line Tools

pdftotext (poppler-utils)

# Extract text
pdftotext input.pdf output.txt

# Extract text preserving layout
pdftotext -layout input.pdf output.txt

# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt

qpdf

# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf

# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1

# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
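
To call the qpdf merge command from Python, passing an argument list to subprocess avoids shell quoting problems with unusual file names. A sketch that only builds and prints the command; `qpdf_merge_args` is a hypothetical helper, and the file names are placeholders:

```python
import shlex


def qpdf_merge_args(inputs: list, output: str) -> list:
    """Build the argument list for merging PDFs with qpdf."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


args = qpdf_merge_args(["file1.pdf", "file2.pdf"], "merged.pdf")
print(shlex.join(args))
# qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# To actually run it (requires qpdf on PATH):
# import subprocess
# subprocess.run(args, check=True)
```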

pdftk

# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf

# Split
pdftk input.pdf burst

# Rotate
pdftk input.pdf rotate 1east output rotated.pdf

pdfimages - Extract Images from PDF

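# Extract embedded images; -j writes JPEG-encoded images as .jpg
# (other image types are saved as PPM/PBM unless another format flag is given)
pdfimages -j input.pdf pdf-all-in-one-workspace/output_prefix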

Common Tasks

Extract Text from Scanned PDFs (OCR)

import pytesseract
from pdf2image import convert_from_path

images = convert_from_path('scanned.pdf')

text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)
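
OCR is much slower than reading an existing text layer, so it may be worth running it only on pages where normal extraction returns little or nothing. A heuristic sketch; `needs_ocr` and the 20-character threshold are assumptions for illustration, not part of pytesseract or pypdf:

```python
from typing import Optional


def needs_ocr(extracted_text: Optional[str], min_chars: int = 20) -> bool:
    """Guess whether a page is scanned, based on how little text was extracted."""
    # Ignore whitespace so pages containing only blank lines count as empty.
    visible = "".join((extracted_text or "").split())
    return len(visible) < min_chars


print(needs_ocr(""))                                    # True
print(needs_ocr("  \n "))                               # True
print(needs_ocr("Invoice 2024-001 Total due: 150.00"))  # False
```

The input would typically be the result of page.extract_text() for the same page; pages for which the function returns True can then be passed through the pdf2image and pytesseract steps above.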

Add Watermark

from pypdf import PdfReader, PdfWriter

watermark = PdfReader("watermark.pdf").pages[0]
reader = PdfReader("document.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("pdf-all-in-one-workspace/watermarked.pdf", "wb") as output:
    writer.write(output)

Password Protection

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

writer.encrypt("userpassword", "ownerpassword")

with open("pdf-all-in-one-workspace/encrypted.pdf", "wb") as output:
    writer.write(output)

Quick Reference

| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| PDF to Image | pdf2image | convert_from_path(pdf, dpi=150) |
| Merge PDFs | pypdf | writer.add_page(page) |
| Split PDFs | pypdf | One page per file |
| Extract text | pdfplumber | page.extract_text() |
| Extract tables | pdfplumber | page.extract_tables() |
| Create PDFs | reportlab | Canvas or Platypus |
| OCR scanned PDFs | pytesseract | pdf2image + pytesseract |
| Extract images | pdfimages | pdfimages -j input.pdf output_prefix |
| Command line merge | qpdf | qpdf --empty --pages ... |

Workspace Directory Structure

<current_workspace>/
└── pdf-all-in-one-workspace/
    ├── input/          # Place input PDFs here
    ├── output_images/  # Converted images output
    ├── output_pdfs/    # Generated PDFs output
    └── temp/           # Temporary files

Note: Always use pdf-all-in-one-workspace/ as the working directory for PDF operations to keep files organized.
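
The directory layout above can be created up front so that later save calls do not fail on a missing folder. A minimal sketch using pathlib; `ensure_workspace` is a hypothetical helper, and the demo uses a temporary directory as a stand-in for <current_workspace>:

```python
import tempfile
from pathlib import Path

SUBDIRS = ("input", "output_images", "output_pdfs", "temp")


def ensure_workspace(base: Path) -> Path:
    """Create pdf-all-in-one-workspace/ and its subdirectories under base."""
    workspace = base / "pdf-all-in-one-workspace"
    for name in SUBDIRS:
        (workspace / name).mkdir(parents=True, exist_ok=True)
    return workspace


# Demo in a throwaway directory; pass Path.cwd() to use the real workspace.
with tempfile.TemporaryDirectory() as tmp:
    workspace = ensure_workspace(Path(tmp))
    print(sorted(p.name for p in workspace.iterdir()))
    # ['input', 'output_images', 'output_pdfs', 'temp']
```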

Next Steps

  • For advanced pypdfium2 usage, see REFERENCE.md
  • For JavaScript libraries (pdf-lib), see REFERENCE.md
  • For PDF form filling, see FORMS.md
  • For troubleshooting guides, see REFERENCE.md
