Pdf Intelligence Suite
v1.0.0
PDF Intelligence Suite - Text extraction, table recognition, OCR, PDF to Word/Excel conversion, and more
by Lv Lancer (@kaiyuelv)
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign
medium confidence
Purpose & Capability
Name, README, SKILL.md, requirements, and the shown source files (extractor, ocr, converter, manipulator, etc.) are coherent: the requested libraries (PyPDF2, pdfplumber, pytesseract, pdf2image, python-docx, openpyxl, reportlab, Pillow, camelot) match the described features (text extraction, table recognition, OCR, conversion, page manipulation, security). No unrelated cloud credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md gives concrete install and usage steps (pip install -r requirements.txt, install system Tesseract/poppler), and the runtime examples and APIs operate only on local PDF files. The instructions do not ask the agent to read unrelated host files, access external endpoints, or exfiltrate environment variables.
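Before running the OCR or conversion features, it can help to confirm that the system tools mentioned above are actually on PATH. The following is a minimal sketch, not part of the skill itself. The executable names are assumptions: `tesseract` is the usual name of the Tesseract CLI, and `pdftoppm` is one of the poppler utilities that pdf2image calls. The helper `missing_binaries` is hypothetical.

```python
import shutil

# Assumed executable names: "tesseract" (Tesseract OCR CLI) and
# "pdftoppm" (part of poppler-utils, used by pdf2image).
REQUIRED_BINARIES = ("tesseract", "pdftoppm")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Return the executable names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("Tesseract and poppler tools found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.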
Install Mechanism
There is no special install spec (the skill relies on pip requirements and system packages). This is low risk in terms of arbitrary downloads, but the dependency list includes system-level components (Tesseract, poppler) and heavy Python packages (camelot, opencv, pdf2image) that require native libraries; the README documents those needs. Minor inconsistency: setup.py defines a console entry_point 'pdf-suite=pdf_intelligence_suite.cli:main' but no cli.py was listed in the manifest, which may be a packaging oversight (not necessarily malicious).
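One way to confirm the entry-point inconsistency after installation is to check whether the module named in the console script can be located at all. This is a rough sketch under the assumption that the entry point string is exactly the one quoted above; `parse_entry_point` and `module_exists` are hypothetical helper names, not part of the package.

```python
import importlib.util


def parse_entry_point(spec):
    """Split 'name=package.module:function' into (name, module, function)."""
    name, _, target = spec.partition("=")
    module, _, func = target.partition(":")
    return name.strip(), module.strip(), func.strip()


def module_exists(module):
    """Return True if the module can be located.

    Note: for a dotted name, find_spec imports the parent package
    as a side effect in order to search inside it.
    """
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised when a parent package is not installed.
        return False


if __name__ == "__main__":
    script, module, func = parse_entry_point(
        "pdf-suite=pdf_intelligence_suite.cli:main"
    )
    status = "found" if module_exists(module) else "NOT found"
    print(f"{script}: module {module} {status}")
```

If the module is reported as not found, the `pdf-suite` command would be expected to fail at startup even though the rest of the library may import normally.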
Credentials
The skill does not declare required environment variables or credentials. The README notes optional TESSDATA_PREFIX for nonstandard Tesseract installs (reasonable). No environment variables named SECRET/TOKEN/KEY are requested and the code shown does not read unrelated env vars.
Persistence & Privilege
The skill does not request always:true and has default invocation privileges. It does not attempt to modify other skills or system-wide agent configuration in the reviewed files.
Assessment
This package looks like a straightforward local PDF processing library and its files (extractor, ocr, converter, manipulator) match the documented features. Before installing or running on sensitive documents:
1) Review the omitted/truncated files (security.py, utils.py, tables.py and any remaining code) for any network calls or unexpected file access; these files were not fully shown.
2) Run the package in an isolated environment (VM or container) because several dependencies require native system packages (Tesseract, poppler, Ghostscript) and heavy Python packages.
3) Note the small packaging inconsistency: setup.py references pdf_intelligence_suite.cli:main but cli.py isn't in the manifest, so expect that the console script may not work until fixed.
4) If you will process confidential PDFs, verify security.py (encryption/decryption) behavior and any logging/network functionality to ensure no external transmission.
If you want, I can scan the omitted files for network or subprocess usage if you provide them.
Like a lobster shell, security has layers: review code before you run it.
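The manual review of omitted files suggested in the assessment can be partly automated with a static import check. The sketch below parses Python source with the standard `ast` module and reports imports of modules commonly associated with network or subprocess use. The watch list `SUSPECT_MODULES` and the function `suspect_imports` are hypothetical and deliberately incomplete; the check does not catch calls such as `os.system` or dynamic imports, so it complements reading the code rather than replacing it.

```python
import ast

# Hypothetical watch list; extend it to suit your own review.
SUSPECT_MODULES = {
    "socket", "subprocess", "urllib", "http",
    "ftplib", "smtplib", "requests",
}


def suspect_imports(source):
    """Return the sorted top-level watch-list modules imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z"; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in SUSPECT_MODULES:
                found.add(top)
    return sorted(found)
```

A typical use would be to read each file named in the assessment (for example security.py) and pass its text to `suspect_imports`. Wrapper libraries such as pytesseract and pdf2image are known to launch external programs themselves, so a clean result for the skill's own files says nothing about what its dependencies do.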
