Install
openclaw skills install pdf-intelligence-suite
PDF Intelligence Suite - Text extraction, table recognition, OCR, PDF to Word/Excel conversion
PDF Intelligence Suite is a PDF document processing toolset that covers text extraction, table recognition, OCR text recognition, and format conversion. The package is laid out as follows:
pdf-intelligence-suite/
├── SKILL.md                    # This file
├── README.md                   # Usage documentation
├── requirements.txt            # Dependency declarations
├── setup.py                    # Installation configuration
├── src/
│   └── pdf_intelligence_suite/
│       ├── __init__.py
│       ├── extractor.py        # Text extraction module
│       ├── tables.py           # Table recognition module
│       ├── ocr.py              # OCR module
│       ├── converter.py        # Format conversion module
│       ├── manipulator.py      # Page manipulation module
│       ├── security.py         # Security handling module
│       └── utils.py            # Utility functions
├── examples/
│   └── basic_usage.py          # Usage examples
└── tests/
    └── test_pdf_suite.py       # Unit tests
from pdf_intelligence_suite import PDFExtractor, TableExtractor, OCRProcessor
# Text extraction
extractor = PDFExtractor()
text = extractor.extract_text("document.pdf")
# Table extraction
tables = TableExtractor.extract_tables("report.pdf", output_format="excel")
# OCR (Simplified Chinese + English)
ocr = OCRProcessor(lang='chi_sim+eng')
text = ocr.process("scanned.pdf")
pip install -r requirements.txt
# Install the Tesseract OCR engine with Simplified and Traditional Chinese language packs (Ubuntu/Debian)
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra
# macOS
brew install tesseract tesseract-lang
# Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki
PDF Intelligence Suite is a PDF document processing toolkit that provides text extraction, table recognition, OCR, format conversion, and more in one package.
from pdf_intelligence_suite import PDFExtractor, TableExtractor, OCRProcessor
# Text extraction
extractor = PDFExtractor()
text = extractor.extract_text("document.pdf")
# Table extraction
tables = TableExtractor.extract_tables("report.pdf", output_format="excel")
# OCR recognition
ocr = OCRProcessor(lang='eng')
text = ocr.process("scanned.pdf")
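The calls above handle one file at a time. As a minimal sketch, assuming `extract_text` accepts a file path and returns the text as a string (the snippet suggests this but does not confirm it), a small helper can apply any such callable to every PDF in a folder. `extract_folder` and its parameters are hypothetical names used for illustration; they are not part of the package.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_folder(folder: str, extract: Callable[[str], str]) -> Dict[str, str]:
    """Apply `extract` to every *.pdf file directly inside `folder`.

    `extract` is any callable that takes a file path and returns text,
    for example the bound method PDFExtractor().extract_text (assumed
    signature, not confirmed by the package documentation).
    Files that raise an exception are skipped and reported on stdout.
    """
    results: Dict[str, str] = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            results[pdf_path.name] = extract(str(pdf_path))
        except Exception as exc:  # keep going if one file is unreadable
            print(f"skipped {pdf_path.name}: {exc}")
    return results
```

Under the same assumption, a call would look like `texts = extract_folder("invoices", PDFExtractor().extract_text)`. Passing the extractor in as a callable keeps the helper independent of the package, so it can be tried with a stand-in function first.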
pip install -r requirements.txt
# Install Tesseract OCR engine (Ubuntu/Debian)
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
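The `lang` argument in the OCR example (`'eng'`, or a combination such as `'chi_sim+eng'`) presumably maps to Tesseract language data, so OCR can only succeed if the matching language packs are installed. The sketch below checks this with `tesseract --list-langs`, a standard Tesseract command-line option. The helper names are hypothetical and are not part of pdf-intelligence-suite.

```python
import re
import shutil
import subprocess
from typing import List, Set


def parse_langs(listing: str) -> Set[str]:
    """Parse the output of `tesseract --list-langs` into language codes.

    Only lines that look like a bare code (e.g. "eng", "chi_sim") are kept,
    which drops the "List of available languages ..." header line.
    """
    codes = set()
    for line in listing.splitlines():
        line = line.strip()
        if re.fullmatch(r"[A-Za-z0-9_/\-]+", line):
            codes.add(line)
    return codes


def installed_langs() -> Set[str]:
    """Return installed Tesseract language codes, or an empty set if the
    tesseract binary is not on PATH."""
    if shutil.which("tesseract") is None:
        return set()
    proc = subprocess.run(["tesseract", "--list-langs"],
                          capture_output=True, text=True, check=False)
    # Older Tesseract releases print the listing to stderr, so read both.
    return parse_langs(proc.stdout + "\n" + proc.stderr)


def missing_langs(lang_spec: str, installed: Set[str]) -> List[str]:
    """Return the codes in a '+'-joined spec that are not installed."""
    return [code for code in lang_spec.split("+") if code and code not in installed]
```

Calling `missing_langs("chi_sim+eng", installed_langs())` before running OCR gives a list of packs that still need to be installed; an empty list means every requested language was found.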
MIT License
ClawHub Skills Collection