Install
openclaw skills install ocr-locally[macOS only] Use this skill when the user requests OCR (Optical Character Recognition), image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks. Triggers: '识别图片', 'OCR', '提取图片文字', '提取PDF文字', '识别PDF', 'extract text from image', 'PDF OCR'.
openclaw skills install ocr-locally⚠️ Platform Requirement: This skill is macOS only. It requires macOS 10.15+ (Catalina or later) and uses macOS native frameworks:
This skill provides OCR (Optical Character Recognition) capabilities using macOS native Vision framework. It extracts text from images and PDFs without requiring any third-party libraries or internet connection.
⚠️ macOS Only - This skill cannot run on Linux, Windows, or other operating systems.
Required:
Why macOS Only?
Vision framework for OCR (macOS/iOS only)PDFKit framework for PDF processing (macOS/iOS only)AppKit/Core Graphics for image handling (macOS only)Trigger this skill when the user:
Use scripts/ocr_vision_pro.swift for comprehensive OCR with the following features:
-t): Output only extracted text (default)-j): Output complete raw info including text, position, and confidence as JSONBasic usage:
swift scripts/ocr_vision_pro.swift <image_path>
With options:
swift scripts/ocr_vision_pro.swift <image_path> -l zh-Hans,en -o output.txt -f
Use scripts/pdf_ocr.swift to extract text from PDF files with the following features:
1-5, 1,3,5)-t): Output only extracted text (default)-j): Output complete raw info as JSONBasic usage (all pages):
swift scripts/pdf_ocr.swift <pdf_path>
With page specification:
# Single page
swift scripts/pdf_ocr.swift document.pdf -p 1
# Multiple pages
swift scripts/pdf_ocr.swift document.pdf -p 1,3,5
# Page range
swift scripts/pdf_ocr.swift document.pdf -p 1-5
# JSON mode
swift scripts/pdf_ocr.swift document.pdf -p 1 -j
The script supports two output modes that cannot be used simultaneously:
-t)Outputs only the extracted text:
-o or -t): Saves text to file, optionally with separate confidence file-j)Outputs complete raw information as JSON:
JSON output structure:
{
"imagePath": "/path/to/image.png",
"totalBlocks": 25,
"averageConfidence": 0.85,
"blocks": [
{
"index": 1,
"text": "recognized text",
"confidence": 0.95,
"boundingBox": {
"x": 0.10,
"y": 0.20,
"width": 0.30,
"height": 0.05
}
}
]
}
Image Formats (ocr_vision_pro.swift):
PDF Format (pdf_ocr.swift):
-p option: 1, 1,3,5, or 1-5| Option | Description |
|---|---|
-h, --help | Show help information |
-t, --text | Text mode (default, output only extracted text) |
-j, --json | JSON mode (output complete raw info as JSON) |
-l, --language <lang> | Specify recognition language (comma-separated) |
-o, --output <file> | Output text to file, auto-generate confidence file (<file>_confidence.txt) |
-t, --text <file> | Output only complete text to specified file (text mode) |
-c, --confidence <file> | Output only confidence details to specified file (text mode) |
-f, --fast | Use fast mode (default: precise mode) |
Note: -t (text mode) and -j (JSON mode) are mutually exclusive. JSON mode outputs to stdout only.
Supported languages:
zh-Hans - Simplified Chinesezh-Hant - Traditional Chineseen - Englishja - Japaneseko - Koreanfr - Frenchde - Germanes - Spanishit - Italianpt - Portugueseru - RussianWhen the user requests OCR:
Based on user request or context:
zh-Hans,en for Chinese users, or en for English users-f) for quick preview-o option)Run the OCR script with appropriate parameters:
swift scripts/ocr_vision_pro.swift "<image_path>" -l zh-Hans,en
For saving to separate files (recommended):
swift scripts/ocr_vision_pro.swift "<image_path>" -o "<output>"
This automatically creates two files:
<output>.txt - Complete extracted text (pure text, no formatting)<output>_confidence.txt - Confidence details with statistics and per-block infoFor separate text and confidence files with custom names:
swift scripts/ocr_vision_pro.swift "<image_path>" -t "text.txt" -c "confidence.txt"
After OCR completes:
When processing a PDF file:
Identify PDF path and pages:
1, 1,3,5, or 1-5Determine recognition parameters:
zh-hans,zh-hant,en-f)-j)Execute PDF OCR:
# All pages
swift scripts/pdf_ocr.swift "<pdf_path>"
# Specific pages
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1,3,5
# Page range
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1-5
# JSON mode
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1 -j
The script supports two mutually exclusive output modes:
-t)Outputs only the extracted text.
-o or -t <file>):[Extracted text content]
-o option):Creates <output>.txt with pure text.
-j):=== 置信度详情 ===
总识别块数: 25
平均置信度: 0.85
--- 逐块详情 ---
[1] Text content
置信度: 0.95
位置: x=0.10, y=0.20, w=0.30, h=0.05
--- 低置信度警告 (< 0.8) ---
[3] "Some text" - 置信度: 0.50
-o option):<output>.txt - Complete extracted text<output>_confidence.txt - Confidence details-j)Outputs complete raw information as JSON to stdout (no file output in JSON mode).
JSON Structure:
{
"imagePath": "/path/to/image.png",
"totalBlocks": 25,
"averageConfidence": 0.85,
"blocks": [
{
"index": 1,
"text": "recognized text",
"confidence": 0.95,
"boundingBox": {
"x": 0.10,
"y": 0.20,
"width": 0.30,
"height": 0.05
}
}
]
}
Fields:
imagePath: Path to the processed imagetotalBlocks: Total number of recognized text blocksaverageConfidence: Average confidence score (0.0 - 1.0)blocks: Array of recognized text blocks
index: Block index (1-based)text: Recognized text contentconfidence: Confidence score (0.0 - 1.0)boundingBox: Normalized bounding box coordinates (0.0 - 1.0)for img in *.png; do
swift scripts/ocr_vision_pro.swift "$img" -o "${img%.png}.txt"
done
For detailed usage instructions and examples, load references/usage.md.
ocr_vision_pro.swift - Enhanced OCR script for images with full feature supportocr_vision.swift - Basic OCR script for simple use casespdf_ocr.swift - PDF OCR script for extracting text from PDF filesusage.md - Comprehensive usage guide with examples