Install
openclaw skills install tesseract-image-ocrExtract text from images using Tesseract.js (OCR). Supports multi-language recognition including Chinese and English, region recognition, character whitelist filtering, text orientation detection, and can run in a Node.js environment.
openclaw skills install tesseract-image-ocrExtract text content from images based on Tesseract.js (the WebAssembly port of the Tesseract OCR engine).
Use when users need "image to text," "OCR recognition," "extract text from images," "screenshot character recognition," "scan to text," or "image text orientation detection."
['eng', 'chi_sim'])--rectangle), character whitelists (--whitelist)--detect)--psm) and OCR engine modes (--oem)text (default), hocr, blocks (JSON), tsvnode -v && npm ls tesseract.js 2>/dev/null || echo "tesseract.js not installed"
If not installed:
cd /root/.openclaw/workspace/skills/tesseract-ocr && npm init -y > /dev/null 2>&1 && npm install tesseract.js
Basic command:
node scripts/ocr.js <image-path-or-url> [--options]
Parameter Descriptions:
| Parameter | Type | Default | Description |
|---|---|---|---|
<image> | Required | — | Local path or HTTPS URL |
--lang | string | eng | Language code(s), multiple joined with +, e.g., eng+chi_sim |
--psm | number | — | Page segmentation mode (see PSM table below) |
--oem | number | — | OCR engine mode (see OEM table below) |
--whitelist | string | — | Character whitelist, e.g., 0123456789 to recognize only digits |
--rectangle | string | — | Recognition region, format top,left,width,height |
--output | string | text | Output format: text / hocr / blocks / tsv |
--detect | flag | — | Detect text orientation and script (does not perform OCR) |
--dpi | number | — | Manually specify image DPI |
Common Examples:
# Basic mixed Chinese-English recognition
node scripts/ocr.js photo.jpg --lang chi_sim+eng
# Recognize digits only (license plates, CAPTCHAs, etc.)
node scripts/ocr.js captcha.png --whitelist 0123456789
# Column-based recognition (suitable for vertical Chinese text)
node scripts/ocr.js scroll.jpg --lang chi_sim --psm 4
# Specify a region for recognition
node scripts/ocr.js receipt.png --rectangle 50,100,400,200
# Detect image text orientation
node scripts/ocr.js rotated.jpg --detect
# Output structured data
node scripts/ocr.js doc.png --output blocks
# Manually specify DPI (avoids "Invalid resolution 0 dpi" warning)
node scripts/ocr.js scan.png --dpi 300
To reuse a Worker, the AI should write an inline script:
const { createWorker } = require('tesseract.js');
(async () => {
const worker = await createWorker('eng');
for (const img of ['a.png', 'b.png', 'c.png']) {
const { data: { text } } = await worker.recognize(img);
console.log(img, '→', text);
}
await worker.terminate();
})();
The --psm parameter controls how Tesseract analyzes page layout:
| PSM | Name | Description |
|---|---|---|
| 0 | OSD_ONLY | Orientation and script detection only |
| 1 | AUTO_OSD | Automatic page segmentation + orientation detection |
| 2 | AUTO_ONLY | Automatic page segmentation, no orientation detection |
| 3 | AUTO | Fully automatic page segmentation (default) |
| 4 | SINGLE_COLUMN | Single column of variable size text |
| 5 | SINGLE_BLOCK_VERT_TEXT | Single block of vertical text |
| 6 | SINGLE_BLOCK | Single block of text |
| 7 | SINGLE_LINE | Single line of text |
| 8 | SINGLE_WORD | Single word |
| 9 | CIRCLE_WORD | Single word in a circular arrangement |
| 10 | SINGLE_CHAR | Single character |
| 11 | SPARSE_TEXT | Sparse text (find as much as possible) |
| 12 | SPARSE_TEXT_OSD | Sparse text + orientation detection |
| 13 | RAW_LINE | Raw line (treated as a single line) |
| OEM | Description |
|---|---|
| 0 | Legacy engine |
| 1 | LSTM neural network engine (default) |
| 2 | Legacy + LSTM |
| 3 | Default (automatically selected based on current configuration) |
| Language | Code |
|---|---|
| English | eng |
| Simplified Chinese | chi_sim |
| Traditional Chinese | chi_tra |
| Japanese | jpn |
| Korean | kor |
| French | fra |
| German | deu |
| Spanish | spa |
| Russian | rus |
| Arabic | ara |
| Hindi | hin |
Full list: tesseract_lang_list.md
The following scenarios require the AI to write inline scripts directly rather than using scripts/ocr.js:
worker.reinitialize(langs, oem)worker.setParameters({ tessedit_pageseg_mode: ... })worker.detect(image) after createWorker('eng', 0, { legacyCore: true, legacyLang: true })createScheduler() + multiple WorkersFor complete API reference, see references/api.md.