image2text

v1.0.0

Extract text from images using Tesseract OCR. Supports local files, URLs, and base64 input, so AI models without vision capability can still read images.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt below, then paste it into OpenClaw to install caiming0331/image2text.

Prompt preview: Install & Setup
Install the skill "image2text" (caiming0331/image2text) from ClawHub.
Skill page: https://clawhub.ai/caiming0331/image2text
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install image2text

ClawHub CLI


npx clawhub@latest install image2text
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name, description, SKILL.md, and the included script all describe the same functionality: take a local path/URL/base64 input, download or decode it to a temp file, run local tesseract, and return extracted text. Required capabilities (tesseract binary) are consistent with the purpose; no unrelated env vars or credentials are requested.
Instruction Scope
Runtime instructions and the script stay within OCR scope: they accept local/URL/base64 inputs, download or decode to temp files, run tesseract, and output text. The script will download arbitrary URLs supplied by the user (urllib or curl) and invokes subprocesses (curl, tesseract). These behaviors are expected for a URL-capable OCR tool but mean the agent will fetch remote data you provide — avoid passing untrusted URLs or base64 content.
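
The download behavior described above (urllib first, curl as a fallback) amounts to a pattern like the following sketch. The function name, temp-file suffix, and curl flags here are illustrative assumptions, not the skill's actual code:

```python
import subprocess
import tempfile
import urllib.request

def fetch_to_temp(url: str) -> str:
    """Download a URL to a temporary file and return its path.

    Tries urllib first; falls back to the curl binary on failure,
    mirroring the behavior described in the security report.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".img", delete=False)
    tmp.close()
    try:
        urllib.request.urlretrieve(url, tmp.name)
    except Exception:
        # Fallback: shell out to curl (-f fail on HTTP errors,
        # -sS quiet but show errors, -L follow redirects).
        subprocess.run(["curl", "-fsSL", "-o", tmp.name, url], check=True)
    return tmp.name
```

As the report warns, code like this fetches whatever URL it is given; avoid passing untrusted URLs.
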
Install Mechanism
There is no install specification; the skill is instruction-only and ships a small Python script. The only external dependency is the system tesseract binary (SKILL.md suggests brew install on macOS). No downloaded archives or non-standard installers are used.
Credentials
The skill requires no environment variables, credentials, or config paths. It only uses system binaries (curl if urllib fails, and tesseract) and temporary files; requested permissions are proportional to its stated function.
Persistence & Privilege
The `always` flag is false, and the skill does not attempt to modify other skills, global agent config, or persist credentials. It writes temporary files during execution and deletes them in a finally block.
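
The cleanup behavior described here (temp files deleted in a finally block) can be sketched as a context manager; `temp_image` is a hypothetical name, not the skill's code:

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def temp_image(data: bytes):
    """Write decoded image bytes to a temp file; always delete it afterwards."""
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    try:
        tmp.write(data)
        tmp.close()
        yield tmp.name          # caller runs tesseract on this path
    finally:
        os.unlink(tmp.name)     # cleanup happens even if OCR raises
```
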
Assessment
This skill appears to do exactly what it says: local OCR via your system tesseract. Before installing/using it: (1) ensure tesseract and any language packs you need are installed locally; (2) do not pass untrusted URLs or pasted base64 from unknown sources (the script will download and process whatever URL you supply); (3) be aware the script calls subprocesses (curl as a fallback and tesseract) and writes temporary files which it deletes; and (4) no credentials are requested, and results are printed locally (no external transmission coded into the skill). If you need automatic fetching from arbitrary web locations in a sensitive environment, consider restricting allowed sources or reviewing network policies first.

Like a lobster shell, security has layers — review code before you run it.

latest: vk979kvp1wp2azy692ks6na0fsd85d8dy
82 downloads
0 stars
1 version
Updated 5d ago
v1.0.0
MIT-0

image2text

Extract text from images without needing a vision-capable AI model.

Usage

python3 scripts/ocr.py <image path|URL|base64> [--lang <languages>] [--psm <mode>] [--raw]

Parameters

  • --lang: Language codes, comma-separated, default chi_sim+eng
    • chi_sim Simplified Chinese | chi_tra Traditional | eng English | jpn Japanese | kor Korean | and 30+ more
    • Combine: chi_sim+eng
  • --psm: Page segmentation mode, default 6
    • 3 Fully automatic | 4 Single column | 6 Block-level | 11 Sparse text
  • --raw: Output plain text only, no markers
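
The parameters above map onto a tesseract command line roughly as follows; the exact invocation in scripts/ocr.py may differ, and `tesseract_cmd` is an illustrative name:

```python
def tesseract_cmd(image_path: str, lang: str = "chi_sim+eng", psm: int = 6) -> list[str]:
    """Build a tesseract invocation using the documented defaults.

    'stdout' as the output base tells tesseract to print the extracted
    text instead of writing an output file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]

# Example (requires tesseract on PATH):
# import subprocess
# text = subprocess.run(tesseract_cmd("receipt.png", lang="chi_sim"),
#                       capture_output=True, text=True, check=True).stdout
```
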

Auto-Detects Input Type

  1. Local path: /Users/xxx/Downloads/xxx.png
  2. Web URL: https://example.com/image.png — OSS temp links work too
  3. Base64: Pasted image data from clipboard — just paste directly
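
The auto-detection above can be approximated with a heuristic like this sketch; the skill's real detection rules may differ:

```python
import os
import re

def detect_input_type(value: str) -> str:
    """Classify an argument as 'url', 'path', or 'base64'."""
    if value.startswith(("http://", "https://")):
        return "url"
    if os.path.exists(os.path.expanduser(value)):
        return "path"
    # Strip an optional data-URI prefix, then treat long runs of
    # base64-alphabet characters as pasted image data.
    stripped = value.split("base64,", 1)[-1]
    if len(stripped) > 100 and re.fullmatch(r"[A-Za-z0-9+/=\s]+", stripped):
        return "base64"
    raise ValueError(f"cannot determine input type: {value[:40]}...")
```
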

Workflow

  1. Receive image input → auto-detect type (local path / URL / base64)
  2. URL → downloaded to a temp file (urllib, falling back to curl)
  3. Base64 → decode to temp file
  4. Run tesseract OCR
  5. Output plain text

Examples

OCR a Chinese receipt:

python3 scripts/ocr.py ~/Downloads/receipt.png --lang chi_sim

English + Chinese mixed:

python3 scripts/ocr.py https://example.com/doc.jpg --lang chi_sim+eng

Plain text only (no markers):

python3 scripts/ocr.py /path/to/image.png --raw

Requirements

  • tesseract must be installed: brew install tesseract
  • Language packs: Homebrew's tesseract ships English by default; extra languages such as chi_sim may require brew install tesseract-lang
  • On macOS (Apple Silicon Homebrew): binary at /opt/homebrew/bin/tesseract
  • Temp files auto-deleted after execution
  • For best accuracy on receipts/screenshots: try --psm 3
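
Before first use, a quick pre-flight check like this (hypothetical, not part of the skill) confirms the binary is reachable, including the Homebrew path mentioned above:

```python
import shutil

def find_tesseract():
    """Return the path of the tesseract binary, or None if not installed."""
    # PATH lookup first, then the Apple Silicon Homebrew location.
    for candidate in ("tesseract", "/opt/homebrew/bin/tesseract"):
        found = shutil.which(candidate)
        if found:
            return found
    return None

if __name__ == "__main__":
    path = find_tesseract()
    print(path or "tesseract not found; try: brew install tesseract")
```
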
