image2text

v1.0.0

Extract text from images using Tesseract OCR. Supports local files, URLs, and base64 input, so AI models without vision capability can still read images.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt below, then paste it into OpenClaw to install caiming0331/image2text.

Prompt preview: Install & Setup
Install the skill "image2text" (caiming0331/image2text) from ClawHub.
Skill page: https://clawhub.ai/caiming0331/image2text
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install image2text

ClawHub CLI


npx clawhub@latest install image2text
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name, description, SKILL.md, and the included script all describe the same functionality: take a local path/URL/base64 input, download or decode it to a temp file, run local tesseract, and return extracted text. Required capabilities (tesseract binary) are consistent with the purpose; no unrelated env vars or credentials are requested.
Instruction Scope
Runtime instructions and the script stay within OCR scope: they accept local/URL/base64 inputs, download or decode to temp files, run tesseract, and output text. The script will download arbitrary URLs supplied by the user (urllib or curl) and invokes subprocesses (curl, tesseract). These behaviors are expected for a URL-capable OCR tool but mean the agent will fetch remote data you provide — avoid passing untrusted URLs or base64 content.
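
The download behavior described above (urllib first, curl as a fallback) amounts to a pattern like the following sketch. The function name, temp-file suffix, and curl flags here are illustrative assumptions, not the skill's actual code:

```python
import subprocess
import tempfile
import urllib.request

def fetch_to_temp(url: str) -> str:
    """Download a URL to a temporary file and return its path.

    Tries urllib first; falls back to the curl binary on failure,
    mirroring the behavior described in the security report.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".img", delete=False)
    tmp.close()
    try:
        urllib.request.urlretrieve(url, tmp.name)
    except Exception:
        # Fallback: shell out to curl (-f fail on HTTP errors,
        # -sS quiet but show errors, -L follow redirects).
        subprocess.run(["curl", "-fsSL", "-o", tmp.name, url], check=True)
    return tmp.name
```

As the report warns, code like this fetches whatever URL it is given; avoid passing untrusted URLs.
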
Install Mechanism
There is no install specification; the skill is instruction-only and ships a small Python script. The only external dependency is the system tesseract binary (SKILL.md suggests brew install on macOS). No downloaded archives or non-standard installers are used.
Credentials
The skill requires no environment variables, credentials, or config paths. It only uses system binaries (curl if urllib fails, and tesseract) and temporary files; requested permissions are proportional to its stated function.
Persistence & Privilege
The `always` flag is false, and the skill does not attempt to modify other skills, global agent config, or persist credentials. It writes temporary files during execution and deletes them in a finally block.
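
The cleanup behavior described here (temp files deleted in a finally block) can be sketched as a context manager; `temp_image` is a hypothetical name, not the skill's code:

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def temp_image(data: bytes):
    """Write decoded image bytes to a temp file; always delete it afterwards."""
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    try:
        tmp.write(data)
        tmp.close()
        yield tmp.name          # caller runs tesseract on this path
    finally:
        os.unlink(tmp.name)     # cleanup happens even if OCR raises
```
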
Assessment
This skill appears to do exactly what it says: local OCR via your system tesseract. Before installing/using it: (1) ensure tesseract and any language packs you need are installed locally; (2) do not pass untrusted URLs or pasted base64 from unknown sources (the script will download and process whatever URL you supply); (3) be aware the script calls subprocesses (curl as a fallback and tesseract) and writes temporary files which it deletes; and (4) no credentials are requested, and results are printed locally (no external transmission coded into the skill). If you need automatic fetching from arbitrary web locations in a sensitive environment, consider restricting allowed sources or reviewing network policies first.

Like a lobster shell, security has layers — review code before you run it.

latest: vk979kvp1wp2azy692ks6na0fsd85d8dy
82 downloads
0 stars
1 version
Updated 5d ago
v1.0.0
MIT-0

image2text

Extract text from images without needing a vision-capable AI model.

Usage

python3 scripts/ocr.py <image path|URL|base64> [--lang <languages>] [--psm <mode>] [--raw]

Parameters

  • --lang: Language codes, comma-separated, default chi_sim+eng
    • chi_sim Simplified Chinese | chi_tra Traditional | eng English | jpn Japanese | kor Korean | and 30+ more
    • Combine: chi_sim+eng
  • --psm: Page segmentation mode, default 6
    • 3 Fully automatic | 4 Single column | 6 Block-level | 11 Sparse text
  • --raw: Output plain text only, no markers
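
The parameters above map onto a tesseract command line roughly as follows; the exact invocation in scripts/ocr.py may differ, and `tesseract_cmd` is an illustrative name:

```python
def tesseract_cmd(image_path: str, lang: str = "chi_sim+eng", psm: int = 6) -> list[str]:
    """Build a tesseract invocation using the documented defaults.

    'stdout' as the output base tells tesseract to print the extracted
    text instead of writing an output file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]

# Example (requires tesseract on PATH):
# import subprocess
# text = subprocess.run(tesseract_cmd("receipt.png", lang="chi_sim"),
#                       capture_output=True, text=True, check=True).stdout
```
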

Auto-Detects Input Type

  1. Local path: /Users/xxx/Downloads/xxx.png
  2. Web URL: https://example.com/image.png — OSS temp links work too
  3. Base64: Pasted image data from clipboard — just paste directly
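
The auto-detection above can be approximated with a heuristic like this sketch; the skill's real detection rules may differ:

```python
import os
import re

def detect_input_type(value: str) -> str:
    """Classify an argument as 'url', 'path', or 'base64'."""
    if value.startswith(("http://", "https://")):
        return "url"
    if os.path.exists(os.path.expanduser(value)):
        return "path"
    # Strip an optional data-URI prefix, then treat long runs of
    # base64-alphabet characters as pasted image data.
    stripped = value.split("base64,", 1)[-1]
    if len(stripped) > 100 and re.fullmatch(r"[A-Za-z0-9+/=\s]+", stripped):
        return "base64"
    raise ValueError(f"cannot determine input type: {value[:40]}...")
```
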

Workflow

  1. Receive image input → auto-detect type (local path / URL / base64)
  2. URL → downloaded to a temp file (urllib, falling back to curl)
  3. Base64 → decode to temp file
  4. Run tesseract OCR
  5. Output plain text

Examples

OCR a Chinese receipt:

python3 scripts/ocr.py ~/Downloads/receipt.png --lang chi_sim

English + Chinese mixed:

python3 scripts/ocr.py https://example.com/doc.jpg --lang chi_sim+eng

Plain text only (no markers):

python3 scripts/ocr.py /path/to/image.png --raw

Requirements

  • tesseract must be installed: brew install tesseract
  • Language packs: Homebrew's tesseract ships English by default; extra languages such as chi_sim may require brew install tesseract-lang
  • On macOS (Apple Silicon Homebrew): binary at /opt/homebrew/bin/tesseract
  • Temp files auto-deleted after execution
  • For best accuracy on receipts/screenshots: try --psm 3
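
Before first use, a quick pre-flight check like this (hypothetical, not part of the skill) confirms the binary is reachable, including the Homebrew path mentioned above:

```python
import shutil

def find_tesseract():
    """Return the path of the tesseract binary, or None if not installed."""
    # PATH lookup first, then the Apple Silicon Homebrew location.
    for candidate in ("tesseract", "/opt/homebrew/bin/tesseract"):
        found = shutil.which(candidate)
        if found:
            return found
    return None

if __name__ == "__main__":
    path = find_tesseract()
    print(path or "tesseract not found; try: brew install tesseract")
```
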
