Local GLM OCR with llama.cpp on AIPC(no API Key)

Image OCR, text recognition, extract text from image, scan document, read image text, invoice OCR, receipt OCR, contract recognition, table extraction, business card OCR, ID recognition, screenshot text extraction, document digitization. Runs locally on Windows using the GLM-OCR model, supports mixed Chinese/English text, prioritizes Intel iGPU inference, no cloud API calls.

Audits

Pass

ClawScanReview

Agentic behavior and permission review.

Static analysisPass

Pattern checks against bundled files.

VirusTotalPass

Multi-engine malware detections and file reputation.

Install

openclaw skills install local-image-ocr-aipc

Image OCR — Local AI PC (Windows · GLM-OCR · llama.cpp Vulkan)

Model: ggml-org/GLM-OCR-GGUF (Q8_0, HuggingFace / hf-mirror)
Inference: llama-server (llama.cpp Vulkan prebuilt, HTTP API)
SKILL_VERSION: 1.0.0

Directory Structure (auto-created or user-specified)

<OCR_DIR>\                        ← auto-selected drive or user-specified (e.g. C:\image-ocr or D:\image-ocr)
├── llama.cpp\                    ← llama-server.exe and related binaries
└── models\
    └── GLM-OCR-GGUF\
        ├── GLM-OCR-Q8_0.gguf        ← main model (~950 MB)
        └── mmproj-GLM-OCR-Q8_0.gguf ← vision projection layer (~484 MB, required)

Python helper scripts (located alongside this SKILL.md):

File	Purpose
`_ocr_cfg.py`	Shared helper: resolves OCR_DIR from env var or `.ocr_dir` config file
`preflight_workdir.py`	Locate / create OCR working directory (writes `.ocr_dir` config)
`preflight_llama.py`	Check llama.cpp version
`preflight_models.py`	Check model files
`install_llama.py`	Download and extract llama.cpp Vulkan binary
`check_disk.py`	Check available disk space
`download_models_hf.py`	Download models via HuggingFace Hub
`download_models_modelscope.py`	Download models via ModelScope
`verify_models.py`	Verify downloaded model file sizes
`ocr_run.py`	Run GLM-OCR inference on an image (via llama-server HTTP API)

⚠️ Before You Install — Security & Compliance Disclosure

This skill will perform the following actions on your system. Review carefully before granting autonomous execution.

Action Source Risk level
Download and extract llama-server.exe and related binaries github.com/ggml-org/llama.cpp releases Medium — executes downloaded binary
Download model files (~1.5 GB total) HuggingFace (huggingface.co) or ModelScope (modelscope.cn) Low — large file download
Automatically install Miniforge (if Python not found) github.com/conda-forge/miniforge Medium — silently installs Python to %USERPROFILE%\miniforge3
Create directories and write files to disk Local filesystem only Low

Credentials: The GLM-OCR model (ggml-org/GLM-OCR-GGUF) is publicly available and does not require authentication. If you use a private or gated HuggingFace model, you may need to set $env:HUGGINGFACE_TOKEN before running the download commands — this skill does not request or store any tokens automatically.

Trust: Verify you trust the following sources before proceeding:

https://github.com/ggml-org/llama.cpp/releases

https://huggingface.co/ggml-org/GLM-OCR-GGUF

https://github.com/conda-forge/miniforge (only if Miniforge auto-install is triggered)

If you prefer to control each step manually rather than granting autonomous execution, run the PowerShell commands in SKILL.md one by one yourself.

Dependencies: Model files are downloaded via Python's huggingface_hub (hf download) or modelscope. If Python is not installed, Step 2 will automatically install Miniforge (no admin rights required).

Action	Source	Risk level
Download and extract `llama-server.exe` and related binaries	`github.com/ggml-org/llama.cpp` releases	Medium — executes downloaded binary
Download model files (~1.5 GB total)	HuggingFace (`huggingface.co`) or ModelScope (`modelscope.cn`)	Low — large file download
Automatically install Miniforge (if Python not found)	`github.com/conda-forge/miniforge`	Medium — silently installs Python to `%USERPROFILE%\miniforge3`
Create directories and write files to disk	Local filesystem only	Low

⚠️ AI Assistant Instructions

Execute one command at a time; wait for output before proceeding.
Stop immediately on error; refer to the Troubleshooting table at the end.
Wrap all paths in double quotes.
<OCR_DIR> is the absolute working directory path, determined after Pre-flight.
Single goal: Recognize image content and return text results.

Execution flow (do not skip steps):

Pre-flight: Check working dir + llama.cpp + models            → STATUS values
Step 1:     Check Python (PYTHON_OK/MISSING)                  → PYTHON_OK
            Install / update llama.cpp (only if MISSING)      → LLAMA_OK
Step 2:     Download models (only if MISSING)                 → MODEL_OK
Step 3:     Run inference via Python → llama-server.exe HTTP API → Return result

Progress reporting: Announce each step before starting, e.g.: 🔍 Pre-flight: Checking environment…

⚠️ Using Bash.exe on Windows (Git Bash / WSL)

If the AI assistant's terminal is bash.exe rather than PowerShell or CMD, environment variables set via $env:OCR_DIR (PowerShell) or set OCR_DIR= (CMD) will not be visible to Python scripts launched from bash. This was the most common cause of "cannot find llama.cpp / models" errors.

How it's solved now

All scripts use _ocr_cfg.py which resolves OCR_DIR via two methods (in order):

Environment variable OCR_DIR — works if properly export-ed in bash
.ocr_dir config file — written automatically by preflight_workdir.py alongside the scripts

As long as preflight_workdir.py has been run once (from any shell), all subsequent scripts will find the OCR directory automatically, regardless of which shell is used. No manual environment variable setup is needed.

If using bash.exe, just ensure:

# Run preflight from the script directory — this creates .ocr_dir config file
cd /path/to/local-image-ocr-aipc
python preflight_workdir.py       # writes .ocr_dir, all other scripts will use it
python preflight_llama.py         # works without setting OCR_DIR manually
python ocr_run.py "C:\path\to\image.png"

Summary

Shell	Needs `export OCR_DIR=...`?	Works after `preflight_workdir.py`?
PowerShell / CMD	No (auto-resolved)	✅ Yes
Git Bash	No (reads `.ocr_dir` file)	✅ Yes
WSL bash	No (reads `.ocr_dir` file)	✅ Yes

Note: If running from WSL, subprocess.Popen does not support creationflags. The scripts handle this automatically (_CREATE_NO_WINDOW is 0 when sys.platform != "win32").

Pre-flight: Check Environment

🔍 Pre-flight: Checking working directory, llama.cpp, and model files…

Locate Working Directory

See preflight_workdir.py.

Run with: python preflight_workdir.py

Success criteria: Output contains a line with OCR_DIR=. This also writes a .ocr_dir config file so all subsequent scripts can find the directory automatically (no environment variable needed). Record the path and substitute <OCR_DIR> in subsequent steps.

Check llama.cpp

See preflight_llama.py.

Run with: python preflight_llama.py

Check Model Files

See preflight_models.py.

Run with: python preflight_models.py

Output	Action
Both `READY`	✅ Skip to Step 3
`LLAMA_STATUS=MISSING/OUTDATED`	⬇️ Execute Step 1
`MODEL_STATUS=MISSING`	⬇️ Execute Step 2

Announce: ✅ Environment check complete. Execute steps as needed.

Step 1: Check Python + Install / Update llama.cpp Vulkan

🐍 Step 1a: Checking Python availability…

Check Python

Note: This step requires an existing Python to run scripts. Use the snippet below to locate Python, record the path as PYTHON_EXE, and use that Python for all subsequent steps.

Option A — PowerShell (Windows default)

# Minimal PowerShell to locate Python; all subsequent steps run via Python
$customPythonExe = ""  # Optional: specify a path manually, e.g. "C:\Python311\python.exe"
$found = $null
if ($customPythonExe -and (Test-Path $customPythonExe)) { $found = $customPythonExe }
if (-not $found) {
    foreach ($cmd in @("python","python3","py")) {
        if (Get-Command $cmd -ErrorAction SilentlyContinue) { $found = (Get-Command $cmd).Source; break }
    }
}
if (-not $found) {
    foreach ($p in @(
        "$env:USERPROFILE\miniforge3\python.exe","$env:USERPROFILE\miniconda3\python.exe",
        "$env:USERPROFILE\anaconda3\python.exe")) {
        if (Test-Path $p) { $found = $p; break }
    }
}
if ($found) { $env:PYTHON_EXE = $found; Write-Host "PYTHON_STATUS=OK"; Write-Host "PYTHON_EXE=$found" }
else { Write-Host "PYTHON_STATUS=MISSING" }

Option B — Bash / Git Bash / WSL (fallback if PowerShell is unavailable)

# Locate Python via bash; set PYTHON_EXE for subsequent steps
CUSTOM_PYTHON_EXE=""  # Optional: e.g. "/c/Python311/python.exe"

found=""
if [ -n "$CUSTOM_PYTHON_EXE" ] && [ -x "$CUSTOM_PYTHON_EXE" ]; then
    found="$CUSTOM_PYTHON_EXE"
fi
if [ -z "$found" ]; then
    for cmd in python python3 py; do
        if command -v "$cmd" >/dev/null 2>&1; then
            found=$(command -v "$cmd")
            break
        fi
    done
fi
if [ -z "$found" ]; then
    for p in \
        "$USERPROFILE/miniforge3/python.exe" \
        "$USERPROFILE/miniconda3/python.exe" \
        "$USERPROFILE/anaconda3/python.exe" \
        "$HOME/miniforge3/bin/python" \
        "$HOME/miniconda3/bin/python" \
        "$HOME/anaconda3/bin/python"; do
        if [ -x "$p" ]; then found="$p"; break; fi
    done
fi
if [ -n "$found" ]; then
    export PYTHON_EXE="$found"
    echo "PYTHON_STATUS=OK"
    echo "PYTHON_EXE=$found"
else
    echo "PYTHON_STATUS=MISSING"
fi

If Python is not found (PYTHON_STATUS=MISSING), install Miniforge:

Consent required: Miniforge will be silently installed to %USERPROFILE%\miniforge3. This installs a Python runtime and conda/pip toolchain. No admin rights are needed. Source: github.com/conda-forge/miniforge. Confirm with the user before proceeding.

Option A — PowerShell

# Download and silently install Miniforge using PowerShell (cannot bootstrap with Python — PowerShell required here)
$mf = "$env:TEMP\Miniforge3-Windows-x86_64.exe"
Invoke-WebRequest -Uri "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe" -OutFile $mf
Start-Process $mf -ArgumentList "/S /D=$env:USERPROFILE\miniforge3" -Wait
Remove-Item $mf
$env:PYTHON_EXE = "$env:USERPROFILE\miniforge3\python.exe"
& $env:PYTHON_EXE --version
Write-Host "PYTHON_STATUS=OK"

Option B — Bash / Git Bash / WSL (fallback if PowerShell is unavailable)

# Download and silently install Miniforge using curl + bash
MF_URL="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
# On Windows/Git Bash use the Windows installer instead:
# MF_URL="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe"

MF_INSTALLER="/tmp/Miniforge3-install.sh"
curl -fsSL "$MF_URL" -o "$MF_INSTALLER"
bash "$MF_INSTALLER" -b -p "$HOME/miniforge3"
rm -f "$MF_INSTALLER"
export PYTHON_EXE="$HOME/miniforge3/bin/python"
"$PYTHON_EXE" --version
echo "PYTHON_STATUS=OK"

Output	Action
`PYTHON_STATUS=OK`	✅ Continue to Step 1b (install llama.cpp if needed)
`PYTHON_STATUS=MISSING` → Miniforge installed	✅ Continue to Step 1b
Miniforge download error	⛔ Check network, or install Python manually

Announce: ✅ Python ready. Checking llama.cpp…

⬇️ Step 1b: Installing / updating llama.cpp Vulkan… (only when LLAMA_STATUS=MISSING/OUTDATED)

Consent required: Before proceeding, inform the user:

A ZIP (~50–100 MB) will be downloaded from github.com/ggml-org/llama.cpp/releases

It will be extracted to <OCR_DIR>\llama.cpp\ and the original ZIP will be deleted

llama-server.exe will be placed on disk and launched as a local HTTP server by ocr_run.py

Ask the user to confirm before running the download command.

See install_llama.py. Edit the TAG variable at the top if a newer release is available.

Run with: python install_llama.py

Output	Action
`LLAMA_INSTALL=DONE`	✅ Continue to Step 2 to download models
Download error	⛔ Check network, or manually download from browser and extract to `<OCR_DIR>\llama.cpp\`

Announce: ✅ llama.cpp installed. Continue to Step 2 to download models.

Step 2: Download GLM-OCR Models

📦 Step 2: Downloading GLM-OCR models… (only when MODEL_STATUS=MISSING)

Note: Models are downloaded via Python's hf download (huggingface_hub) or modelscope. Python must already be available (PYTHON_STATUS=OK from Step 1). $env:PYTHON_EXE will be used.

First-time Download Notice (required reading when MODEL_STATUS=MISSING)

Announce the following to the user, then ask whether to proceed:

📥 First-time model download is approximately 1.5 GB
   (GLM-OCR-Q8_0.gguf ~950 MB + mmproj ~484 MB).
   Estimated download time:
   • 100 Mbps connection: ~2 minutes
   •  50 Mbps connection: ~4 minutes
   •  10 Mbps connection: ~20 minutes

   Downloads support resumption — if interrupted, re-running this step
   will automatically continue from where it left off.

   ✅ Ready — start automatic download
   📂 I prefer to download manually — skip automatic download

User chooses automatic download → continue with download commands below
User chooses manual download → jump to the "Manual Download Fallback" section at the end of this step

Check Disk Space

See check_disk.py.

Run with: python check_disk.py

Output	Action
`DISK_STATUS=OK`	✅ Continue to Download Models
`DISK_STATUS=LOW`	⚠️ Ask user to free space before continuing

Download Models

Option A: huggingface_hub Python API (recommended)

See download_models_hf.py. To use the HF mirror (China), uncomment the HF_ENDPOINT line inside the file.

Run with: python download_models_hf.py

Option B: ModelScope (alternative for users in China)

See download_models_modelscope.py.

Run with: python download_models_modelscope.py

Verify:

See verify_models.py.

Run with: python verify_models.py

Output	Action
`MODEL_DOWNLOAD=DONE`	✅ Continue to Step 3
Timeout / repeated failure	⚠️ Direct user to "Manual Download Fallback" section, or switch between Option A / B and retry

Announce: ✅ Model download complete.

Manual Download Fallback

If automatic download repeatedly fails, guide the user to download manually and place files in the correct directory:

⚠️ Automatic download failed. Please manually download the following two files:

1. GLM-OCR-Q8_0.gguf (~950 MB)
   HuggingFace: https://huggingface.co/ggml-org/GLM-OCR-GGUF/resolve/main/GLM-OCR-Q8_0.gguf
   HF Mirror:   https://hf-mirror.com/ggml-org/GLM-OCR-GGUF/resolve/main/GLM-OCR-Q8_0.gguf
   ModelScope:  https://modelscope.cn/models/ggml-org/GLM-OCR-GGUF/resolve/master/GLM-OCR-Q8_0.gguf

2. mmproj-GLM-OCR-Q8_0.gguf (~484 MB)
   HuggingFace: https://huggingface.co/ggml-org/GLM-OCR-GGUF/resolve/main/mmproj-GLM-OCR-Q8_0.gguf
   HF Mirror:   https://hf-mirror.com/ggml-org/GLM-OCR-GGUF/resolve/main/mmproj-GLM-OCR-Q8_0.gguf
   ModelScope:  https://modelscope.cn/models/ggml-org/GLM-OCR-GGUF/resolve/master/mmproj-GLM-OCR-Q8_0.gguf

Once downloaded, place both files into:
   <OCR_DIR>\models\GLM-OCR-GGUF\

Then re-run the Verify command to confirm the files are intact before continuing to Step 3.

Step 3: Process Recognition Result

🔍 Step 3: Processing GLM-OCR recognition result…

Determine Input Source

Situation	Action
User message contains a local file path (e.g. `C:\Users\...\xxx.png`)	⬇️ Case A: extract path from message, write & run Python script
User uploaded an image via the interface; OpenClaw provides a temp path	⬇️ Case B: retrieve temp path from context, write & run Python script
Neither	⛔ Ask user to provide a local file path or upload an image

Case A: User Provides a Local File Path

Step 1 — Set the image path: Open ocr_run.py and set IMG_PATH to the file path extracted from the user's message. Alternatively, pass the path as a command-line argument.

Step 2 — Run inference:

python ocr_run.py "<file path extracted from user message>"

Success criteria: stdout contains the recognized text content.

Case B: User Uploaded an Image via the Interface

OpenClaw saves uploaded images to a temporary path. Retrieve that path from context.

Step 1 — Set the image path: Open ocr_run.py and set IMG_PATH to the temporary path provided by OpenClaw. Alternatively, pass the path as a command-line argument.

Step 2 — Run inference:

python ocr_run.py "<temporary image path provided by OpenClaw>"

Success criteria: stdout contains the recognized text content.

Format Output

Once the recognized text is obtained, process it according to the user's intent:

Scenario	Handling
General text extraction	Output the recognized text as-is, preserving original layout
Invoice / receipt	Extract structured fields from the text; output as JSON + human-readable format
Table	Reformat the recognized text as a Markdown table
Business card	Extract name, title, company, phone, email, address; output as JSON
ID / certificate	Output structured by original layout
Screenshot / document	Organize output by paragraph
User-defined	Process according to the user's stated requirements

Completion announcement:

✅ Recognition complete!
Let me know if you'd like to re-process, change the output format, or export to a file.

Situation	Handling
`ERROR: File not found`	File path does not exist — ask user to verify the path
Empty / garbled output	Low image quality — ask user to retake or rescan
Blurry / low-resolution image	Ask user to retake or zoom in before retrying
No text detected	Inform user that no recognizable text was found in the image

Troubleshooting

Error	Cause	Solution
`PYTHON_STATUS=MISSING`	Python not installed	Step 1 auto-installs Miniforge; confirm consent and retry
`llama-server` not found	llama-server.exe path not set correctly	Verify `<OCR_DIR>\llama.cpp\llama-server.exe` exists
Server failed to start	Wrong binary version, Vulkan error, or port conflict	Re-run Step 1 to reinstall latest llama.cpp; check Vulkan drivers
`ggml_vulkan: no devices found`	Vulkan driver not installed	Update GPU driver
`error: unable to open model`	Incorrect model path	Re-run Pre-flight model check to verify path
`MODEL_DOWNLOAD=` no output	Download interrupted	Switch between Option A / B, or configure proxy
Garbled / blank output	Low image quality	Improve image quality
VRAM insufficient / crash	Not enough GPU memory	Lower `-ngl` value, or use `--device none`

References

llama.cpp Releases: https://github.com/ggml-org/llama.cpp/releases
GLM-OCR GGUF: https://huggingface.co/ggml-org/GLM-OCR-GGUF