GLM-OCR-Formula

v1.0.4

Official skill for recognizing and extracting mathematical formulas from images and PDFs into LaTeX format using ZhiPu GLM-OCR API. Supports complex equation...

by Jared Wen (@jaredforreal)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for jaredforreal/glmocr-formula.

Install the skill "GLM-OCR-Formula" (jaredforreal/glmocr-formula) from ClawHub.
Skill page: https://clawhub.ai/jaredforreal/glmocr-formula
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: ZHIPU_API_KEY, GLM_OCR_TIMEOUT
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install glmocr-formula

ClawHub CLI

npx clawhub@latest install glmocr-formula
Security Scan
Capability signals
Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The skill is explicitly an OCR→LaTeX wrapper around ZhiPu's GLM‑OCR API. The declared env vars (ZHIPU_API_KEY, GLM_OCR_TIMEOUT) and the primaryEnv match that purpose. No unrelated binaries, config paths, or extra credentials are requested.
Instruction Scope
SKILL.md instructs the agent to run the included Python CLI and to only use the official GLM‑OCR API. It requires reading user-supplied local files (encoded as base64) or URLs — appropriate for OCR. The doc's strict 'no fallback' and 'only use API' rules are unusual but coherent with the author's intent and the code. The skill does not instruct reading other system files or environment variables beyond those declared.
Install Mechanism
There is no install spec (instruction-only plus an included script). The bundled script uses the widely used 'requests' package and exits with an informative message if it's missing. No downloads from untrusted URLs or archive extraction are present.
Credentials
Only ZHIPU_API_KEY (primary, required) and GLM_OCR_TIMEOUT (optional) are declared, and both are justified. The script sends the key to the documented official endpoint (https://open.bigmodel.cn/api/paas/v4/layout_parsing). No unrelated secrets or multiple service credentials are requested.
Persistence & Privilege
The skill is not always-enabled and does not request elevated or persistent system privileges. It does not modify other skills or system-wide configs. Autonomous invocation is allowed, but that is the platform default and is not combined with other red flags.
Assessment
This skill appears to do what it says: it calls ZhiPu's GLM‑OCR service using the ZHIPU_API_KEY you provide. Before installing, consider: (1) only provide a Zhipu API key you trust to use with this skill (or a dedicated key for isolation); (2) the script will read any local file you give it and include its contents (base64) in the API request — do not pass sensitive local files (private keys, secrets, personal documents) you don't want sent to the OCR service; (3) the SKILL.md forbids local fallback parsing — if the API is down the skill will stop per instructions; (4) ensure the execution environment has Python and the 'requests' package available or install it yourself. If you want tighter control, create a Zhipu API key with limited scope/quota or only run the script on files you explicitly approve.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

📐 Clawdis
Env: ZHIPU_API_KEY, GLM_OCR_TIMEOUT
Primary env: ZHIPU_API_KEY
latest: vk975f4p02h6mycqab7hchkerzx84x8n5
417 downloads
1 star
5 versions
Updated 1w ago
v1.0.4
MIT-0

GLM-OCR Formula Recognition Skill

Recognize mathematical formulas from images and PDFs and convert them to LaTeX format using the ZhiPu GLM-OCR layout parsing API.

When to Use

  • Extract mathematical formulas from images or scanned documents
  • Convert formula images to LaTeX
  • Recognize complex equations, integrals, and matrices
  • Parse scientific papers, textbooks, and exam papers that contain formulas
  • User mentions "formula OCR", "extract formula", "公式识别", "公式OCR", "提取公式", "图片转LaTeX"

Key Features

  • Complex formula support: Handles integrals, summations, matrices, fractions, radicals
  • LaTeX output: Formulas are output in LaTeX format, ready for use in documents
  • Inline & block formulas: Recognizes both inline and display-style formulas
  • Mixed content: Can handle documents with both text and formulas
  • Local file & URL: Supports both local files and remote URLs

Resource Links

| Resource | Link |
| --- | --- |
| Get API Key | Zhipu Open Platform API Keys |
| API Docs | Layout Parsing |

Prerequisites

API Key Setup (Required)

This script reads the key from the ZHIPU_API_KEY environment variable. Reusing the same key across Zhipu skills is optional.

Get Key: Visit the Zhipu Open Platform API Keys page (https://bigmodel.cn/usercenter/proj-mgmt/apikeys) to create or copy your key.

Setup options (choose one):

  1. Global config (recommended): Set once in openclaw.json under env.vars; all Zhipu skills will share it:

    {
      "env": {
        "vars": {
          "ZHIPU_API_KEY": "your-api-key"
        }
      }
    }
    
  2. Skill-level config: Set for this skill only in openclaw.json:

    {
      "skills": {
        "entries": {
          "glmocr-formula": {
            "env": {
              "ZHIPU_API_KEY": "your-api-key"
            }
          }
        }
      }
    }
    
  3. Shell environment variable: Add to ~/.zshrc:

    export ZHIPU_API_KEY="your-api-key"
    

💡 If you have already configured a key for another Zhipu skill (such as glmocr, glmv-caption, or glm-image-generation), they all share the same ZHIPU_API_KEY, so no further setup is needed.

Security & Transparency

  • Environment variables used:
    • ZHIPU_API_KEY (required)
    • GLM_OCR_TIMEOUT (optional, timeout in seconds)
  • Fixed official endpoint: https://open.bigmodel.cn/api/paas/v4/layout_parsing
  • No custom API URL override: avoids accidental key exfiltration via redirected endpoints.
  • Raw upstream response is not returned by default: use --include-raw only when needed for debugging.
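
As a rough illustration of how the two environment variables above might be read, here is a minimal sketch. The function name `load_config` and the 60-second default are assumptions for illustration; the bundled script's actual default timeout is not documented on this page.

```python
import os

# Fixed endpoint as listed above; there is deliberately no override.
API_URL = "https://open.bigmodel.cn/api/paas/v4/layout_parsing"
DEFAULT_TIMEOUT = 60.0  # assumption: the real default is not stated here


def load_config(env=None):
    """Return (api_key, timeout_seconds) from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("ZHIPU_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ZHIPU_API_KEY not configured.")
    raw_timeout = env.get("GLM_OCR_TIMEOUT", "").strip()
    try:
        timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT
    except ValueError:
        timeout = DEFAULT_TIMEOUT  # unparsable value: fall back to default
    if timeout <= 0:
        timeout = DEFAULT_TIMEOUT
    return api_key, timeout
```

Passing a plain dict instead of `os.environ` keeps the function easy to check without touching the real environment.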

⛔ MANDATORY RESTRICTIONS ⛔

  1. ONLY use GLM-OCR API — Execute the script python scripts/glm_ocr_cli.py
  2. NEVER parse formulas yourself — Do NOT try to extract formulas using built-in vision or any other method
  3. NEVER offer alternatives — Do NOT suggest "I can try to read it" or similar
  4. IF API fails — Display the error message and STOP immediately
  5. NO fallback methods — Do NOT attempt formula extraction any other way

📋 Output Display Rules

After running the script, present the OCR result clearly and safely.

  • Show extracted text/formulas (text) in full
  • Summarization is allowed, but do not hide important extraction failures
  • If layout_details contains formula-related entries, you may highlight them
  • If the result file is saved, tell the user the file path
  • Show raw upstream response only when explicitly requested or debugging (--include-raw)

⚠️ LaTeX Rendering:

The OCR API returns formulas in LaTeX format (e.g., $\frac{1}{2}$, $\theta^{x+1}$). Since most chat platforms do not render LaTeX, you should ask the user once (on first use):

"The OCR result contains LaTeX formulas. Should I convert them to a readable Unicode format, or keep the raw LaTeX?"

Remember the user's choice for the rest of the session. Do NOT ask again on subsequent calls unless the user explicitly changes their preference.

  • User chooses readable format → convert LaTeX to Unicode/plain-text:
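
    | LaTeX | Unicode / plain text |
    | --- | --- |
    | $\frac{a}{b}$ | a/b |
    | $x^{n}$ | x^n |
    | $x_{i}$ | xᵢ |
    | $\sqrt{x}$ | √x |
    | $\theta$ | θ |
    | $\phi$ | φ |
    | $\therefore$ | ∴ |
    | $\Rightarrow$ | ⇒ |
    | $\left\{ \begin{array}{l} ... \end{array} \right.$ | ⎧ line1 ⎨ line2 ⎩ |
    | $\textcircled{1}$ | ① |
    | $\in$ | ∈ |
    | $\infty$ | ∞ |
    | $\ln$ | ln |
    | $\leq$ / $\geq$ | ≤ / ≥ |
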
  • User chooses raw LaTeX → display the original LaTeX output directly, and remind them the raw data is also saved in the output file if --output was used.
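
The readable-format conversion can be approximated with a few regular expressions. The sketch below is only an illustration covering the simple, non-nested cases from the mapping above; it is not part of the skill. The function name `latex_to_unicode` is hypothetical, and parenthesizing multi-term groups is a design choice made here to keep the plain-text output unambiguous. Array blocks and \textcircled are left untouched.

```python
import re

# Command name (without the backslash) -> Unicode replacement.
SYMBOLS = {
    "theta": "θ", "phi": "φ", "therefore": "∴", "Rightarrow": "⇒",
    "in": "∈", "infty": "∞", "ln": "ln", "leq": "≤", "geq": "≥",
}
# Only characters with a known Unicode subscript form are listed.
SUBSCRIPTS = dict(zip("0123456789i", "₀₁₂₃₄₅₆₇₈₉ᵢ"))


def _wrap(term):
    """Parenthesize a term unless it is one word or one command."""
    return term if re.fullmatch(r"\w+|\\[A-Za-z]+", term) else f"({term})"


def _subscript(match):
    body = match.group(1)
    if all(ch in SUBSCRIPTS for ch in body):
        return "".join(SUBSCRIPTS[ch] for ch in body)
    return "_" + _wrap(body)  # no Unicode form available, keep plain text


def latex_to_unicode(text):
    """Convert simple, non-nested LaTeX math to readable plain text."""
    text = text.replace("$", "")
    text = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}",
                  lambda m: f"{_wrap(m.group(1))}/{_wrap(m.group(2))}", text)
    text = re.sub(r"\\sqrt\{([^{}]*)\}",
                  lambda m: "√" + _wrap(m.group(1)), text)
    text = re.sub(r"\^\{([^{}]*)\}", lambda m: "^" + _wrap(m.group(1)), text)
    text = re.sub(r"_\{([^{}]*)\}", _subscript, text)
    # Match whole command names so that \in does not clobber \infty.
    return re.sub(r"\\([A-Za-z]+)",
                  lambda m: SYMBOLS.get(m.group(1), m.group(0)), text)
```

Commands that are not in `SYMBOLS` pass through unchanged, so unsupported LaTeX stays visible rather than being silently dropped.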

How to Use

Extract from URL

python scripts/glm_ocr_cli.py --file-url "https://example.com/formula.png"

Extract from Local File

python scripts/glm_ocr_cli.py --file /path/to/equation.png
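
The security notes on this page say that local files are sent base64-encoded. A minimal sketch of that encoding step follows, assuming plain standard base64; whether the bundled script adds a data-URI prefix or any other wrapping is not stated here, and both function names are hypothetical.

```python
import base64
from pathlib import Path


def encode_bytes(data):
    """Return the standard base64 encoding of raw bytes as ASCII text."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path):
    """Read a local image/PDF and return its contents base64-encoded.

    Raises FileNotFoundError if the path does not exist.
    """
    return encode_bytes(Path(path).read_bytes())
```

Because the whole file content ends up in the request, this is also the reason to avoid passing sensitive local files to the skill.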

Save Result to File

python scripts/glm_ocr_cli.py --file formula.png --output result.json --pretty

Include Raw Upstream Response (Debug Only)

python scripts/glm_ocr_cli.py --file formula.png --output result.json --include-raw

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty] [--include-raw]

| Parameter | Required | Description |
| --- | --- | --- |
| --file-url | One of | URL to image/PDF |
| --file | One of | Local file path to image/PDF |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
| --include-raw | No | Include raw upstream API response in result field (debug only) |
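
The "one of" constraint between --file-url and --file maps naturally onto an argparse mutually exclusive group. The following is a reconstruction of the documented interface for illustration, not the bundled script's actual source; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser matching the documented CLI flags."""
    parser = argparse.ArgumentParser(
        prog="glm_ocr_cli.py",
        description="Extract formulas as LaTeX via the GLM-OCR API.",
    )
    # Exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file-url", metavar="URL",
                        help="URL to image/PDF")
    source.add_argument("--file", metavar="PATH",
                        help="Local file path to image/PDF")
    parser.add_argument("--output", "-o", metavar="FILE",
                        help="Save result JSON to file")
    parser.add_argument("--pretty", action="store_true",
                        help="Pretty-print JSON output")
    parser.add_argument("--include-raw", action="store_true",
                        help="Include raw upstream API response (debug only)")
    return parser
```

With this layout, passing both sources or neither makes argparse exit with a usage error before any network call is attempted.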

Response Format

{
  "ok": true,
  "text": "Extracted formulas and text in Markdown/LaTeX...",
  "layout_details": [...],
  "result": null,
  "error": null,
  "source": "/path/to/file",
  "source_type": "file",
  "raw_result_included": false
}

Key fields:

  • ok — whether extraction succeeded
  • text — extracted text in Markdown with LaTeX formulas
  • layout_details — layout analysis details
  • error — error details on failure
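
A caller consuming this JSON might check `ok` first and only then read `text`. The sketch below assumes the field names shown in the response format above; the helper name `extract_text` and the sample payload are made up for illustration.

```python
import json


def extract_text(payload):
    """Return the extracted text, or raise with the reported error."""
    if isinstance(payload, (str, bytes)):
        result = json.loads(payload)
    else:
        result = payload
    if not result.get("ok"):
        # Surface the error as-is; no fallback extraction is attempted.
        raise RuntimeError(f"GLM-OCR failed: {result.get('error')}")
    return result.get("text") or ""


# Hypothetical payload shaped like the documented response.
sample = json.dumps({
    "ok": True,
    "text": "$E = mc^2$",
    "layout_details": [],
    "result": None,
    "error": None,
    "source": "formula.png",
    "source_type": "file",
    "raw_result_included": False,
})
print(extract_text(sample))
```

Raising on `ok: false` keeps the failure visible, in line with the rule that errors are shown and processing stops.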

Error Handling

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
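
The cases above amount to a small lookup from failure type to user guidance. A minimal sketch, assuming the caller already knows the HTTP status code or that the local file is missing; the function name and message wording are illustrative, not taken from the bundled script.

```python
def failure_hint(status_code=None, file_missing=False):
    """Map a failure to the guidance listed in the error-handling notes."""
    if file_missing:
        return "Local file missing: check the path."
    if status_code in (401, 403):
        return "API key invalid or expired: reconfigure ZHIPU_API_KEY."
    if status_code == 429:
        return "Quota exhausted: wait before retrying."
    # Anything else: show the exact error and stop, with no fallback.
    return "Unrecognized failure: show the exact error and stop."
```

Every branch ends with the agent reporting and stopping, which matches the mandatory restriction against fallback extraction.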
