GLM-OCR-SDK

v1.0.4

Trigger when: (1) User wants to extract text, tables, formulas, or structured data from images/PDFs/scanned documents, (2) User mentions "OCR", "文字识别" (text recognition), "文档解析...


by Jared Wen (@jaredforreal)

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for jaredforreal/glmocr-sdk.


Install the skill "GLM-OCR-SDK" (jaredforreal/glmocr-sdk) from ClawHub.
Skill page: https://clawhub.ai/jaredforreal/glmocr-sdk
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: ZHIPU_API_KEY
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install glmocr-sdk

ClawHub CLI


npx clawhub@latest install glmocr-sdk

Security Scan

Capability signals

Requires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal

Benign


OpenClaw

Suspicious

medium confidence

✓

Purpose & Capability

Name/description, required primaryEnv (ZHIPU_API_KEY), and the SDK/CLI usage in SKILL.md are consistent: this is an OCR SDK that calls a MaaS API. Requiring the ZHIPU_API_KEY is proportionate to the stated purpose.

Instruction Scope

SKILL.md instructs the agent to pip install glmocr, set/export ZHIPU_API_KEY, invoke CLI with --api-key or --env-file, and allows constructor/CLI overrides (api_url, model, timeout). Two issues increase risk: (1) the api_url constructor parameter lets calls be redirected to an arbitrary endpoint (potential exfiltration) and (2) the CLI supports loading an arbitrary .env file path or passing the API key on the command line (which can expose secrets in process lists or logs). These behaviors expand the surface beyond a simple 'call Zhipu' flow and are not limited by the metadata.

ℹ

Install Mechanism

The skill package is instruction-only (no install spec), but SKILL.md tells users/agents to run `pip install glmocr`. That means an external PyPI package will be installed at runtime; the package code is not included here and was not scanned. Installing uninspected packages is a moderate risk—acceptable for many cases but worth auditing the package/release before use.

ℹ

Credentials

Only ZHIPU_API_KEY is declared as required, which matches the service. SKILL.md also documents optional GLMOCR_* env vars (timeouts, logging) that weren't explicitly declared—this is minor. However the CLI/constructor options (api_key inline, --env-file, api_url override) allow the agent to read arbitrary env files or submit data to non-standard endpoints; these capabilities increase the potential for accidental or malicious credential/data exposure.

✓

Persistence & Privilege

always:false; instruction-only skill with no install-time persistence or system-wide config modification requested. The skill does not request elevated persistent privileges.

What to consider before installing

This skill appears to do what it says (OCR via Zhipu) and legitimately needs a ZHIPU_API_KEY. Before installing or using it, consider the following:

1) Do not pass API keys inline on the CLI (process lists/logs can leak them); prefer environment variables set in a controlled place.
2) Avoid pointing --env-file to system or user-wide files that contain unrelated secrets.
3) Verify the glmocr package source (PyPI package and GitHub repo or release tag) before pip installing; review its code or pinned release hash if you handle sensitive documents.
4) Confirm the default MaaS endpoint and avoid overriding api_url unless you trust the target: an api_url override could be used to send your documents + API key to an arbitrary server.
5) Review Zhipu's data retention/privacy policy before uploading confidential documents.

If you want higher assurance, provide the glmocr package code or the exact pip package version/sha so it can be audited; without that, exercise caution when processing sensitive data.


Runtime requirements

📄 Clawdis

Env: ZHIPU_API_KEY

Primary env: ZHIPU_API_KEY


License: MIT-0

OpenClaw Skill: glmocr

Parses documents (images, PDFs, scans) via the GLM-OCR SDK.

📌 On-demand: This skill requires only ZHIPU_API_KEY in the environment. No YAML config files or GPU needed.

⚡ Quick Start

# Install
pip install glmocr

# Set API key (once)
export ZHIPU_API_KEY=sk-xxx
# or add to .env file in working directory:
echo "ZHIPU_API_KEY=sk-xxx" >> .env

# One-liner (Python, run in a Python session rather than the shell)
import glmocr
result = glmocr.parse("document.pdf")
print(result.markdown_result)
print(result.to_dict())

# CLI — pass API key directly (no env setup needed)
glmocr parse image.png --api-key sk-xxx

# Or load from a specific .env file
glmocr parse image.png --env-file /path/to/.env

# Or rely on env var / auto-discovered .env (set once, then omit)
glmocr parse image.png
glmocr parse ./scans/ --output ./output/ --stdout

Configuration Priority

Constructor kwargs  >  os.environ  >  .env file  >  config.yaml  >  built-in defaults

Agents override everything via constructor kwargs or env vars — no YAML editing needed.
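
The precedence chain above can be pictured as a layered lookup in which the first layer that defines a setting wins. The following is a minimal sketch of that idea only; `resolve_setting` and the layer dictionaries are hypothetical names for illustration and are not taken from glmocr's own code.

```python
from collections import ChainMap


def resolve_setting(name, kwargs, environ, dotenv, yaml_cfg, defaults):
    """Return the first value found for `name`, searching layers in priority order.

    Hypothetical helper: it mirrors the documented order
    (kwargs > os.environ > .env > config.yaml > defaults), not the SDK internals.
    """
    # ChainMap looks up keys in the first mapping, then the second, and so on.
    layers = ChainMap(kwargs, environ, dotenv, yaml_cfg, defaults)
    return layers[name]


# Illustrative layers; the values are made up for the example.
defaults = {"timeout": 600, "model": "glm-ocr"}
yaml_cfg = {"timeout": 300}
dotenv = {"timeout": 900}
environ = {}
kwargs = {"timeout": 1200}

timeout_with_kwargs = resolve_setting("timeout", kwargs, environ, dotenv, yaml_cfg, defaults)
timeout_without_kwargs = resolve_setting("timeout", {}, environ, dotenv, yaml_cfg, defaults)
model = resolve_setting("model", kwargs, environ, dotenv, yaml_cfg, defaults)
```

With these layers, the constructor value wins when present, the `.env` value wins over `config.yaml` when it is not, and `model` falls through to the built-in default.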

Key Environment Variables

Variable	Description	Example
`ZHIPU_API_KEY`	API key (required for MaaS)	`sk-abc123`
`GLMOCR_MODEL`	Model name	`glm-ocr`
`GLMOCR_TIMEOUT`	Request timeout (seconds)	`600`
`GLMOCR_ENABLE_LAYOUT`	Layout detection on/off	`true`
`GLMOCR_LOG_LEVEL`	`DEBUG` / `INFO` / `WARNING` / `ERROR`	`INFO`

Python API

Convenience function (single call)

import glmocr

# Single file → PipelineResult
result = glmocr.parse("invoice.png")

# Multiple files → list[PipelineResult]
results = glmocr.parse(["page1.png", "page2.png", "report.pdf"])

Class-based (multiple calls / resource reuse)

from glmocr import GlmOcr

# Preferred: use as a context manager so resources are released automatically
with GlmOcr(api_key="sk-xxx") as parser:   # passing api_key auto-sets mode to "maas"
    result = parser.parse("document.png")
    print(result.markdown_result)

# Alternative: manage the lifetime manually and always call .close()
parser = GlmOcr(mode="maas")        # reads ZHIPU_API_KEY from env
try:
    result = parser.parse("document.png")
finally:
    parser.close()

Constructor Parameters

Parameter	Type	Description
`api_key`	`str`	API key. Providing this auto-enables MaaS mode.
`api_url`	`str`	Override MaaS endpoint URL
`model`	`str`	Model name override
`timeout`	`int`	Request timeout in seconds (default: 600)
`enable_layout`	`bool`	Enable layout detection
`log_level`	`str`	Logging level

Working with `PipelineResult`

Fields

result.markdown_result    # str — full document as Markdown
result.json_result        # list[list[dict]] — structured regions per page
result.original_images    # list[str] — absolute paths of input images

`json_result` structure

List of pages → list of regions per page:

[
  [
    {
      "index": 0,
      "label": "title",
      "content": "Annual Report 2024",
      "bbox_2d": [100, 50, 900, 120]
    },
    {
      "index": 1,
      "label": "table",
      "content": "| Q1 | Q2 |\n|---|---|\n| 120 | 145 |",
      "bbox_2d": [100, 140, 900, 400]
    }
  ]
]

Bounding boxes (bbox_2d): [x1, y1, x2, y2] normalised to 0–1000 scale.

Region labels: title, text, table, figure, formula, header, footer, page_number, reference, seal
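
Because `bbox_2d` is normalised to a 0–1000 scale, it has to be rescaled before cropping or drawing on the source image. The sketch below assumes that x values scale with the image width and y values with the image height; `bbox_to_pixels` is a hypothetical helper, not an SDK function.

```python
def bbox_to_pixels(bbox_2d, image_width, image_height, scale=1000):
    """Convert a normalised [x1, y1, x2, y2] box to integer pixel coordinates.

    Assumption: x is relative to the page width and y to the page height.
    """
    x1, y1, x2, y2 = bbox_2d
    return (
        round(x1 * image_width / scale),
        round(y1 * image_height / scale),
        round(x2 * image_width / scale),
        round(y2 * image_height / scale),
    )


# The title region from the example above, on a hypothetical 2000x1000 px image.
pixel_box = bbox_to_pixels([100, 50, 900, 120], image_width=2000, image_height=1000)
```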

Serialization

# Dict (JSON-serializable, for passing to other tools)
d = result.to_dict()
# Keys: json_result, markdown_result, original_images, usage (MaaS), data_info (MaaS)

# JSON string
json_str = result.to_json()                 # pretty-printed, ensure_ascii=False
json_str = result.to_json(indent=None)      # compact single line

# Save to disk: writes <stem>/<stem>.json + <stem>/<stem>.md + layout_vis/
result.save(output_dir="./output")
result.save(output_dir="./output", save_layout_visualization=False)

Error Handling

The SDK does not raise on MaaS errors — check to_dict() for an "error" key:

# `parser` is an open GlmOcr instance (see "Class-based" above)
result = parser.parse("image.png")
d = result.to_dict()
if "error" in d:
    # Handle failure
    print("OCR failed:", d["error"])
else:
    print(d["markdown_result"])
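
If exception-style control flow is preferred, the `"error"` check can be wrapped once and reused. This is a minimal sketch that works on the dictionary returned by `to_dict()`; `OcrError` and `markdown_or_raise` are hypothetical names and are not part of glmocr.

```python
class OcrError(RuntimeError):
    """Hypothetical exception type for failed OCR calls (not provided by glmocr)."""


def markdown_or_raise(result_dict):
    """Return the Markdown text, or raise OcrError if the dict carries an "error" key."""
    if "error" in result_dict:
        raise OcrError(str(result_dict["error"]))
    return result_dict["markdown_result"]


# Stand-in dictionaries shaped like result.to_dict() output.
ok_text = markdown_or_raise({"markdown_result": "# Title", "json_result": []})

try:
    markdown_or_raise({"error": "timeout"})
    failed = False
    message = ""
except OcrError as exc:
    failed = True
    message = str(exc)
```

In real use the argument would be `parser.parse(path).to_dict()`.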

CLI Reference

Agent-preferred interface: use the CLI for most operations. Set ZHIPU_API_KEY in env once, then invoke as needed.

Supported input formats: .jpg, .jpeg, .png, .bmp, .gif, .webp, .pdf

Basic usage

# Parse a single file → saves to ./output/<stem>/
# MaaS mode is the default; ZHIPU_API_KEY must be set (or use --api-key)
glmocr parse image.png

# Pass API key directly without any env setup
glmocr parse image.png --api-key sk-xxx

# Parse a directory → saves each file to ./output/<stem>/
glmocr parse ./scans/

# Use self-hosted vLLM/SGLang instead of cloud
glmocr parse image.png --mode selfhosted

# Specify output directory
glmocr parse image.png --output ./results/

Read results in the terminal (agent-friendly)

# Print Markdown + JSON to stdout (and still save to disk)
glmocr parse image.png --stdout

# Print to stdout ONLY — do not write any files
glmocr parse image.png --stdout --no-save

# JSON only (no Markdown output)
glmocr parse image.png --stdout --json-only

# Pipe JSON into jq for structured extraction
glmocr parse image.png --stdout --json-only --no-save | jq '.[0] | map(select(.label=="table"))'
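
An agent written in Python can do the same extraction without `jq`, passing the key through the child process environment instead of `--api-key`. The sketch below assumes that `--stdout --json-only --no-save` prints only the list-of-pages JSON on stdout (log lines mixed into stdout would break `json.loads`); the helper names are hypothetical.

```python
import json
import os
import subprocess


def build_command(path):
    """Build the CLI invocation; the API key is deliberately kept off the command line."""
    return ["glmocr", "parse", path, "--stdout", "--json-only", "--no-save"]


def tables_on_first_page(stdout_text):
    """Parse the CLI's JSON output and keep regions labelled "table" on page 1."""
    pages = json.loads(stdout_text)
    return [region for region in pages[0] if region.get("label") == "table"]


def run_ocr(path, api_key):
    """Run the CLI with ZHIPU_API_KEY set only in the child's environment."""
    env = dict(os.environ, ZHIPU_API_KEY=api_key)
    proc = subprocess.run(
        build_command(path), env=env, capture_output=True, text=True, check=True
    )
    return tables_on_first_page(proc.stdout)


# Offline demonstration with a stand-in for the CLI's stdout.
sample_stdout = (
    '[[{"index": 0, "label": "title", "content": "A"},'
    ' {"index": 1, "label": "table", "content": "| Q1 |"}]]'
)
sample_tables = tables_on_first_page(sample_stdout)
```

Keeping the key in `env` rather than in the argument list avoids exposing it in process listings.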

Save control

# Skip layout visualization images (faster, smaller output)
glmocr parse image.png --no-layout-vis

# Parse and save only JSON + Markdown, skip layout vis
glmocr parse image.png --no-layout-vis --output ./results/

Batch processing

# All images in a folder
glmocr parse ./invoice_scans/ --output ./parsed/ --no-layout-vis

# With progress visible in logs
glmocr parse ./docs/ --output ./parsed/ --log-level INFO

Debugging

glmocr parse image.png --log-level DEBUG

Full flag reference

Flag	Default	Description
`--api-key / -k`	env var	API key for MaaS mode (overrides `ZHIPU_API_KEY`)
`--mode`	`maas`	`maas` (cloud, default) or `selfhosted` (local GPU)
`--env-file`	auto	Path to `.env` file (default: auto-discover from cwd)
`--output / -o`	`./output`	Output directory
`--stdout`	off	Print JSON + Markdown to stdout
`--no-save`	off	Skip writing files (use with `--stdout`)
`--json-only`	off	stdout JSON only, no Markdown
`--no-layout-vis`	off	Skip layout visualization images
`--config / -c`	none	Path to YAML config override
`--log-level`	`INFO`	`DEBUG` / `INFO` / `WARNING` / `ERROR`

Typical Agent Workflow

receive document path / URL
       │
       ▼
glmocr.parse(path)            ← single call, handles PDF/image
       │
       ▼
result.to_dict()              ← safe to pass as tool output
       │
       ├── markdown_result    → hand to LLM for reading / summarization
       └── json_result        → structured extraction (tables, formulas, regions by label)

Filter by label

import glmocr

result = glmocr.parse("report.png")
regions = result.json_result[0]  # first page

tables = [r for r in regions if r["label"] == "table"]
formulas = [r for r in regions if r["label"] == "formula"]
body_text = [r for r in regions if r["label"] == "text"]
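
Once table regions are isolated, their `content` can be split into rows. The sketch below assumes table content is a pipe-delimited Markdown table, as in the `json_result` example earlier; `parse_markdown_table` is a hypothetical helper and does not handle escaped pipes inside cells.

```python
def parse_markdown_table(content):
    """Split a pipe-delimited Markdown table into a list of rows of cell strings."""
    rows = []
    for line in content.strip().splitlines():
        # Drop the outer pipes, then split on the inner ones.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header separator row, whose cells contain only dashes or colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


# The table content from the json_result example.
table_rows = parse_markdown_table("| Q1 | Q2 |\n|---|---|\n| 120 | 145 |")
```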

Multi-page PDF → iterate pages

from glmocr import GlmOcr

with GlmOcr(api_key="sk-xxx") as parser:
    result = parser.parse("document.pdf")   # all pages in one PipelineResult
    for page_idx, page_regions in enumerate(result.json_result):
        print(f"Page {page_idx + 1}: {len(page_regions)} regions")
        for region in page_regions:
            print(f"  [{region['label']}] {region['content'][:60]}")

Programmatic config (no env vars)

from glmocr.config import GlmOcrConfig

cfg = GlmOcrConfig.from_env(
    api_key="sk-xxx",
    mode="maas",
    timeout=600,
    log_level="DEBUG",
)

Output Directory Layout

After result.save(output_dir):

output_dir/
  <image_stem>/
    <image_stem>.json         ← structured regions
    <image_stem>.md           ← full Markdown (with cropped figure images)
    imgs/                     ← cropped figures referenced in Markdown
    layout_vis/               ← layout detection overlay images (if enabled)
      <image_stem>.jpg
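
To locate the saved files from code, the layout above can be turned into paths. This is a sketch under the assumption that `<image_stem>` is simply the input filename without its extension; `expected_outputs` is a hypothetical helper, and naming for multi-page PDFs may differ.

```python
from pathlib import Path


def expected_outputs(input_path, output_dir="./output"):
    """Return the paths the documented layout implies for one input file."""
    stem = Path(input_path).stem          # assumption: stem = filename without extension
    base = Path(output_dir) / stem
    return {
        "json": base / f"{stem}.json",
        "markdown": base / f"{stem}.md",
        "figures": base / "imgs",
        "layout_vis": base / "layout_vis",
    }


paths = expected_outputs("scans/invoice.png", output_dir="parsed")
```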

Common Pitfalls

ZHIPU_API_KEY not set: SDK defaults to MaaS mode. Without a key, parse() will fail with a clear error message and quick-fix instructions. Set via export ZHIPU_API_KEY=sk-xxx, add to a .env file, or pass --api-key sk-xxx to the CLI.
Large PDFs: Default timeout is 600s. For very long documents increase with timeout=1200.
result.json_result is a string: Happens when the model returns malformed JSON. The SDK preserves the raw string — parse or log it manually.
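
For the last pitfall, a small guard lets downstream code treat both cases uniformly. The sketch below first tries to parse a string value and otherwise hands back the raw text for logging; `normalise_json_result` is a hypothetical helper, not an SDK function.

```python
import json


def normalise_json_result(json_result):
    """Return (pages, raw): pages is a list of pages, or None with the raw string kept."""
    if isinstance(json_result, list):
        return json_result, None
    try:
        parsed = json.loads(json_result)
    except (TypeError, ValueError):
        # Malformed JSON (or a non-string value): keep the raw payload for logging.
        return None, json_result
    if isinstance(parsed, list):
        return parsed, None
    return None, json_result


pages_ok, raw_ok = normalise_json_result('[[{"label": "text"}]]')
pages_bad, raw_bad = normalise_json_result('{"broken":')
```

Callers can then iterate `pages` when it is not `None` and log `raw` otherwise.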
