Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

GLM-OCR-SDK

v1.0.4

Trigger when: (1) User wants to extract text, tables, formulas, or structured data from images/PDFs/scanned documents, (2) User mentions "OCR", "文字识别" (text recognition), "文档解析...

by Jared Wen (@jaredforreal)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for jaredforreal/glmocr-sdk.

Install the skill "GLM-OCR-SDK" (jaredforreal/glmocr-sdk) from ClawHub.
Skill page: https://clawhub.ai/jaredforreal/glmocr-sdk
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: ZHIPU_API_KEY
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install glmocr-sdk

ClawHub CLI


npx clawhub@latest install glmocr-sdk

Security Scan

Capability signals: Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)

Purpose & Capability
Name/description, required primaryEnv (ZHIPU_API_KEY), and the SDK/CLI usage in SKILL.md are consistent: this is an OCR SDK that calls a MaaS API. Requiring the ZHIPU_API_KEY is proportionate to the stated purpose.
Instruction Scope
SKILL.md instructs the agent to pip install glmocr, set/export ZHIPU_API_KEY, invoke CLI with --api-key or --env-file, and allows constructor/CLI overrides (api_url, model, timeout). Two issues increase risk: (1) the api_url constructor parameter lets calls be redirected to an arbitrary endpoint (potential exfiltration) and (2) the CLI supports loading an arbitrary .env file path or passing the API key on the command line (which can expose secrets in process lists or logs). These behaviors expand the surface beyond a simple 'call Zhipu' flow and are not limited by the metadata.
Install Mechanism
The skill package is instruction-only (no install spec), but SKILL.md tells users/agents to run `pip install glmocr`. That means an external PyPI package will be installed at runtime; the package code is not included here and was not scanned. Installing uninspected packages is a moderate risk—acceptable for many cases but worth auditing the package/release before use.
Credentials
Only ZHIPU_API_KEY is declared as required, which matches the service. SKILL.md also documents optional GLMOCR_* env vars (timeouts, logging) that weren't explicitly declared—this is minor. However the CLI/constructor options (api_key inline, --env-file, api_url override) allow the agent to read arbitrary env files or submit data to non-standard endpoints; these capabilities increase the potential for accidental or malicious credential/data exposure.
Persistence & Privilege
always:false; instruction-only skill with no install-time persistence or system-wide config modification requested. The skill does not request elevated persistent privileges.
What to consider before installing
This skill appears to do what it says (OCR via Zhipu) and legitimately needs a ZHIPU_API_KEY. Before installing or using it, consider the following:

1) Do not pass API keys inline on the CLI (process lists/logs can leak them); prefer environment variables set in a controlled place.
2) Avoid pointing --env-file to system or user-wide files that contain unrelated secrets.
3) Verify the glmocr package source (PyPI package and GitHub repo or release tag) before pip installing; review its code or pinned release hash if you handle sensitive documents.
4) Confirm the default MaaS endpoint and avoid overriding api_url unless you trust the target; an api_url override could be used to send your documents + API key to an arbitrary server.
5) Review Zhipu's data retention/privacy policy before uploading confidential documents.

If you want higher assurance, provide the glmocr package code or the exact pip package version/sha so it can be audited; without that, exercise caution when processing sensitive data.
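The advice above about keeping the key off the command line can be sketched in Python. This is a minimal illustration, not part of the skill: `build_parse_command`, `child_env`, and `run_parse` are hypothetical helper names, and the only CLI flag used (`--output`) is the one listed in the skill's own flag reference. The key is handed to the child process through its environment, so it does not appear in the argument list that process listings show.

```python
import os
import subprocess


def build_parse_command(path, output_dir="./output"):
    """Build a `glmocr parse` argv that never carries the key on the command line."""
    # Deliberately no --api-key here; the key is supplied via the environment.
    return ["glmocr", "parse", path, "--output", output_dir]


def child_env(api_key, base_env=None):
    """Copy the environment and add the key so only the child process sees it."""
    env = dict(os.environ if base_env is None else base_env)
    env["ZHIPU_API_KEY"] = api_key
    return env


def run_parse(path, api_key, output_dir="./output"):
    """Run the CLI (assumes `glmocr` is installed and on PATH)."""
    return subprocess.run(
        build_parse_command(path, output_dir),
        env=child_env(api_key),
        capture_output=True,
        text=True,
        check=False,
    )
```

The sketch also never sets `api_url`, so requests go to whatever default endpoint the installed package uses; confirming that default remains a manual step.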


Runtime requirements

📄 Clawdis
Env: ZHIPU_API_KEY
Primary env: ZHIPU_API_KEY
latest: vk97cnqd5vyavawvb13ynvysjx184wm50
416 downloads
1 star
5 versions
Updated 22h ago
v1.0.4
MIT-0

OpenClaw Skill: glmocr

Parses documents (images, PDFs, scans) via the GLM-OCR SDK.

📌 On-demand: This skill requires only ZHIPU_API_KEY in the environment. No YAML config files or GPU needed.

⚡ Quick Start

# Install
pip install glmocr

# Set API key (once)
export ZHIPU_API_KEY=sk-xxx
# or add to .env file in working directory:
echo "ZHIPU_API_KEY=sk-xxx" >> .env

# One-liner (Python)
import glmocr
result = glmocr.parse("document.pdf")
print(result.markdown_result)
print(result.to_dict())

# CLI: pass API key directly (no env setup needed)
glmocr parse image.png --api-key sk-xxx

# Or load from a specific .env file
glmocr parse image.png --env-file /path/to/.env

# Or rely on env var / auto-discovered .env (set once, then omit)
glmocr parse image.png
glmocr parse ./scans/ --output ./output/ --stdout

Configuration Priority

Constructor kwargs  >  os.environ  >  .env file  >  config.yaml  >  built-in defaults

Agents override everything via constructor kwargs or env vars — no YAML editing needed.
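The precedence chain above can be modelled with `collections.ChainMap`, where the first mapping that contains a key wins. This is only a sketch of the ordering, not the SDK's actual implementation: `resolve_settings` is a hypothetical function, the keys are assumed to be already normalised (the real environment variables are named like `GLMOCR_TIMEOUT`), and the values in `BUILTIN_DEFAULTS` are illustrative ones taken from this page.

```python
from collections import ChainMap

# Illustrative values only; the real built-in defaults live inside the glmocr package.
BUILTIN_DEFAULTS = {"model": "glm-ocr", "timeout": 600}


def resolve_settings(kwargs=None, environ=None, dotenv=None, yaml_cfg=None):
    """Return effective settings; earlier layers shadow later ones."""
    layers = [
        kwargs or {},      # constructor kwargs (highest priority)
        environ or {},     # os.environ
        dotenv or {},      # .env file
        yaml_cfg or {},    # config.yaml
        BUILTIN_DEFAULTS,  # built-in defaults (lowest priority)
    ]
    return dict(ChainMap(*layers))
```

For example, a `timeout` passed as a constructor kwarg shadows one set in the environment, while a key present only in the YAML layer still comes through.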

Key Environment Variables

| Variable | Description | Example |
|---|---|---|
| ZHIPU_API_KEY | API key (required for MaaS) | sk-abc123 |
| GLMOCR_MODEL | Model name | glm-ocr |
| GLMOCR_TIMEOUT | Request timeout (seconds) | 600 |
| GLMOCR_ENABLE_LAYOUT | Layout detection on/off | true |
| GLMOCR_LOG_LEVEL | DEBUG / INFO / WARNING / ERROR | INFO |

Python API

Convenience function (single call)

import glmocr

# Single file → PipelineResult
result = glmocr.parse("invoice.png")

# Multiple files → list[PipelineResult]
results = glmocr.parse(["page1.png", "page2.png", "report.pdf"])

Class-based (multiple calls / resource reuse)

from glmocr import GlmOcr

# Preferred: use as a context manager so resources are released automatically
with GlmOcr(api_key="sk-xxx") as parser:   # providing api_key auto-sets mode to "maas"
    result = parser.parse("document.png")
    print(result.markdown_result)

# Without `with`: call .close() yourself when done
parser = GlmOcr(mode="maas")        # reads ZHIPU_API_KEY from env
try:
    result = parser.parse("document.png")
finally:
    parser.close()

Constructor Parameters

| Parameter | Type | Description |
|---|---|---|
| api_key | str | API key. Providing this auto-enables MaaS mode. |
| api_url | str | Override MaaS endpoint URL |
| model | str | Model name override |
| timeout | int | Request timeout in seconds (default: 600) |
| enable_layout | bool | Enable layout detection |
| log_level | str | Logging level |

Working with PipelineResult

Fields

result.markdown_result    # str — full document as Markdown
result.json_result        # list[list[dict]] — structured regions per page
result.original_images    # list[str] — absolute paths of input images

json_result structure

List of pages → list of regions per page:

[
  [
    {
      "index": 0,
      "label": "title",
      "content": "Annual Report 2024",
      "bbox_2d": [100, 50, 900, 120]
    },
    {
      "index": 1,
      "label": "table",
      "content": "| Q1 | Q2 |\n|---|---|\n| 120 | 145 |",
      "bbox_2d": [100, 140, 900, 400]
    }
  ]
]

Bounding boxes (bbox_2d): [x1, y1, x2, y2] normalised to 0–1000 scale.
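To draw or crop a region, the normalised box has to be mapped back to pixels. A minimal sketch, assuming the 0–1000 scale applies independently to width and height (the page does not spell this out); `bbox_to_pixels` is a hypothetical helper, not an SDK function.

```python
def bbox_to_pixels(bbox_2d, image_width, image_height, scale=1000):
    """Convert [x1, y1, x2, y2] on a 0..scale grid to integer pixel coordinates."""
    x1, y1, x2, y2 = bbox_2d
    return (
        round(x1 * image_width / scale),
        round(y1 * image_height / scale),
        round(x2 * image_width / scale),
        round(y2 * image_height / scale),
    )


# The "title" box from the example above on a 2000x1000 pixel image.
print(bbox_to_pixels([100, 50, 900, 120], 2000, 1000))  # (200, 50, 1800, 120)
```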

Region labels: title, text, table, figure, formula, header, footer, page_number, reference, seal
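In the example above, a table region carries its content as a pipe-style Markdown table. Assuming that holds in general (the page only shows one sample), the rows can be recovered with plain string handling. `parse_markdown_table` is a hypothetical helper and does not handle escaped pipes inside cells.

```python
def parse_markdown_table(content):
    """Split a pipe-style Markdown table into a list of rows of cell strings."""
    rows = []
    for line in content.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header separator row, whose cells contain only '-' and ':'.
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    return rows


rows = parse_markdown_table("| Q1 | Q2 |\n|---|---|\n| 120 | 145 |")
print(rows)  # [['Q1', 'Q2'], ['120', '145']]
```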

Serialization

# Dict (JSON-serializable, for passing to other tools)
d = result.to_dict()
# Keys: json_result, markdown_result, original_images, usage (MaaS), data_info (MaaS)

# JSON string
json_str = result.to_json()                 # pretty-printed, ensure_ascii=False
json_str = result.to_json(indent=None)      # compact single line

# Save to disk: writes <stem>/<stem>.json + <stem>/<stem>.md + layout_vis/
result.save(output_dir="./output")
result.save(output_dir="./output", save_layout_visualization=False)

Error Handling

The SDK does not raise on MaaS errors — check to_dict() for an "error" key:

# `parser` is an open GlmOcr instance (see "Class-based" above)
result = parser.parse("image.png")
d = result.to_dict()
if "error" in d:
    # Handle failure
    print("OCR failed:", d["error"])
else:
    print(d["markdown_result"])
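Callers that prefer exceptions over checking a key can wrap the pattern above. A small sketch: `OcrError` and `require_success` are hypothetical names defined here, not part of the glmocr package, and the function operates on the plain dict returned by `to_dict()`.

```python
class OcrError(RuntimeError):
    """Hypothetical exception type; not provided by the glmocr package."""


def require_success(result_dict):
    """Return the dict unchanged, or raise OcrError if it carries an "error" key."""
    if "error" in result_dict:
        raise OcrError(str(result_dict["error"]))
    return result_dict
```

Typical use would be `d = require_success(result.to_dict())`, after which `d["markdown_result"]` can be read without a further check.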

CLI Reference

Agent-preferred interface: use the CLI for most operations. Set ZHIPU_API_KEY in env once, then invoke as needed.

Supported input formats: .jpg, .jpeg, .png, .bmp, .gif, .webp, .pdf
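When an agent assembles its own file list instead of passing a directory, the extension list above can be used as a pre-filter. A minimal sketch: `supported_inputs` is a hypothetical helper, and matching suffixes case-insensitively is an assumption, since the page does not say how the CLI treats `.PNG` versus `.png`.

```python
from pathlib import Path

# Extensions copied from the "Supported input formats" line above.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp", ".pdf"}


def supported_inputs(directory):
    """Return the files in `directory` whose suffix is in the supported set, sorted."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```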

Basic usage

# Parse a single file → saves to ./output/<stem>/
# MaaS mode is the default; ZHIPU_API_KEY must be set (or use --api-key)
glmocr parse image.png

# Pass API key directly without any env setup
glmocr parse image.png --api-key sk-xxx

# Parse a directory → saves each file to ./output/<stem>/
glmocr parse ./scans/

# Use self-hosted vLLM/SGLang instead of cloud
glmocr parse image.png --mode selfhosted

# Specify output directory
glmocr parse image.png --output ./results/

Read results in the terminal (agent-friendly)

# Print Markdown + JSON to stdout (and still save to disk)
glmocr parse image.png --stdout

# Print to stdout ONLY — do not write any files
glmocr parse image.png --stdout --no-save

# JSON only (no Markdown output)
glmocr parse image.png --stdout --json-only

# Pipe JSON into jq for structured extraction
glmocr parse image.png --stdout --json-only --no-save | jq '.[0] | map(select(.label=="table"))'
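The same extraction can be done in Python when jq is not available. This sketch assumes, as the jq filter above implies, that `--stdout --json-only` prints exactly the json_result structure (a list of pages, each a list of regions); `tables_on_first_page` is a hypothetical helper that takes the captured stdout text.

```python
import json


def tables_on_first_page(stdout_text):
    """Mirror of the jq filter: regions on page 0 whose label is "table"."""
    pages = json.loads(stdout_text)
    first_page = pages[0] if pages else []
    return [region for region in first_page if region.get("label") == "table"]
```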

Save control

# Skip layout visualization images (faster, smaller output)
glmocr parse image.png --no-layout-vis

# Parse and save only JSON + Markdown, skip layout vis
glmocr parse image.png --no-layout-vis --output ./results/

Batch processing

# All images in a folder
glmocr parse ./invoice_scans/ --output ./parsed/ --no-layout-vis

# With progress visible in logs
glmocr parse ./docs/ --output ./parsed/ --log-level INFO

Debugging

glmocr parse image.png --log-level DEBUG

Full flag reference

| Flag | Default | Description |
|---|---|---|
| --api-key / -k | env var | API key for MaaS mode (overrides ZHIPU_API_KEY) |
| --mode | maas | maas (cloud, default) or selfhosted (local GPU) |
| --env-file | auto | Path to .env file (default: auto-discover from cwd) |
| --output / -o | ./output | Output directory |
| --stdout | off | Print JSON + Markdown to stdout |
| --no-save | off | Skip writing files (use with --stdout) |
| --json-only | off | stdout JSON only, no Markdown |
| --no-layout-vis | off | Skip layout visualization images |
| --config / -c | none | Path to YAML config override |
| --log-level | INFO | DEBUG / INFO / WARNING / ERROR |

Typical Agent Workflow

receive document path / URL
       │
       ▼
glmocr.parse(path)            ← single call, handles PDF/image
       │
       ▼
result.to_dict()              ← safe to pass as tool output
       │
       ├── markdown_result    → hand to LLM for reading / summarization
       └── json_result        → structured extraction (tables, formulas, regions by label)

Filter by label

import glmocr

result = glmocr.parse("report.png")
regions = result.json_result[0]  # first page

tables = [r for r in regions if r["label"] == "table"]
formulas = [r for r in regions if r["label"] == "formula"]
body_text = [r for r in regions if r["label"] == "text"]

Multi-page PDF → iterate pages

from glmocr import GlmOcr

with GlmOcr(api_key="sk-xxx") as parser:
    result = parser.parse("document.pdf")   # all pages in one PipelineResult
    for page_idx, page_regions in enumerate(result.json_result):
        print(f"Page {page_idx + 1}: {len(page_regions)} regions")
        for region in page_regions:
            print(f"  [{region['label']}] {region['content'][:60]}")
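Combining the page loop with the label filter gives a way to pull one kind of region out of a whole document while keeping track of where it came from. A minimal sketch over the json_result structure described earlier; `regions_with_label` is a hypothetical helper, and it works on plain lists and dicts so it can be tried without calling the API.

```python
def regions_with_label(json_result, label):
    """Return (page_number, region) pairs for every region with the given label."""
    return [
        (page_idx + 1, region)          # page numbers are 1-based, as in the loop above
        for page_idx, page in enumerate(json_result)
        for region in page
        if region.get("label") == label
    ]
```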

Programmatic config (no env vars)

from glmocr.config import GlmOcrConfig

cfg = GlmOcrConfig.from_env(
    api_key="sk-xxx",
    mode="maas",
    timeout=600,
    log_level="DEBUG",
)

Output Directory Layout

After result.save(output_dir):

output_dir/
  <image_stem>/
    <image_stem>.json         ← structured regions
    <image_stem>.md           ← full Markdown (with cropped figure images)
    imgs/                     ← cropped figures referenced in Markdown
    layout_vis/               ← layout detection overlay images (if enabled)
      <image_stem>.jpg
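Given the layout above, the paths an agent should look for after a save can be computed from the input file name. A sketch with `pathlib`; `expected_outputs` is a hypothetical helper, and the layout_vis file name follows the single-image case shown in the tree (naming for multi-page PDFs is not documented on this page).

```python
from pathlib import Path


def expected_outputs(input_path, output_dir="./output"):
    """Map an input file to the output paths described in the layout above."""
    stem = Path(input_path).stem
    base = Path(output_dir) / stem
    return {
        "json": base / f"{stem}.json",
        "markdown": base / f"{stem}.md",
        "figures_dir": base / "imgs",
        "layout_vis": base / "layout_vis" / f"{stem}.jpg",
    }
```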

Common Pitfalls

  • ZHIPU_API_KEY not set: SDK defaults to MaaS mode. Without a key, parse() will fail with a clear error message and quick-fix instructions. Set via export ZHIPU_API_KEY=sk-xxx, add to a .env file, or pass --api-key sk-xxx to the CLI.
  • Large PDFs: Default timeout is 600s. For very long documents, raise it, e.g. pass timeout=1200 to the GlmOcr constructor or set GLMOCR_TIMEOUT=1200.
  • result.json_result is a string: Happens when the model returns malformed JSON. The SDK preserves the raw string — parse or log it manually.
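The last pitfall can be handled defensively before any code indexes into json_result. A minimal sketch: `normalise_json_result` is a hypothetical helper that tries one `json.loads` pass on a string value and otherwise hands the raw text back for logging; it does not attempt to repair malformed JSON.

```python
import json


def normalise_json_result(json_result):
    """Return (pages, raw): pages is a list or None; raw is the unparsed string, if any."""
    if not isinstance(json_result, str):
        return json_result, None          # already structured
    try:
        parsed = json.loads(json_result)
    except json.JSONDecodeError:
        return None, json_result          # malformed: keep the raw text for logging
    if isinstance(parsed, list):
        return parsed, None
    return None, json_result              # valid JSON but not the expected list of pages
```

A caller can then branch on `pages is None` instead of guarding every access to the region lists.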
