Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

pdf-ocr-layout

A multimodal deep document parsing tool built on Zhipu GLM-OCR, GLM-4.7, and GLM-4.6V. Use when:

  • You need high-precision extraction of tables from documents (PDF/images) into Markdown
  • You need to automatically crop figures and charts from document pages into standalone files
  • You need deep semantic understanding of extracted charts (via GLM-4.6V visual analysis)
  • You need logical analysis of extracted table data (via GLM-4.7 text analysis)

Core architecture:

1. Visual extraction: GLM-OCR
2. Semantic understanding: GLM-4.7 (text/tables) + GLM-4.6V (multimodal/images)

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 1.2k · 2 current installs · 4 all-time installs
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The code and SKILL.md implement a PDF/image layout-extraction step plus LLM/VLM analysis against Zhipu models (GLM-OCR, GLM-4.7, GLM-4.6V), which matches the skill's description. However, the registry metadata lists no required environment variables or primary credential, while both SKILL.md and the code require ZHIPU_API_KEY (an inconsistency).
Instruction Scope
The runtime instructions and included scripts load arbitrary input files, Base64-encode images, and send the file contents plus the page's full Markdown context to the Zhipu API for analysis. That behavior is coherent for this tool, but it means the user's document contents are transmitted to an external service, and the instructions document no privacy/retention policy or opt-out. Also, SKILL.md instructs users to set ZHIPU_API_KEY even though the registry metadata does not declare it.
Install Mechanism
There is no install spec (the skill is instruction-only), so nothing is auto-downloaded, which lowers install risk. However, the code depends on Python packages and a client library: SKILL.md lists 'zhipuai' as a dependency, but the code imports 'zai' (from zai import ZhipuAiClient), a mismatch that will break at runtime unless clarified. The required Python libraries (pillow, beautifulsoup4) are reasonable for OCR and cropping, but the missing or ambiguous client package is a concern.
Credentials
At runtime the scripts require a single credential, the ZHIPU_API_KEY environment variable, to call the remote API, which is proportionate to the stated function. The problem is that the registry metadata lists 'Required env vars: none' and 'Primary credential: none', which is inconsistent and could mislead users about what secrets are needed. No other unrelated credentials are requested.
Persistence & Privilege
The skill does not request permanent/always-on privileges, does not modify other skills, and uses normal file I/O within the provided output directory. There is no 'always: true' or other excessive privilege requested.
What to consider before installing
This package appears to implement the advertised OCR + GLM analysis pipeline, but before installing you should:

  • Verify the source: no homepage or repository is listed. Prefer code from a known source if you will send sensitive documents.
  • Expect document data to be transmitted to Zhipu's API (the scripts Base64-encode images and send full page Markdown/context). Do NOT run this on private or sensitive documents unless you are comfortable with that external transmission and the API provider's data retention policy.
  • Fix/confirm dependencies: SKILL.md lists 'zhipuai' but the code imports 'zai' (from zai import ZhipuAiClient). Confirm the correct client package and install it in a controlled environment (virtualenv/container).
  • Registry metadata mismatch: the manifest claims no required env vars, but the scripts require ZHIPU_API_KEY. Treat ZHIPU_API_KEY as mandatory and do not place sensitive credentials in shared environments.
  • If you need higher assurance, ask the publisher for:
    1. A source repository or release page
    2. The exact Python package name for the Zhipu client and installation instructions
    3. Confirmation of what data is sent to the API and the provider's retention/privacy terms

Given these inconsistencies, and because your documents will be sent to an external API, proceed only after clarifying the above, or run the skill in an isolated environment with non-sensitive test files.
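A quick preflight before running the skill in an isolated environment is to probe which client package is actually importable. This is a minimal sketch based on the mismatch noted above; the package names 'zai' and 'zhipuai' are taken from the scan, not verified against the publisher.

```python
import importlib.util


def find_zhipu_client():
    """Return the first importable Zhipu client package name, or None.

    SKILL.md lists 'zhipuai' as the dependency, but the scripts import
    'zai' (from zai import ZhipuAiClient), so probe both before running.
    """
    for name in ("zai", "zhipuai"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


print(find_zhipu_client() or "no Zhipu client package installed")
```

If this prints the fallback message, neither package is present and the pipeline scripts will fail on import.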

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.2
latest: vk975tv22fkp3yw76en9terccb180y0zc

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

GLM-OCR Multimodal Deep Analysis

This tool builds a high-precision document parsing pipeline: GLM-OCR performs layout element extraction, GLM-4.7 handles logical interpretation of table data, and GLM-4.6V provides multimodal visual interpretation of images and charts.

Pipeline Implementation Architecture

This Skill consists of two core script stages, orchestrated through glm_ocr_pipeline.py:

1. Extraction Stage (scripts/glm_ocr_extract.py)

  • Core Model: GLM-OCR
  • Function: Responsible for physical layout analysis of documents
  • Output: table HTML extracted and cleaned to Markdown, chart images automatically cropped to standalone files based on bbox coordinates, and an intermediate JSON containing the full page reading order

2. Understanding Stage (scripts/glm_understanding.py)

  • Core Model: GLM-4.7 (text) / GLM-4.6V (visual)
  • Function: Responsible for deep semantic reasoning of content
  • Logic:
    • Tables: Combine full text context, use GLM-4.7 to analyze business meaning of Markdown table data
    • Charts: Combine full text context + cropped images, use GLM-4.6V for multimodal visual analysis

Invocation Methods

Command Line Invocation

# Run complete pipeline: extraction -> cropping -> understanding analysis, supports input in .pdf, .jpg, .png and other formats
python scripts/glm_ocr_pipeline.py \
  --file_path "/data/report_page.jpg" \
  --output_dir "/data/output"

API Parameter Description

Parameter  | Type   | Required | Description
file_path  | string | Yes      | Absolute path to input file (supports .pdf, .png, .jpg)
output_dir | string | Yes      | Result output directory (used to save cropped images and JSON reports)

Return Result Structure (JSON)

The tool returns a list containing layout elements and their deep understanding:

[
  {
    "type": "table",
    "bbox": [100, 200, 500, 600],
    "content_info": "| Revenue | Q1 |\n|---|---|\n| 100M | ... |",
    "deep_understanding": "(Generated by GLM-4.7) This table shows Q1 2024 revenue data. Combined with the 'market expansion strategy' mentioned in paragraph 3 of the body text, it can be seen that..."
  },
  {
    "type": "image",
    "bbox": [100, 700, 500, 900],
    "content_info": "/data/output/images/report_page_img_2.png",
    "deep_understanding": "(Generated by GLM-4.6V) This is a system architecture diagram. Visually, it shows the flow of clients connecting to servers through a Load Balancer. Combined with the title 'Fig 3' and context, this diagram is mainly used to illustrate..."
  }
]
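The returned list can be post-processed by element type, since tables carry Markdown in content_info while images carry a file path. A minimal sketch (the helper name summarize_results is hypothetical, not part of the skill):

```python
import json


def summarize_results(result_json: str) -> dict:
    """Group content_info values from a pipeline result list by element type."""
    elements = json.loads(result_json)
    summary = {"table": [], "image": []}
    for el in elements:
        etype = el.get("type")
        if etype in summary:
            summary[etype].append(el.get("content_info"))
    return summary


sample = '[{"type": "table", "bbox": [0, 0, 1, 1], "content_info": "| a |", "deep_understanding": "..."}]'
print(summarize_results(sample))  # → {'table': ['| a |'], 'image': []}
```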

Environment Requirements

  • Environment variable ZHIPU_API_KEY must be configured
  • Python 3.8+
  • Dependencies: zhipuai, pillow, beautifulsoup4
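Because the registry metadata omits the credential requirement, it is worth failing fast before any document is read. A small preflight sketch (check_environment is a hypothetical helper, not part of the skill's scripts):

```python
import os
import sys


def check_environment() -> list:
    """Return a list of setup problems; empty means the environment looks ready."""
    problems = []
    # The registry metadata claims no env vars are needed, but the
    # scripts call the Zhipu API and require ZHIPU_API_KEY.
    if not os.environ.get("ZHIPU_API_KEY"):
        problems.append("ZHIPU_API_KEY is not set")
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    return problems


issues = check_environment()
print("OK" if not issues else "; ".join(issues))
```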

Notes

1. Model Routing Strategy

  • Table (表格): Content passed to GLM-4.7, combined with full text Markdown context for logical reasoning
  • Image (图片): Image Base64 encoded and passed to GLM-4.6V, combined with OCR-extracted titles and full text context for multimodal understanding
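The routing rule above can be sketched as a dispatch on element type: tables go to the text model with the page context, images are Base64-encoded for the vision model. This is an illustrative payload builder only; the model identifier strings and the build_request helper are assumptions, and the real scripts call the Zhipu client rather than returning dicts.

```python
import base64
from pathlib import Path

TEXT_MODEL = "glm-4.7"    # assumed model identifiers, per the SKILL.md names
VISION_MODEL = "glm-4.6v"


def build_request(element: dict, page_context: str) -> dict:
    """Build a request payload for one layout element, per the routing rule."""
    if element["type"] == "table":
        # Tables: Markdown content plus the full page context for GLM-4.7.
        return {
            "model": TEXT_MODEL,
            "prompt": f"Context:\n{page_context}\n\nTable:\n{element['content_info']}",
        }
    if element["type"] == "image":
        # Images: content_info is a cropped image path; Base64-encode for GLM-4.6V.
        img_b64 = base64.b64encode(Path(element["content_info"]).read_bytes()).decode()
        return {"model": VISION_MODEL, "image_base64": img_b64, "prompt": page_context}
    raise ValueError(f"unsupported element type: {element['type']}")
```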

2. Context Association

All understanding is based on the complete layout logic of the document (Markdown Context), not isolated fragment analysis.

3. PDF Processing

Multi-page PDFs default to processing the first page. For batch processing, please extend the loop logic at the script level.
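One way to extend the loop as suggested is to split the PDF into single-page files and invoke the pipeline once per page. This sketch assumes pypdf is available (it is not among the listed dependencies) and that scripts/glm_ocr_pipeline.py accepts --file_path and --output_dir as documented.

```python
import subprocess
import sys
from pathlib import Path


def page_command(page_file: str, out_dir: str) -> list:
    """Build the documented pipeline invocation for one single-page file."""
    return [sys.executable, "scripts/glm_ocr_pipeline.py",
            "--file_path", page_file, "--output_dir", out_dir]


def run_per_page(pdf_path: str, output_dir: str) -> None:
    """Split a multi-page PDF and run the pipeline on each page separately."""
    from pypdf import PdfReader, PdfWriter  # assumed extra dependency

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(pdf_path)
    for i, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        page_file = out / f"page_{i}.pdf"
        with page_file.open("wb") as fh:
            writer.write(fh)
        subprocess.run(page_command(str(page_file), str(out / f"page_{i}")),
                       check=True)
```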

Files

5 total
