Install
openclaw skills install zai-vision
Z.AI Vision analysis using the GLM-4.6V model for image and video understanding. Use when Claude needs to analyze images (screenshots, UI designs, photos, diagrams, charts) or videos with the Z.AI Vision API. Supports OCR, error diagnosis, technical diagram interpretation, UI analysis, data visualization reading, and video scene description.
This skill provides Z.AI's GLM-4.6V vision model capabilities for analyzing images and videos through Python scripts. Use it for OCR, UI design analysis, technical diagrams, error screenshots, data visualizations, and video scene understanding.
pip install zai-sdk
export ZAI_API_KEY='your-api-key'
The API key is required for all vision operations.
python3 /root/clawd/zai-vision/scripts/vision_analyze.py <image_path> "<prompt>"
Example:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py screenshot.png "Describe this UI"
python3 /root/clawd/zai-vision/scripts/video_analyze.py <video_path> "<prompt>"
Example:
python3 /root/clawd/zai-vision/scripts/video_analyze.py clip.mp4 "What's happening?"
OCR / Text Extraction
python3 /root/clawd/zai-vision/scripts/vision_analyze.py doc-scan.jpg "Extract all text"
UI Design Analysis
python3 /root/clawd/zai-vision/scripts/vision_analyze.py ui-mockup.png "Analyze this UI design and list all components"
Error Diagnosis
python3 /root/clawd/zai-vision/scripts/vision_analyze.py error.png "What error is shown and how do I fix it?"
Technical Diagrams
python3 /root/clawd/zai-vision/scripts/vision_analyze.py architecture.png "Explain this architecture diagram"
Data Visualization
python3 /root/clawd/zai-vision/scripts/vision_analyze.py chart.png "What insights does this chart show?"
Scene Description
python3 /root/clawd/zai-vision/scripts/video_analyze.py demo.mp4 "Describe what's happening"
Note: Video analysis works best with short clips (≤8MB). Videos are processed frame-by-frame.
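The size guidance in the note can be checked before a clip is handed to video_analyze.py. The following is a minimal Python sketch, not part of the skill itself. It assumes "8MB" means 8 × 1024 × 1024 bytes (the note does not say whether the figure is decimal or binary), and the helper name is_short_enough is hypothetical.

```python
import os

# Assumption: "8MB" is read as 8 MiB; adjust if the limit is decimal.
MAX_VIDEO_BYTES = 8 * 1024 * 1024


def is_short_enough(video_path, limit=MAX_VIDEO_BYTES):
    """Return True when the file at video_path is no larger than limit bytes."""
    return os.path.getsize(video_path) <= limit
```

A caller could skip or trim any clip for which this returns False before invoking the video script.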
| Parameter | Default | Purpose |
|---|---|---|
| --model | glm-4.6v | Vision model to use |
| --max-tokens | 2000 | Max response tokens |
| --temperature | 0.5 | 0-2, lower=factual, higher=creative |
| --json | false | Output structured JSON |
Example with parameters:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py image.jpg "Describe this" \
--temperature 0.3 \
--max-tokens 500 \
--json
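When the script is called from other Python code, the flags in the table can be assembled programmatically. This sketch only builds the argument list. The function name build_command and its keyword names are hypothetical; the flag names and the 0-2 temperature range are taken from the table above.

```python
SCRIPT = "/root/clawd/zai-vision/scripts/vision_analyze.py"


def build_command(image_path, prompt, model=None, max_tokens=None,
                  temperature=None, as_json=False):
    """Build the argv list for vision_analyze.py from the documented flags.

    Flags left at None are omitted so the script's own defaults apply.
    """
    cmd = ["python3", SCRIPT, str(image_path), prompt]
    if model is not None:
        cmd += ["--model", model]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    if temperature is not None:
        # The table documents a 0-2 range; reject values outside it early.
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        cmd += ["--temperature", str(temperature)]
    if as_json:
        cmd.append("--json")
    return cmd


if __name__ == "__main__":
    print(build_command("image.jpg", "Describe this",
                        temperature=0.3, max_tokens=500, as_json=True))
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with prompts that contain spaces or quotes.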
When running in the /root/clawd workspace, use clawd-run for safety:
clawd-run /root/clawd/zai-vision/scripts/vision_analyze.py image.png "Analyze"
This provides automatic backups, validation, and timeout protection.
Missing API key:
❌ ZAI_API_KEY environment variable not set
Set it: export ZAI_API_KEY='your-key'
Image not found:
❌ Image file not found: /path/to/image.jpg
Verify the file path.
SDK not installed:
❌ zai-sdk not installed
Install with: pip install zai-sdk
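The three failure cases above can be detected up front, before any API call is attempted. This is a hedged sketch rather than the skill's own error handling. The function name preflight is hypothetical, and the import name "zai" for the zai-sdk package is an assumption, so it is exposed as a parameter.

```python
import importlib.util
import os


def preflight(image_path, env=None, sdk_module="zai"):
    """Return a list of problems that would stop an image analysis run.

    sdk_module is the assumed import name of zai-sdk; change it if the
    installed package exposes a different module name.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("ZAI_API_KEY"):
        problems.append("ZAI_API_KEY environment variable not set")
    if not os.path.isfile(image_path):
        problems.append(f"Image file not found: {image_path}")
    if importlib.util.find_spec(sdk_module) is None:
        problems.append("zai-sdk not installed")
    return problems
```

An empty list means none of the three documented errors should occur; anything else can be shown to the user in one pass instead of one failure at a time.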
for img in /path/to/images/*.png; do
python3 /root/clawd/zai-vision/scripts/vision_analyze.py "$img" "Describe this image"
done
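The same batch loop can be written in Python when the results need to be collected instead of printed. This sketch makes two assumptions that the source does not confirm: that vision_analyze.py writes its answer to standard output, and that it exits with a non-zero status on failure. The helper names list_images and describe_all are hypothetical.

```python
import subprocess
from pathlib import Path

SCRIPT = "/root/clawd/zai-vision/scripts/vision_analyze.py"


def list_images(folder, pattern="*.png"):
    """Return matching files in folder, sorted for a stable processing order."""
    return sorted(Path(folder).glob(pattern))


def describe_all(folder, prompt="Describe this image"):
    """Run the analysis script once per image and map file name to output.

    Assumption: the script prints its result to stdout and returns a
    non-zero exit code on error; failed images are recorded as None.
    """
    results = {}
    for img in list_images(folder):
        proc = subprocess.run(
            ["python3", SCRIPT, str(img), prompt],
            capture_output=True,
            text=True,
        )
        results[img.name] = proc.stdout if proc.returncode == 0 else None
    return results
```

Calling describe_all("/path/to/images") would process the files one at a time, as the shell loop does.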
python3 /root/clawd/zai-vision/scripts/vision_analyze.py image.jpg "Analyze" --json > output.json
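Once the output has been redirected to output.json, it can be loaded with the standard library. The structure of the JSON is not described here, so this sketch only parses the file and reports its top-level shape. The helper names load_result and top_level_shape are hypothetical.

```python
import json


def load_result(path="output.json"):
    """Parse the saved --json output; raises json.JSONDecodeError if invalid."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def top_level_shape(data):
    """Return sorted keys for a JSON object, otherwise the Python type name."""
    if isinstance(data, dict):
        return sorted(data)
    return type(data).__name__
```

If the script prints anything besides JSON to standard output, load_result will raise json.JSONDecodeError, which is a quick way to find out whether the redirect captured clean JSON.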
Code from screenshot:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py code.png "Extract the code and explain what it does"
Form field extraction:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py form.jpg "List all form fields and their types"
Brand guidelines check:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py design.png "Check if this follows brand guidelines"
scripts/vision_analyze.py - Image analysis with GLM-4.6V
scripts/video_analyze.py - Video analysis (frame-by-frame)
references/API.md - Complete API documentation and examples
Use this skill when you need to analyze images or videos with the Z.AI Vision API.
For more detailed API information, see references/API.md.