Zai Vision
v1.0.0
Z.AI Vision analysis using GLM-4.6V model for image and video understanding. Use when Claude needs to analyze images (screenshots, UI designs, photos, diagra...
Z.AI Vision
Overview
This skill provides Z.AI's GLM-4.6V vision model capabilities for analyzing images and videos through Python scripts. Use it for OCR, UI design analysis, technical diagrams, error screenshots, data visualizations, and video scene understanding.
Quick Start
Prerequisites
- Install the Z.AI SDK:
pip install zai-sdk
- Set your API key:
export ZAI_API_KEY='your-api-key'
The API key is required for all vision operations.
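Both prerequisites can be checked before any script is launched. The following is a minimal sketch, not part of the skill itself: `preflight` is a hypothetical helper, and it assumes the `zai-sdk` package is importable under the module name `zai`.

```python
import importlib.util
import os


def preflight(env=None, sdk_module="zai"):
    """Return a list of problems that would stop the vision scripts from running.

    `sdk_module` is an assumption: the import name of the zai-sdk package.
    """
    env = os.environ if env is None else env
    problems = []
    # The API key is required for all vision operations.
    if not env.get("ZAI_API_KEY"):
        problems.append("ZAI_API_KEY environment variable not set")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(sdk_module) is None:
        problems.append("zai-sdk not installed (pip install zai-sdk)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

An empty list means both checks passed. It does not confirm that the key itself is valid.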
Basic Image Analysis
python3 /root/clawd/zai-vision/scripts/vision_analyze.py <image_path> "<prompt>"
Example:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py screenshot.png "Describe this UI"
Basic Video Analysis
python3 /root/clawd/zai-vision/scripts/video_analyze.py <video_path> "<prompt>"
Example:
python3 /root/clawd/zai-vision/scripts/video_analyze.py clip.mp4 "What's happening?"
Capabilities
Image Analysis
OCR / Text Extraction
python3 /root/clawd/zai-vision/scripts/vision_analyze.py doc-scan.jpg "Extract all text"
UI Design Analysis
python3 /root/clawd/zai-vision/scripts/vision_analyze.py ui-mockup.png "Analyze this UI design and list all components"
Error Diagnosis
python3 /root/clawd/zai-vision/scripts/vision_analyze.py error.png "What error is shown and how do I fix it?"
Technical Diagrams
python3 /root/clawd/zai-vision/scripts/vision_analyze.py architecture.png "Explain this architecture diagram"
Data Visualization
python3 /root/clawd/zai-vision/scripts/vision_analyze.py chart.png "What insights does this chart show?"
Video Analysis
Scene Description
python3 /root/clawd/zai-vision/scripts/video_analyze.py demo.mp4 "Describe what's happening"
Note: Video analysis works best with short clips (≤8MB). Videos are processed frame-by-frame.
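The size guideline can be checked before a clip is submitted. This sketch uses hypothetical helper names and assumes that 8MB means 8 × 1024 × 1024 bytes; the source does not say whether the limit is measured in decimal or binary megabytes.

```python
import os

# Assumption: "8MB" is read as 8 * 1024 * 1024 bytes.
MAX_VIDEO_BYTES = 8 * 1024 * 1024


def within_limit(size_bytes, limit=MAX_VIDEO_BYTES):
    """Return True if a size in bytes is at or below the guideline."""
    return size_bytes <= limit


def video_within_limit(path, limit=MAX_VIDEO_BYTES):
    """Return True if the file at `path` is at or below the guideline."""
    return within_limit(os.path.getsize(path), limit)
```

A clip over the limit may still be accepted. The note above describes what works best, not a hard cutoff.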
Parameters
| Parameter | Default | Purpose |
|---|---|---|
| --model | glm-4.6v | Vision model to use |
| --max-tokens | 2000 | Max response tokens |
| --temperature | 0.5 | Range 0-2; lower is more factual, higher is more creative |
| --json | false | Output structured JSON |
Example with parameters:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py image.jpg "Describe this" \
--temperature 0.3 \
--max-tokens 500 \
--json
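When the script is called from Python, the same command line can be assembled as an argument list, which avoids shell quoting problems in the prompt. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, and only the flags and the 0-2 temperature range come from the table above.

```python
SCRIPT = "/root/clawd/zai-vision/scripts/vision_analyze.py"


def build_command(image, prompt, model=None, max_tokens=None,
                  temperature=None, as_json=False):
    """Build the argv list for vision_analyze.py; omitted options use script defaults."""
    cmd = ["python3", SCRIPT, image, prompt]
    if model is not None:
        cmd += ["--model", model]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    if temperature is not None:
        # The parameter table documents a 0-2 range for temperature.
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        cmd += ["--temperature", str(temperature)]
    if as_json:
        cmd.append("--json")
    return cmd


if __name__ == "__main__":
    print(build_command("image.jpg", "Describe this",
                        temperature=0.3, max_tokens=500, as_json=True))
```

The returned list can be passed directly to `subprocess.run`.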
Integration with Safe Scripts
When running in the /root/clawd workspace, use clawd-run for safety:
clawd-run /root/clawd/zai-vision/scripts/vision_analyze.py image.png "Analyze"
This provides automatic backups, validation, and timeout protection.
Error Handling
Missing API key:
❌ ZAI_API_KEY environment variable not set
Set it: export ZAI_API_KEY='your-key'
Image not found:
❌ Image file not found: /path/to/image.jpg
Verify the file path.
SDK not installed:
❌ zai-sdk not installed
Install with: pip install zai-sdk
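A wrapper script can map these messages to their fixes automatically. The sketch below matches on the message text listed above. `suggest_fix` is a hypothetical helper, and it assumes the scripts print these messages verbatim.

```python
# (message fragment, suggested fix) pairs taken from the error list above.
HINTS = [
    ("ZAI_API_KEY environment variable not set", "Set it: export ZAI_API_KEY='your-key'"),
    ("Image file not found", "Verify the file path."),
    ("zai-sdk not installed", "Install with: pip install zai-sdk"),
]


def suggest_fix(output):
    """Return the fix for the first known error found in `output`, else None."""
    for fragment, hint in HINTS:
        if fragment in output:
            return hint
    return None
```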
Common Patterns
Pattern 1: Batch Process Multiple Images
for img in /path/to/images/*.png; do
python3 /root/clawd/zai-vision/scripts/vision_analyze.py "$img" "Describe this image"
done
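The same loop can be written in Python when the results need to be collected rather than printed. This is a sketch, not part of the skill: `analyze_directory` is a hypothetical helper, and the `run` parameter exists only so the subprocess call can be swapped out for testing.

```python
import subprocess
from pathlib import Path

SCRIPT = "/root/clawd/zai-vision/scripts/vision_analyze.py"


def analyze_directory(directory, prompt, pattern="*.png", run=subprocess.run):
    """Run vision_analyze.py on each matching file; map file name to (exit code, stdout)."""
    results = {}
    # Sort so the processing order is stable across runs.
    for image in sorted(Path(directory).glob(pattern)):
        completed = run(
            ["python3", SCRIPT, str(image), prompt],
            capture_output=True,
            text=True,
        )
        # Keep going after a failure; the exit code is recorded per file.
        results[image.name] = (completed.returncode, completed.stdout)
    return results
```

A failed image does not stop the batch, so the exit codes should be inspected afterwards.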
Pattern 2: Extract and Save JSON
python3 /root/clawd/zai-vision/scripts/vision_analyze.py image.jpg "Analyze" --json > output.json
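The saved file can then be loaded for further processing. This sketch assumes that `--json` writes a single JSON document to standard output. The schema is not documented here, so the code decodes the content without relying on any particular keys. `parse_json_output` is a hypothetical helper.

```python
import json


def parse_json_output(text):
    """Return the decoded JSON value, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # An error message printed instead of JSON ends up here.
        return None


if __name__ == "__main__":
    with open("output.json", encoding="utf-8") as handle:
        result = parse_json_output(handle.read())
    print("valid JSON" if result is not None else "output.json is not valid JSON")
```

Checking for `None` guards against the case where the redirect captured an error message instead of a result.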
Pattern 3: Specific Analysis Type
Code from screenshot:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py code.png "Extract the code and explain what it does"
Form field extraction:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py form.jpg "List all form fields and their types"
Brand guidelines check:
python3 /root/clawd/zai-vision/scripts/vision_analyze.py design.png "Check if this follows brand guidelines"
Tips for Best Results
- Specific prompts: "List all UI components" works better than "What's this?"
- High-quality images: Higher resolution gives better understanding
- Temperature: 0.2-0.5 for factual, 0.7-1.0 for creative
- Video limits: Keep videos ≤8MB for best performance
- Handle errors: Always check return codes and error messages
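The last tip can be applied with a small wrapper around `subprocess.run`. This sketch assumes the scripts exit with a non-zero status on failure, which the source implies but does not state. `run_checked` and its 120-second default timeout are illustrative choices.

```python
import subprocess


def run_checked(cmd, timeout=120):
    """Run a command; return stdout on success, raise RuntimeError otherwise.

    subprocess.TimeoutExpired propagates if the command exceeds `timeout` seconds.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if completed.returncode != 0:
        # Include whatever the command printed so the error message is not lost.
        detail = (completed.stderr or completed.stdout).strip()
        raise RuntimeError(f"exit code {completed.returncode}: {detail}")
    return completed.stdout
```

Raising on failure keeps a failed analysis from being mistaken for an empty result.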
Resources
Scripts
- scripts/vision_analyze.py: Image analysis with GLM-4.6V
- scripts/video_analyze.py: Video analysis (frame-by-frame)
References
- references/API.md: Complete API documentation and examples
When to Use This Skill
Use this skill when you need to:
- Analyze screenshots, photos, or images
- Extract text from images (OCR)
- Understand technical diagrams or charts
- Diagnose errors from screenshots
- Analyze UI designs or mockups
- Describe video scenes
- Process visual content programmatically
For more detailed API information, see references/API.md.
