Image Vision
v1.0.0
Analyze and interpret images by describing content, extracting text, answering questions, comparing visuals, and extracting structured data from JPG, PNG, GI...
MIT-0
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (image analysis, OCR, comparison, extraction) align with the SKILL.md which instructs the agent to call image()/images() on user-supplied image paths and use built-in vision capabilities; no unrelated env vars, binaries, or installs are requested.
Instruction Scope
Instructions only direct the agent to analyze user-provided image file paths and prompts (OCR, Q&A, comparison, extraction). The SKILL.md does not instruct reading arbitrary system files, accessing unrelated environment variables, or sending data to external endpoints beyond the platform's built-in vision processing.
Install Mechanism
No install spec or code files are present (instruction-only), so nothing will be downloaded or written to disk by the skill itself.
Credentials
The skill declares no required environment variables, credentials, or config paths — consistent with its described use of built-in vision capabilities.
Persistence & Privilege
The skill's always flag is false, and there are no instructions to modify other skills or system-wide settings. Autonomous invocation is allowed by default but is not combined with broad access or persistence requests.
Assessment
This skill appears coherent and limited to analyzing images you supply. Before using it: (1) avoid uploading sensitive images (IDs, credit cards, passwords) because OCR will extract readable text; (2) confirm how the platform handles and stores image content and outputs (retention, sharing, logging); (3) test on harmless images first to verify behavior; and (4) if you need the analysis to run offline or avoid any external transmission, verify your platform's execution model for vision tasks before providing private images.
Like a lobster shell, security has layers: review code before you run it.
latest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Vision Analyze
Analyze images using the built-in vision capabilities of multimodal AI models.
Quick Start
Analyze an Image
Describe what's in an image:
# The agent will automatically use vision when you provide an image path
image("/path/to/image.jpg", prompt="Describe what's in this image")
Extract Text (OCR)
Extract text from images:
image("/path/to/document.png", prompt="Extract all text from this image")
Analyze Multiple Images
Compare or analyze multiple images:
images(["/path/to/image1.jpg", "/path/to/image2.jpg"],
       prompt="Compare these two images and describe the differences")
Usage Patterns
Visual Q&A
Ask specific questions about image content:
image("menu.jpg", prompt="What are the prices of the main courses?")
image("chart.png", prompt="What trend does this graph show?")
image("screenshot.png", prompt="What error message is displayed?")
Content Moderation
Check image content:
image("upload.jpg", prompt="Is this image appropriate for a professional setting?")
Data Extraction
Extract structured data from visual content:
image("receipt.jpg", prompt="Extract the date, total amount, and items purchased")
image("business_card.png", prompt="Extract name, phone, email, and company")
image("form.jpg", prompt="Extract all filled fields as key-value pairs")
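The skill does not specify what shape the extracted text comes back in. As a minimal sketch, assuming the model answers the key-value prompt with one "key: value" pair per line, a caller could turn the reply into a dictionary like this. The function name parse_key_values and the line format are illustrative assumptions, not part of the skill.

```python
def parse_key_values(response: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict.

    Assumption: the model replied with one pair per line. Lines without a
    colon, or with an empty key or value, are skipped.
    """
    fields: dict[str, str] = {}
    for line in response.splitlines():
        # Strip whitespace and common list markers the model might add.
        line = line.strip().lstrip("-*• ").strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so values like "10:30" stay intact.
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key and value:
            fields[key] = value
    return fields
```

Real replies may be formatted differently (JSON, tables, prose), so asking for an explicit format in the prompt is likely to make parsing more reliable.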
Visual Comparison
Compare images:
images(["before.jpg", "after.jpg"],
       prompt="What changes were made between these two images?")
Tips
- Be specific: The more specific your prompt, the better the results
- Multiple images: You can analyze up to 20 images at once
- Supported formats: JPG, PNG, GIF, WebP
- Size limits: Large images are automatically resized
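The format and count limits above can be checked before calling images(). The following is a rough sketch, not part of the skill: the helper names is_supported and batch_paths are hypothetical, and treating .jpeg as equivalent to JPG is an assumption.

```python
from pathlib import Path

# Taken from the Tips list: four formats and at most 20 images per call.
# Accepting ".jpeg" alongside ".jpg" is an assumption.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
MAX_IMAGES_PER_CALL = 20


def is_supported(path: str) -> bool:
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def batch_paths(paths: list[str], size: int = MAX_IMAGES_PER_CALL) -> list[list[str]]:
    """Drop unsupported files, then split the rest into chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    supported = [p for p in paths if is_supported(p)]
    return [supported[i:i + size] for i in range(0, len(supported), size)]
```

Each returned batch could then be passed to a separate images() call. The check looks only at file extensions, not at file contents.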
When to Use
- Reading text from screenshots, documents, or photos
- Describing visual content for accessibility
- Analyzing charts, graphs, or diagrams
- Comparing visual changes
- Extracting data from forms or receipts
- Understanding UI elements or error messages
