Install
openclaw skills install image-vision
Analyze and interpret images by describing content, extracting text, answering questions, comparing visuals, and extracting structured data from JPG, PNG, GI...
Analyze images using the built-in vision capabilities of multimodal AI models.
Describe what's in an image:
# The agent will automatically use vision when you provide an image path
image("/path/to/image.jpg", prompt="Describe what's in this image")
Extract text from images:
image("/path/to/document.png", prompt="Extract all text from this image")
Compare or analyze multiple images:
images(["/path/to/image1.jpg", "/path/to/image2.jpg"],
       prompt="Compare these two images and describe the differences")
Ask specific questions about image content:
image("menu.jpg", prompt="What are the prices of the main courses?")
image("chart.png", prompt="What trend does this graph show?")
image("screenshot.png", prompt="What error message is displayed?")
Check image content:
image("upload.jpg", prompt="Is this image appropriate for a professional setting?")
Extract structured data from visual content:
image("receipt.jpg", prompt="Extract the date, total amount, and items purchased")
image("business_card.png", prompt="Extract name, phone, email, and company")
image("form.jpg", prompt="Extract all filled fields as key-value pairs")
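Answers to prompts like these come back as free text, so a small post-processing step can help before the data is used elsewhere. The following is a minimal sketch, assuming the model answers the key-value prompt with one `key: value` pair per line. That response format, the `parse_key_values` helper, and the sample text are illustrative assumptions, not part of the skill.

```python
def parse_key_values(response_text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict, skipping lines without a colon."""
    fields: dict[str, str] = {}
    for line in response_text.splitlines():
        # Drop list markers such as "- " or "* " that a model may prepend.
        line = line.strip().lstrip("-* ").strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, value = line.split(":", 1)
        if key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical response text, for illustration only.
sample = """Name: Jane Doe
- Email: jane@example.com
Phone: 555-0100
(no signature present)"""
fields = parse_key_values(sample)
print(fields)
```

Lines without a colon are ignored rather than raising an error, since model output may include commentary alongside the extracted fields.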
Compare images:
images(["before.jpg", "after.jpg"],
       prompt="What changes were made between these two images?")
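When several paths are passed at once, it can be worth confirming that each file exists before asking for a comparison. The sketch below uses only the Python standard library. The `missing_images` helper is a hypothetical name introduced for illustration, and the skill itself may already report missing files in its own way.

```python
from pathlib import Path


def missing_images(paths: list[str]) -> list[str]:
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Check the inputs before requesting a comparison.
paths = ["before.jpg", "after.jpg"]
missing = missing_images(paths)
if missing:
    print("Missing image files:", ", ".join(missing))
else:
    print("All image files found.")
```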