Vision/image understanding for agents whose model can't read images (returns "model does not support images", empty/unknown output, low confidence, or user-reported failure). Calls the Volcengine Ark (doubao) vision API, returns structured JSON. Use whenever an image must be understood. Do NOT substitute with local OCR (tesseract) — OCR extracts text only, not layout/visual understanding.

Install

openclaw skills install @vst93/vision-fallback-skill