Install
openclaw skills install ascii-vision

Fallback image viewer for when vision models are unavailable (rate limited, model down, no provider configured, etc.). Converts images to ASCII art using ffmpeg + Python so you (or the agent) can identify visual content (shapes, brightness distribution, textures, and structure) without relying on any vision API.
Also includes color sampling via raw pixel extraction for basic hue identification, and edge detection for texture quantification.
Use it when image/vision_analyze returns rate-limit, model-unavailable, or timeout errors.

- --stats outputs brightness average, pixel distribution, and unique levels
- --edges detects sharp transitions (edges) for texture quantification
- ffmpeg + xxd extracts RGB hex values from specific regions

Brightness-to-character mapping (grayscale levels 0–255):

| Range | Char | Meaning |
|---|---|---|
| 0–25 | (space) | Pure black |
| 26–51 | . | Very dark |
| 52–76 | : | Dark |
| 77–102 | - | Mid-dark |
| 103–127 | = | Medium |
| 128–153 | + | Mid-light |
| 154–179 | * | Light |
| 180–204 | # | Very light |
| 205–229 | % | Near white |
| 230–255 | @ | Pure white |
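A minimal sketch of the lookup this table describes, assuming 8-bit grayscale input (one byte per pixel). The bundled script may compute the index arithmetically instead; this version encodes the table's ranges directly so the boundaries match exactly. RAMP, LOWER_BOUNDS, char_for, and row_to_ascii are hypothetical names.

```python
from bisect import bisect_right

# Characters from darkest to brightest, in the order of the table.
RAMP = " .:-=+*#%@"
# Lower bound of each brightness range in the table (0–25, 26–51, ...).
LOWER_BOUNDS = [0, 26, 52, 77, 103, 128, 154, 180, 205, 230]


def char_for(level: int) -> str:
    """Map a grayscale level (0-255) to its ASCII character."""
    if not 0 <= level <= 255:
        raise ValueError(f"level out of range: {level}")
    # bisect_right counts the lower bounds <= level; minus one gives the range index.
    return RAMP[bisect_right(LOWER_BOUNDS, level) - 1]


def row_to_ascii(row: bytes) -> str:
    """Convert one row of 8-bit gray pixels into a line of ASCII art."""
    return "".join(char_for(b) for b in row)


print(repr(row_to_ascii(bytes([0, 40, 128, 200, 255]))))  # → ' .+#@'
```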
The bundled script is at scripts/ascii_viewer.py. Reference it relative to the skill directory:
SCRIPT=scripts/ascii_viewer.py
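It accepts an optional --width (default: 60) that sets the number of columns; keep it equal to the W in ffmpeg's scale=W:-1, since raw gray pixels carry no row boundaries. With that pairing, the height is derived from the pixel data, preserving the aspect ratio without distortion.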
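The auto-height behavior can be sketched as follows, assuming one byte per pixel (ffmpeg's format=gray written with -f rawvideo). Raw video stores no dimensions, so the row count falls out of the byte count divided by the width. split_rows is a hypothetical name; the bundled script's internals may differ.

```python
def split_rows(data: bytes, width: int) -> list[bytes]:
    """Split a raw gray frame (one byte per pixel) into rows of `width` pixels."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(data) % width != 0:
        # A remainder means --width does not match ffmpeg's scale=W:-1.
        raise ValueError(f"{len(data)} bytes is not a multiple of width {width}")
    height = len(data) // width  # inferred row count
    return [data[r * width:(r + 1) * width] for r in range(height)]


# A 60-column, 34-row frame of black pixels.
rows = split_rows(bytes(60 * 34), 60)
print(len(rows), len(rows[0]))  # → 34 60
```

A mismatched width is only caught when the byte count does not divide evenly; a wrong width that happens to divide evenly still yields skewed art, which is why the two values should always be set together.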
Requirements:
- ffmpeg with rawvideo support (confirm it is installed with which ffmpeg)
- python3

# Default 60 columns (auto-height, aspect ratio preserved)
ffmpeg -y -i <image> -vf "scale=60:-1,format=gray" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| python3 scripts/ascii_viewer.py
# Custom width
ffmpeg -y -i <image> -vf "scale=80:-1,format=gray" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| python3 scripts/ascii_viewer.py --width 80
# Brightness statistics and edge detection
ffmpeg -y -i <image> -vf "scale=60:-1,format=gray" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| python3 scripts/ascii_viewer.py --stats --edges
# Example output:
# brightness_avg=142/255
# bright_pixels=1200
# dark_pixels=800
# unique_levels=180
# edges_detected=400/3600
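The kind of computation behind --stats and --edges can be sketched in plain Python. This is an illustration under stated assumptions, not the bundled script: the bright/dark cutoffs and the edge threshold below are hypothetical values, and an "edge" here means a pixel whose right or lower neighbor differs by more than that threshold. The output lines mimic the key=value format shown above.

```python
# Hypothetical thresholds; the bundled script's actual values are not documented here.
BRIGHT_MIN = 170   # levels >= this count as bright
DARK_MAX = 85      # levels <= this count as dark
EDGE_DELTA = 40    # neighbor difference that counts as an edge


def stats_and_edges(pixels: bytes, width: int) -> list[str]:
    """Summarize a raw gray frame and count sharp transitions."""
    total = len(pixels)
    if width <= 0 or total == 0 or total % width:
        raise ValueError("pixel data must be a non-empty multiple of width")
    avg = sum(pixels) // total
    bright = sum(1 for p in pixels if p >= BRIGHT_MIN)
    dark = sum(1 for p in pixels if p <= DARK_MAX)
    unique = len(set(pixels))
    edges = 0
    for i, p in enumerate(pixels):
        # Right neighbor (unless at the end of a row) and lower neighbor (unless last row).
        right = pixels[i + 1] if (i + 1) % width != 0 else p
        below = pixels[i + width] if i + width < total else p
        if max(abs(p - right), abs(p - below)) > EDGE_DELTA:
            edges += 1
    return [
        f"brightness_avg={avg}/255",
        f"bright_pixels={bright}",
        f"dark_pixels={dark}",
        f"unique_levels={unique}",
        f"edges_detected={edges}/{total}",
    ]


# 4x2 frame: left half black, right half white.
print("\n".join(stats_and_edges(bytes([0, 0, 255, 255, 0, 0, 255, 255]), 4)))
```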
# Overall average color (RGB hex)
ffmpeg -y -i <image> -vf "scale=1:1,format=rgb24" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| xxd -p | head -c 6
# Specific region (e.g. bottom-center: half the width, bottom quarter of the height)
ffmpeg -y -i <image> -vf "crop=iw/2:ih/4:iw/4:3*ih/4,scale=1:1,format=rgb24" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| xxd -p
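Turning the hex from xxd -p into something nameable can be sketched with the standard library's colorsys module. The hue buckets and the saturation/value cutoffs below are arbitrary, illustrative choices (hypothetical, not part of the skill); hex_to_rgb and rough_hue are likewise hypothetical names.

```python
import colorsys


def hex_to_rgb(hex_str: str) -> tuple[int, int, int]:
    """Parse the first pixel of `xxd -p` output (e.g. '8a6f4c\\n') into (r, g, b)."""
    h = hex_str.strip()[:6]
    if len(h) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_str!r}")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def rough_hue(rgb: tuple[int, int, int]) -> str:
    """Name a color coarsely using HSV; the buckets are illustrative only."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if v < 0.15:
        return "black"
    if s < 0.15:
        return "white" if v > 0.85 else "gray"
    deg = h * 360
    for limit, name in ((15, "red"), (45, "orange"), (70, "yellow"), (170, "green"),
                        (200, "cyan"), (260, "blue"), (300, "purple"), (345, "pink")):
        if deg < limit:
            return name
    return "red"  # hue wraps back to red near 360 degrees


print(rough_hue(hex_to_rgb("ff8000\n")))  # → orange
```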
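# Batch: stats for every .jpg in the current directory
for f in *.jpg; do
  [ -e "$f" ] || continue  # no .jpg files: the glob stays literal, so skip it
  echo "=== $f ==="
  ffmpeg -y -i "$f" -vf "scale=60:-1,format=gray" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
    | python3 scripts/ascii_viewer.py --stats
  echo ""
done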
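If the batch should run from Python (for example inside an agent tool), the same pipeline can be driven with subprocess argument lists, which sidesteps shell quoting entirely. A sketch, assuming ffmpeg and python3 are on PATH and the viewer accepts --width together with --stats; gray_frame_cmd and ascii_stats_for are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def gray_frame_cmd(image: str, width: int = 60) -> list[str]:
    """Build the same ffmpeg invocation as the shell examples, as an argument list."""
    return [
        "ffmpeg", "-y", "-i", image,
        "-vf", f"scale={width}:-1,format=gray",
        "-frames:v", "1", "-f", "rawvideo", "pipe:",
    ]


def ascii_stats_for(image: str, width: int = 60,
                    script: str = "scripts/ascii_viewer.py") -> str:
    """Run ffmpeg, then pipe its raw gray frame into the viewer with --stats."""
    raw = subprocess.run(gray_frame_cmd(image, width),
                         capture_output=True, check=True).stdout
    out = subprocess.run(["python3", script, "--width", str(width), "--stats"],
                         input=raw, capture_output=True, check=True).stdout
    return out.decode()


if __name__ == "__main__":
    for path in sorted(Path(".").glob("*.jpg")):
        print(f"=== {path} ===")
        try:
            print(ascii_stats_for(str(path)))
        except (subprocess.CalledProcessError, OSError) as exc:
            # Keep going on failure, like the shell loop does.
            print(f"failed: {exc}")
```

Because each argument is passed as a separate list element, filenames with spaces or shell metacharacters need no quoting.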
| Width | Use case |
|---|---|
| 40 | Quick scan, simple images |
| 60 | Balanced readability vs detail (default) |
| 80 | More detail, complex images |
| 120 | Maximum detail (may be too wide for chat) |
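To judge whether a width will fit in chat, the output size can be estimated from the original dimensions (for instance as reported by ffprobe). A sketch, assuming ffmpeg rounds the computed height of scale=W:-1 to the nearest integer; the actual row count may differ by one. estimated_rows and output_size are hypothetical names.

```python
def estimated_rows(src_w: int, src_h: int, cols: int) -> int:
    """Approximate rows produced by ffmpeg scale=cols:-1 (rounded to nearest)."""
    if src_w <= 0 or src_h <= 0 or cols <= 0:
        raise ValueError("dimensions must be positive")
    # Integer round-half-up of cols * src_h / src_w.
    return (2 * cols * src_h + src_w) // (2 * src_w)


def output_size(src_w: int, src_h: int, cols: int) -> tuple[int, int]:
    """Return (rows, characters not counting newlines) at the given width."""
    rows = estimated_rows(src_w, src_h, cols)
    return rows, cols * rows


for cols in (40, 60, 80, 120):
    rows, chars = output_size(1920, 1080, cols)
    print(f"{cols:>3} cols -> {rows} rows, {chars} characters")
```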
Interpreting common patterns:
- @%# dominant → bright, well-lit scene
- .-: dominant → dark scene, night-time
- #%@ clusters → bright objects, light sources, highlights
- -= lines → edges, furniture, structures
- *#%@ intermixed → detailed surfaces (fabric, foliage, textured objects)
- +=-:#%@ formations → technical diagram, text overlay

Pair ASCII structural data with RGB color samples for richer diagnosis:
#!/usr/bin/env bash
# Pass the image path as the first argument
IMG="$1"
# 1. Original dimensions
ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 "$IMG"
# 2. ASCII + stats + edges
ffmpeg -y -i "$IMG" -vf "scale=60:-1,format=gray" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| python3 scripts/ascii_viewer.py --stats --edges
# 3. Color info
echo "Average color (RGB hex):"
ffmpeg -y -i "$IMG" -vf "scale=1:1,format=rgb24" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| xxd -p | head -c 6; echo  # newline so the next label starts on its own line
echo "Bottom region color:"
ffmpeg -y -i "$IMG" -vf "crop=iw/2:ih/4:iw/4:3*ih/4,scale=1:1,format=rgb24" -frames:v 1 -f rawvideo pipe: 2>/dev/null \
| xxd -p
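When a feature in the ASCII grid needs a closer look, the original dimensions from step 1 let you map a character cell back to source pixels and sample that exact region's color with the same crop/scale technique. A sketch; parse_dims, cell_box, and crop_filter are hypothetical names, and rows should be the number of lines the viewer actually printed.

```python
def parse_dims(ffprobe_csv: str) -> tuple[int, int]:
    """Parse ffprobe's 'width,height' CSV line (e.g. '1920,1080')."""
    w, h = ffprobe_csv.strip().split(",")[:2]
    return int(w), int(h)


def cell_box(col: int, row: int, src_w: int, src_h: int,
             cols: int, rows: int) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of the source-pixel box behind one ASCII cell."""
    x0 = col * src_w // cols
    x1 = (col + 1) * src_w // cols
    y0 = row * src_h // rows
    y1 = (row + 1) * src_h // rows
    return x0, y0, max(1, x1 - x0), max(1, y1 - y0)


def crop_filter(box: tuple[int, int, int, int]) -> str:
    """Build an ffmpeg -vf string that averages the box down to one RGB pixel."""
    x, y, w, h = box
    return f"crop={w}:{h}:{x}:{y},scale=1:1,format=rgb24"


src_w, src_h = parse_dims("1920,1080\n")
print(crop_filter(cell_box(45, 10, src_w, src_h, cols=60, rows=34)))
```

The resulting string can replace the bottom-region filter in the color-sampling command, with the output still piped through xxd -p.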
ASCII art is a mechanical fallback — it does NOT replace a vision model.
| Detects | Does NOT detect |
|---|---|
| Overall brightness (light vs dark scene) | Semantic meaning (what the subject is) |
| Contrast between regions | Color (everything is grayscale without xxd) |
| Texture (smooth vs detailed surface) | Legible text (only knows "something is there") |
| Lighting gradients (top-down, side, etc.) | Faces, emotions, or expressions |
| Edges and sharp transitions | Specific objects (person, cat, mask) |
| Spatial distribution of content | Depth, perspective, or real dimensions |
Good for:
Not good for:
ASCII gives you structural data (brightness, texture, edges), not semantics. It is like looking at a photo with your eyes closed: you can sense light and shadow, but you can't name what you see.
Pitfalls:
- ffmpeg may be missing: run which ffmpeg before attempting. Minimal Docker images may lack it.
- If you pass --height manually, it must match the row count of ffmpeg's scale=W:H output, or the ASCII will be misaligned.

Verification:
- ffmpeg is installed (which ffmpeg)
- scripts/ascii_viewer.py exists and is executable
- ffmpeg uses scale=W:-1 to auto-preserve the aspect ratio (or --height matches H if set manually)
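The verification items can be scripted as a preflight check. A minimal sketch using only the standard library; preflight and height_matches are hypothetical names, and the default script path assumes you run from the skill directory.

```python
import os
import shutil


def preflight(script_path: str = "scripts/ascii_viewer.py") -> list[str]:
    """Return problems that would prevent the ASCII fallback from running."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (minimal Docker images may lack it)")
    if shutil.which("python3") is None:
        problems.append("python3 not found on PATH")
    if not os.path.isfile(script_path):
        problems.append(f"{script_path} does not exist")
    elif not os.access(script_path, os.X_OK):
        problems.append(f"{script_path} is not executable")
    return problems


def height_matches(num_bytes: int, width: int, height: int) -> bool:
    """A manual --height is only safe if width * height equals the raw byte count."""
    return num_bytes == width * height


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```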