Vision Tool 👁️
Image recognition using Ollama + qwen3.5:4b. Uses the /api/chat endpoint for direct content extraction.
Features
✅ Direct content extraction - Uses the /api/chat endpoint for clean output
✅ Simplified architecture - No complex thinking field processing needed
✅ English prompts - Optimized for English language analysis
✅ Multi-channel support - Works in WeChat, Telegram, Discord, etc.
✅ Error handling - Full error recovery and reporting
Installation
Prerequisites
- Ollama service: ollama serve (must be running)
- qwen3.5:4b model: ollama pull qwen3.5:4b
- Python 3.8+: Required for running the skill
Install the skill
clawhub install vision-tool
Development Setup (For Contributors)
If you want to contribute or modify the skill, see CONTRIBUTING.md for detailed development instructions.
Basic setup:
# Clone the repository
git clone https://github.com/HuRuilizhen/vision-tool
cd vision-tool
# Set up development environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
# Run tests
python3 -m pytest tests/
Usage
Basic usage
# From any OpenClaw channel
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg
# With custom prompt
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --prompt "Describe this image"
# Debug output
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --debug
Channel-specific examples
WeChat Channel:
# When receiving an image
exec: python3 /path/to/vision-tool/main.py "$IMAGE_PATH"
Telegram Channel:
# Reply to photo messages
exec: python3 /path/to/vision-tool/main.py "/path/to/telegram_photo.jpg"
Discord Channel:
# Process attachments
exec: python3 /path/to/vision-tool/main.py "./discord_attachment.jpg"
Example Output
Analysis (30.7s):
------------------------------------------------------------
The user wants a description of the image provided.
**1. Overall Composition:**
- It's a top-down view of a meal served on a white tray.
- There are six distinct dishes/bowls arranged...
**2. Detailed Breakdown of Dishes:**
- **Top Left:** A small white rectangular dish...
- **Top Middle:** A small white rectangular dish...
------------------------------------------------------------
How It Works
- Image reading: Reads the image and Base64-encodes it
- API call: Calls the Ollama /api/chat endpoint with qwen3.5:4b
- Direct extraction: Gets the analysis directly from the content field
- Fallback handling: Simple cleanup if the thinking field is used
- Output formatting: Generates clean analysis results
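The pipeline above is small enough to sketch end to end. Here is a minimal standalone version in Python, assuming Ollama is listening on its default port (localhost:11434); it mirrors the steps above rather than reproducing the skill's actual implementation:

import base64
import requests

def analyze(image_path, prompt="Describe this image"):
    # Read and Base64-encode the image
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    # Call the Ollama /api/chat endpoint with qwen3.5:4b
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "qwen3.5:4b",
            "messages": [{"role": "user", "content": prompt, "images": [image_b64]}],
            "stream": False,
        },
        timeout=120,  # complex images can take 30+ seconds
    )
    response.raise_for_status()
    message = response.json()["message"]
    # Take the analysis directly from the content field; fall back to
    # the thinking field (with no further processing) if content is empty
    return message.get("content") or message.get("thinking", "")

print(analyze("/path/to/image.jpg"))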
Performance
- Average processing time: 25-35 seconds per image (hardware dependent)
- Image size support: 100KB-500KB recommended
- Token consumption: ~2000 tokens per image
- API endpoint: Uses /api/chat for direct content access
Troubleshooting
Common Issues
- Ollama not running: run ollama serve first
- Model not installed: run ollama pull qwen3.5:4b
- Image path incorrect: use absolute paths or verify relative paths
- Timeout: the model may take 30+ seconds for complex images
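A quick way to rule out the first two issues is to query Ollama directly. A minimal health check, assuming the default port:

import requests

try:
    # /api/tags lists locally installed models; a connection error means Ollama is down
    tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
except requests.ConnectionError:
    print("Ollama is not running - start it with: ollama serve")
else:
    installed = [m["name"] for m in tags.get("models", [])]
    if not any(name.startswith("qwen3.5") for name in installed):
        print("Model missing - install it with: ollama pull qwen3.5:4b")
    else:
        print("Ollama is up and qwen3.5:4b is available")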
Performance Tips
- Compress images to under 300KB for faster processing
- Use clear, concise prompts
- Ensure Ollama has sufficient system resources
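For the first tip, Pillow (not a dependency of this skill) can shrink an image before analysis. A rough sketch that re-encodes to JPEG until the file fits under 300KB:

import os
from PIL import Image

def compress(src, dst, max_bytes=300_000):
    # Re-encode as JPEG, lowering quality until the file is under max_bytes
    img = Image.open(src).convert("RGB")
    for quality in range(85, 20, -5):
        img.save(dst, "JPEG", quality=quality)
        if os.path.getsize(dst) <= max_bytes:
            return dst
    return None  # still too large; resize the image instead

compress("/path/to/large.png", "/tmp/compressed.jpg")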
API Reference
Python API
from vision_core import VisionAnalyzer  # vision_core.py lives in scripts/
analyzer = VisionAnalyzer()
result = analyzer.analyze_image("image.jpg", "Describe this image")
print(result["analysis"])
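The same two calls extend naturally to batches. For example, to analyze every JPEG in a folder (the folder path here is a placeholder):

from pathlib import Path
from vision_core import VisionAnalyzer

analyzer = VisionAnalyzer()
for image in sorted(Path("/path/to/images").glob("*.jpg")):
    result = analyzer.analyze_image(str(image), "Describe this image")
    print(f"--- {image.name} ---")
    print(result["analysis"])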
Command Line
# Basic analysis
python3 main.py image.jpg
# Custom prompt
python3 main.py image.jpg --prompt "What objects are in this image?"
# Debug mode
python3 main.py image.jpg --debug
Development
File Structure
vision-tool/
├── SKILL.md # This documentation
├── main.py # Main skill script
├── scripts/
│ └── vision_core.py # Core analysis engine
└── tests/
└── test_basic.py # Basic tests
Testing
# Test with example image
python3 main.py /path/to/test.jpg --prompt "Test analysis"
# Run unit tests
python3 -m pytest tests/
Changelog
v1.1.0 (2026-04-13)
- Uses /api/chat endpoint for direct content extraction
- Simplified architecture without complex thinking field processing
- Default English prompt "Describe this image"
- Removed regex dependencies for cleaner code
v1.0.0 (2026-04-12)
- Initial release
Contributing
Issues and pull requests are welcome. Please ensure tests pass before submitting.
License
This skill is part of the OpenClaw ecosystem.
Ready to use in all OpenClaw channels! 🚀