VLM Image Helper

v0.1.0

Visual inspection helper for VLM and OCR workflows. Use when an agent needs to help a vision model see an image more clearly before re-analysis: rotate misalign...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt below, then paste it into OpenClaw to install testlbin/vlm-image-helper.

Prompt preview: Install & Setup
Install the skill "Vlm Image Helper" (testlbin/vlm-image-helper) from ClawHub.
Skill page: https://clawhub.ai/testlbin/vlm-image-helper
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install vlm-image-helper

ClawHub CLI


npx clawhub@latest install vlm-image-helper
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (VLM/OCR preprocessing) match the provided CLI, SKILL.md, README, and the included script: rotation, semantic cropping, scaling, and enhancement are all implemented and expected for this purpose.
Instruction Scope
SKILL.md limits scope to minimal transformations for re-analysis and documents input/output rules. The runtime instructions only reference the included script and local image inputs/outputs (file paths, data URIs, base64) — no directives to read unrelated files or send data externally.
Install Mechanism
No install spec in registry; the code is instruction-only plus a small Python script. The only external dependency is Pillow (pip), which is reasonable and documented. No downloads from unknown URLs or archive extraction are used.
Credentials
The skill requires no environment variables, no credentials, and no config paths. The script uses only local temp files and in-memory base64 — proportional to the stated functionality.
Persistence & Privilege
Skill is not always-enabled and uses no privileged agent APIs or modifications to other skills. It writes only its own temporary output files when asked and returns base64 on demand.
Assessment
This appears to be a focused, local image-preprocessing helper suitable for VLM/OCR workflows. Before installing or running: (1) review the full script if you will process sensitive images (it operates locally and returns base64 or files, but you should avoid pasting secrets into command arguments), (2) install Pillow from the official PyPI source (pip install Pillow) or via your vetted package manager, and (3) confirm your agent won't forward image data to external services unless you intend that. If you need a deeper audit, provide the remainder of scripts/image_helper.py for a line-by-line review.

Like a lobster shell, security has layers — review code before you run it.

Latest: vk977cdhsvgj03s3586wts97tn98331gg
164 downloads · 2 stars · 1 version
Updated 1 mo ago
v0.1.0 · MIT-0

VLM Image Helper

Treat this skill as a visual aid for the model, not as a general image editor.

Use scripts/image_helper.py to create a clearer intermediate image, then re-run analysis on that result.

Core Workflow

  1. Start from the original image path, a raw base64 string, or a data URI.
  2. Apply the smallest transformation that is likely to remove the ambiguity.
  3. Prefer semantic crop presets over manual coordinates unless the exact box is already known.
  4. Return the processed image as a file or base64, then re-read that result.
  5. If the image is still unclear, iterate once with a tighter crop or stronger zoom instead of stacking many edits at once.
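Assuming `python` is available and the skill's `scripts/image_helper.py` sits at its documented path, steps 1–4 of the workflow can be sketched as a small wrapper. The `preprocess` function is a hypothetical illustration, not part of the skill; only the flag names come from the documented CLI.

```python
import base64
import subprocess

HELPER = "scripts/image_helper.py"  # path from this skill's layout

def preprocess(src: str, *flags: str) -> bytes:
    """Run one minimal transformation and return the result as raw bytes.

    `src` may be a file path, raw base64, or data URI (the script accepts
    all three); `flags` are documented CLI options, e.g. ("--rotate", "90").
    """
    proc = subprocess.run(
        ["python", HELPER, src, *flags, "--base64"],
        capture_output=True, text=True, check=True,
    )
    # --base64 prints the processed image as base64 on stdout.
    return base64.b64decode(proc.stdout.strip())

# Step 2: apply the smallest useful transformation, then re-read the output.
# clearer = preprocess("image.png", "--rotate", "90")
```

Keeping each call to a single transformation mirrors step 5: iterate once with a tighter crop rather than stacking edits.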

Quick Commands

# Rotate sideways text
python scripts/image_helper.py image.png --rotate 90 -o rotated.png

# Crop a likely area and zoom it
python scripts/image_helper.py image.png --crop-preset bottom-right --scale-preset x3 -o detail.png

# Improve low-contrast text
python scripts/image_helper.py image.png --auto-enhance -o enhanced.png

# Convert an existing file path directly to base64
python scripts/image_helper.py image.png --base64

Choose the Next Action

  • Text is sideways or upside down: use --rotate.
  • Only one region matters: use --crop-preset first, then add --scale-preset.
  • Small text or icons are hard to read: use --scale-preset x2 or x3.
  • Contrast is weak or edges are fuzzy: use --auto-enhance, or manually tune --contrast and --sharpness.
  • Another tool needs inline image data instead of a file path: add --base64.
  • The source image arrives as raw base64 or a data URI: use --input-mode auto or force --input-mode base64 / data-uri.
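The decision guide above can be condensed into a lookup table. This is an illustrative sketch: the flag names come from the CLI documented here, but the symptom keys and specific preset values are example choices, not part of the skill.

```python
# Illustrative mapping from symptom to helper flags; preset values
# (90, bottom-right, x2, x3) are example picks, not defaults.
NEXT_ACTION = {
    "text sideways or upside down": ["--rotate", "90"],
    "only one region matters": ["--crop-preset", "bottom-right", "--scale-preset", "x2"],
    "small text or icons": ["--scale-preset", "x3"],
    "weak contrast or fuzzy edges": ["--auto-enhance"],
    "inline data needed": ["--base64"],
}
```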

Input and Output Rules

  • Accept a file path, raw base64 string, or data URI as input.
  • Return a file with -o or return inline base64 with --base64.
  • Allow passthrough output with no edits when the only goal is format conversion or path-to-base64 conversion.
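A sketch of how `--input-mode auto` might distinguish the three accepted input forms. The heuristics below (a `data:image/` prefix, then a length-plus-alphabet check for raw base64) are assumptions for illustration; the real script's detection logic may differ.

```python
import re

def detect_input_mode(value: str) -> str:
    """Guess how an image argument is encoded: 'data-uri', 'base64', or 'path'."""
    if value.startswith("data:image/"):
        return "data-uri"
    # Long strings made only of base64 characters are unlikely to be paths.
    if len(value) > 128 and re.fullmatch(r"[A-Za-z0-9+/=\s]+", value):
        return "base64"
    return "path"

def to_raw_base64(value: str) -> str:
    """Normalize a data URI to raw base64; pass raw base64 through unchanged."""
    if value.startswith("data:"):
        return value.split(",", 1)[1]
    return value
```

Forcing `--input-mode base64` or `--input-mode data-uri` sidesteps these heuristics when the caller already knows the encoding.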

References

  • Full CLI reference: references/cli-reference.md
  • Crop and scale preset table: references/presets.md

Prerequisite

Install Pillow if it is missing:

pip install Pillow
# or
uv pip install Pillow
