Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

vision-skill

v1.0.0

Use this skill for computer vision tasks including image recognition (OCR, object detection) and image generation (text-to-image, image-to-image). Supports a...

0 stars · 368 downloads · 1 version (current and all-time)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lgwanai/vision-skill.

Prompt Preview: Install & Setup
Install the skill "vision-skill" (lgwanai/vision-skill) from ClawHub.
Skill page: https://clawhub.ai/lgwanai/vision-skill
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install vision-skill

ClawHub CLI

Package manager switcher

npx clawhub@latest install vision-skill
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name and description describe vision recognition and image generation, and the code implements Tencent COS uploads and calls a Doubao (Volcengine) API, so these capabilities align with the stated purpose. However, the registry metadata lists no required env vars while the SKILL.md, README, and code require COS_* and DOUBAO_* credentials: a mismatch between the registry metadata and the skill's actual requirements.
Instruction Scope
SKILL.md and the CLI instruct uploading local images to COS, calling Doubao endpoints, storing async task files under a local .tasks/ directory, and optionally downloading generated images. The instructions and included code stay within that scope and do not attempt to read unrelated system files or credentials beyond those needed for COS/Doubao.
Install Mechanism
This skill is labelled instruction-only in the registry, but the package includes Python source and a requirements.txt (requests, python-dotenv, cos-python-sdk-v5). There is no download-from-URL step or opaque installer; installing implies pip-installing the listed dependencies and running the bundled scripts. The discrepancy between 'no install spec' and the presence of code is noteworthy but not inherently malicious.
Credentials
The code requires Tencent COS credentials (COS_SECRET_ID, COS_SECRET_KEY, COS_BUCKET_NAME, COS_REGION) and DOUBAO_API_KEY (plus optional fallback model vars). Those credentials are appropriate for the described cloud storage and model API usage, but the registry metadata incorrectly declared 'Required env vars: none', a meaningful mismatch. The COS client is also constructed with permanent keys (Token=None), so users should understand they are providing full access keys rather than short-lived tokens.
Persistence & Privilege
The skill does not request always:true or global agent privileges. It writes task state and logs under a local .tasks/ directory and spawns worker processes when a task is submitted — expected for an async CLI-style skill. It does not modify other skills' configs or system-wide settings.
What to consider before installing
Key points before installing:

  • Do NOT trust the registry metadata that says 'no env vars': this skill requires your Tencent COS keys and a Doubao/Volcengine API key. Only provide those secrets if you intend the skill to upload images to your COS bucket and call the Doubao API.
  • Verify the API endpoint: the client uses https://ark.cn-beijing.volces.com/api/v3, which does not match the README link to console.volcengine.com. Confirm this hostname is legitimate for your provider or replace it with an official endpoint from your Doubao/Volcengine account.
  • Use least-privilege credentials: create a COS bucket and keys scoped to that bucket (and consider short-lived tokens if possible) rather than reusing broad permanent keys.
  • Inspect and run the code in an isolated environment first (e.g., a throwaway VM or container). The scripts will write to a local .tasks/ directory and .tasks/worker.log, spawn background worker processes, and upload local files to COS; confirm that behavior is acceptable.
  • If you will expose sensitive images, set the COS bucket permissions appropriately (private by default) and review how temporary URLs are generated and used.
  • If anything looks off (metadata mismatch, unusual base_url, or unexpected network endpoints), ask the publisher for clarification or consider alternative, better-audited tools.
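One way to act on the endpoint advice above is a small allowlist check before trusting a client's configured base URL. A minimal sketch using only the standard library; the trusted host set is an assumption, so populate it from whatever your Volcengine/Doubao console actually lists:

```python
from urllib.parse import urlparse

# Endpoint reported by the scan for the bundled client.
CLIENT_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"

# Hostnames you have personally confirmed with your provider
# (assumption: replace with the endpoints shown in your console).
TRUSTED_HOSTS = {"ark.cn-beijing.volces.com"}

def endpoint_is_trusted(url, trusted=TRUSTED_HOSTS):
    """True only for https URLs whose hostname is explicitly allowlisted."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname in trusted
```

Run the check against CLIENT_BASE_URL before supplying credentials; anything not on your allowlist, including plain-http variants of a trusted host, should be treated as suspect.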

Like a lobster shell, security has layers — review code before you run it.

latest: vk975dfav71sxf6m3tdyezxh36h838177
368 downloads
0 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

Vision Skill

Overview

This skill provides capabilities for visual recognition and image generation using Doubao AI models. It handles image storage via Tencent Cloud COS and executes tasks asynchronously.

Capabilities

1. Vision Recognition

Analyze images to describe content, extract text (OCR), or answer questions about the image.

  • Input: Local image path or URL, optional prompt.
  • Process: Uploads local images to COS, then calls Doubao Vision API.
  • Output: Text description or answer.

2. Image Generation

Generate images from text prompts, optionally using reference images.

  • Text-to-Image: Generate images from a text description.
  • Image-to-Image: Generate images based on a reference image and text prompt.
  • Sequential Generation: Generate a series of consistent images (e.g., storyboards).

Usage

The skill is exposed via the CLI script scripts/vision_cli.py.

Prerequisites

Environment variables must be set in .env or the system environment:

  • COS_SECRET_ID, COS_SECRET_KEY, COS_REGION, COS_BUCKET_NAME
  • DOUBAO_API_KEY, DOUBAO_VISION_MODEL, DOUBAO_IMAGE_MODEL
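Before invoking the CLI, it can help to confirm the environment is complete, since the registry metadata omits these requirements. A minimal pre-flight check, assuming only the variable names listed above (python-dotenv users can call load_dotenv() first):

```python
import os

# Variable names taken from the Prerequisites list above.
REQUIRED_VARS = [
    "COS_SECRET_ID", "COS_SECRET_KEY", "COS_REGION", "COS_BUCKET_NAME",
    "DOUBAO_API_KEY", "DOUBAO_VISION_MODEL", "DOUBAO_IMAGE_MODEL",
]

def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing required env vars: " + ", ".join(missing))
    print("Environment looks complete.")
```

Running this before the first recognize or generate call fails fast with a readable message instead of a mid-task upload error.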

Commands

Vision Recognition

# Basic Usage
python3 scripts/vision_cli.py recognize <image_path> --prompt "Describe this image"

# Using Presets (--format)
# Available formats: invoice, contract, form, slide, whiteboard, table, json, key_value, markdown_note, qa_pairs, code, ocr, analysis
python3 scripts/vision_cli.py recognize ./invoice.jpg --format json
python3 scripts/vision_cli.py recognize ./screenshot.png --format code

# Batch recognition
python3 scripts/vision_cli.py recognize ./a.jpg ./b.jpg ./c.jpg --format table --wait --output ./batch_result.json

# Quality mode and retry
python3 scripts/vision_cli.py recognize ./contract.png --format contract --quality high --retry 3 --wait

# Wait for result and save to file
python3 scripts/vision_cli.py recognize ./doc.jpg --format ocr --wait --output ./result.txt

Image Generation

# Text to Image with Style Presets (--style)
# Available styles: ppt, business_flat, cartoon, tech_isometric, hand_drawn, icon, photo, anime, sketch
python3 scripts/vision_cli.py generate "A cyberpunk city" --style anime

# Image to Image
python3 scripts/vision_cli.py generate "Make it snowy" --ref <image_path>

# Sequential Generation
python3 scripts/vision_cli.py generate "A story about a cat" --seq 4 --style cartoon

# Wait for result and save image
python3 scripts/vision_cli.py generate "App icon for a camera" --style icon --wait --output ./icon.png

# Quality mode and retry
python3 scripts/vision_cli.py generate "A SaaS architecture illustration" --style tech_isometric --quality high --retry 3 --wait

Check Status

python3 scripts/vision_cli.py status <task_id>
# Or save result if completed
python3 scripts/vision_cli.py status <task_id> --output ./final_result.png

Task Management

All tasks are executed asynchronously by default.

  • Use --wait flag to block until completion (useful for Agent workflow).
  • Use --output flag to automatically save text or download images.
  • Task data is stored in .tasks/ directory.
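For agent or script integration, the invocations shown above can be assembled programmatically. A sketch that only builds the argument list; the flag names (--format, --quality, --retry, --wait, --output) follow this README's examples, so verify them against your installed version before relying on them:

```python
def build_recognize_cmd(image_paths, fmt=None, quality=None, retry=None,
                        wait=True, output=None,
                        script="scripts/vision_cli.py"):
    """Assemble a `recognize` invocation for the skill's CLI.

    Flag names are taken from the README examples above; adjust if your
    installed version differs.
    """
    cmd = ["python3", script, "recognize", *image_paths]
    if fmt:
        cmd += ["--format", fmt]
    if quality:
        cmd += ["--quality", quality]
    if retry:
        cmd += ["--retry", str(retry)]
    if wait:
        cmd.append("--wait")
    if output:
        cmd += ["--output", output]
    return cmd

# Execute with: subprocess.run(build_recognize_cmd([...]), check=True)
```

Keeping --wait on by default mirrors the blocking style recommended for agent workflows, while passing wait=False leaves the task to run asynchronously and be polled later via the status command.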
