Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

qwencloud-vision

v0.2.0

[QwenCloud] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for cuixiaoyang123/qwencloud-vision.

Prompt preview: Install & Setup
Install the skill "qwencloud-vision" (cuixiaoyang123/qwencloud-vision) from ClawHub.
Skill page: https://clawhub.ai/cuixiaoyang123/qwencloud-vision
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install qwencloud-vision

ClawHub CLI


npx clawhub@latest install qwencloud-vision
Security Scan
Capability signals
  • Crypto
  • Can make purchases
  • Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal
Benign
View report →
OpenClaw
Suspicious (medium confidence)
Purpose & Capability
The name/description (Qwen vision models) match the included scripts and docs: OCR, visual reasoning, streaming, video handling, and use of DashScope/QwenCloud endpoints. However, the registry metadata declares no required environment variables or primary credential, while the SKILL.md and Python scripts clearly require an API key (DASHSCOPE_API_KEY / QWEN_API_KEY) and optionally QWEN_BASE_URL/QWEN_REGION. This missing credential declaration in the registry metadata is an inconsistency.
Instruction Scope
SKILL.md and the scripts instruct the agent to load .env files, read script and reference files, resolve local file paths and optionally upload local files to provider-managed storage. The agent-compatibility guide includes steps to discover and append markers to project/agent config files (CLAUDE.md, AGENTS.md, etc.) — the doc says to 'Ask the user before modifying any file' but the capability to locate and modify multiple user config locations is present. These actions are within a vision client's plausible scope but are broad and affect user files; the skill also performs environment checks and may upload local files to remote storage, which the user should explicitly authorize.
Install Mechanism
No install spec is included; the skill bundles Python scripts that use the standard library only. There are no download/install steps from external, untrusted URLs in the manifest. Running code from the included scripts will execute on the user's machine (no package install), which is expected for bundled scripts but still requires user trust.
Credentials
The skill legitimately needs a QwenCloud/DashScope API key (DASHSCOPE_API_KEY or QWEN_API_KEY), and optional QWEN_BASE_URL/QWEN_REGION for custom endpoints — these are proportionate to the stated purpose. However, the skill registry lists no required env vars (metadata mismatch). The scripts also load .env files and will accept writing a placeholder .env; the mismatch between declared registry requirements and actual required credentials is confusing and risky for non-expert users.
Persistence & Privilege
always:false (normal). The skill includes guidance and code to append markers to agent/project config files to register skills for agents that don't auto-load skills; the docs instruct to ask user before modifying files. This behavior is potent (can alter user configs) but is documented and conditional on user consent — still worth flagging to users.
What to consider before installing
This skill appears to be a normal QwenCloud vision client (OCR, visual reasoning, video analysis) and its bundled Python scripts use only the standard library, but there are a few points to consider before installing or running it:

  • Registry metadata mismatch: the skill manifest in the registry does not list the required API key(s), yet the SKILL.md and scripts expect DASHSCOPE_API_KEY / QWEN_API_KEY. Treat this as a red flag — confirm with the skill author or source why the registry omitted required credentials.
  • Credentials: the scripts will look for your DASHSCOPE_API_KEY (or QWEN_API_KEY) in the environment or a .env file. Do not paste your real API key into chat. If you create a .env file via guidance in the docs, put a placeholder first and replace it locally; prefer setting the env var in a secure process. Verify the code never prints raw keys (the skill states it masks keys, but you should review the scripts yourself if you plan to run them).
  • File and config changes: the skill provides an agent-compatibility helper that can search for and append entries to project/agent config files (e.g., CLAUDE.md, AGENTS.md). It claims to ask before modifying files, but ensure you approve any changes and back up your configs first.
  • Local file uploads: the scripts can upload local files to provider-managed temp storage (oss://) instead of base64-embedding them. Only enable uploading if you are comfortable sending those files to QwenCloud/DashScope. For sensitive images/videos, use local base64 or avoid uploading.
  • Run in a controlled environment first: if you are unsure, run the scripts in a sandbox or review the bundled Python files (they are included) before executing. Check any network endpoints used (defaults are DashScope/QwenCloud endpoints like dashscope-intl.aliyuncs.com). If you require tighter assurance, ask the skill author to update the registry metadata to declare required env vars and document any config file modifications.
If you want, I can extract the exact places in the code that require DASHSCOPE_API_KEY/QWEN_API_KEY and list which files would modify or scan user config paths so you can review them before running.

Like a lobster shell, security has layers — review code before you run it.

latest vk979n0vbgch2wnvfmrt9t9vsgs850hh2
130 downloads
0 stars
3 versions
Updated 1w ago
v0.2.0
MIT-0

Agent setup: If your agent doesn't auto-load skills (e.g. Claude Code), see agent-compatibility.md once per session.

Qwen Vision (Image & Video Understanding)

Analyze images and videos using Qwen VL and QVQ models. This skill is part of qwencloud/qwencloud-ai.

Skill directory

Use this skill's internal files to execute and learn. Load reference files on demand when the default path fails or you need details.

| Location | Purpose |
| --- | --- |
| scripts/analyze.py | Image/video understanding, multi-image, thinking mode |
| scripts/reason.py | Visual reasoning (QVQ, chain-of-thought, streaming) |
| scripts/ocr.py | OCR text extraction |
| scripts/vision_lib.py | Shared helpers (base64, upload, streaming) |
| references/execution-guide.md | Fallback: curl, code generation |
| references/curl-examples.md | Curl for base64, multi-image, video, OCR |
| references/visual-reasoning.md | QVQ and thinking mode details |
| references/prompt-guide.md | Query prompt templates by task, thinking mode decision |
| references/ocr.md | OCR parameters and examples |
| references/sources.md | Official documentation URLs |
| references/agent-compatibility.md | Agent self-check: register skills in project config for agents that don't auto-load |

Security

NEVER output any API key or credential in plaintext. Always use variable references ($DASHSCOPE_API_KEY in shell, os.environ["DASHSCOPE_API_KEY"] in Python). Any check or detection of credentials must be non-plaintext: report only status (e.g. "set" / "not set", "valid" / "invalid"), never the value. Never display contents of .env or config files that may contain secrets.
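A status-only check of this kind can be sketched in shell as follows (it reports only "set" or "not set" and never echoes the value):

```shell
# Report credential status without ever printing the key itself.
if [ -n "$DASHSCOPE_API_KEY" ]; then
  echo "DASHSCOPE_API_KEY: set"
elif [ -n "$QWEN_API_KEY" ]; then
  echo "QWEN_API_KEY: set"
else
  echo "API key: not set"
fi
```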

When the API key is not configured, NEVER ask the user to provide it directly. Instead, help create a .env file with a placeholder (DASHSCOPE_API_KEY=sk-your-key-here) and instruct the user to replace it with their actual key from the QwenCloud Console. Only write the actual key value if the user explicitly requests it.

Key Compatibility

Scripts require a standard QwenCloud API key (sk-...). Coding Plan keys (sk-sp-...) cannot be used for direct API calls and do not support dedicated vision models (qwen3-vl-plus, qvq-max, etc.). The scripts detect sk-sp- keys at startup and print a warning. If qwencloud-ops-auth is installed, see its references/codingplan.md for full details.
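The startup detection described above might look roughly like this minimal Python sketch (illustrative only, not the skill's actual code; the function name is hypothetical):

```python
import os

def check_key_compatibility() -> str:
    """Report key status without ever revealing the key value."""
    key = os.environ.get("DASHSCOPE_API_KEY") or os.environ.get("QWEN_API_KEY") or ""
    if not key:
        return "not set"
    if key.startswith("sk-sp-"):
        # Coding Plan keys cannot call the vision models directly.
        return "warning: Coding Plan key (sk-sp-) detected"
    return "set"
```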

Model Selection

| Model | Use Case |
| --- | --- |
| qwen3.6-plus | Preferred — latest flagship, unified multimodal (text+image+video). Thinking on by default. Best balance of quality, speed, cost. |
| qwen3.5-plus | Unified multimodal (text+image+video). Thinking on by default. |
| qwen3.5-flash | Fast multimodal — cheaper, faster. Thinking on by default. |
| qwen3-vl-plus | High-precision — object localization (2D/3D), document/webpage parsing. |
| qwen3-vl-flash | Fast vision — lower latency, 33 languages. |
| qvq-max | Visual reasoning — chain-of-thought for math, charts. Streaming only. |
| qwen-vl-ocr | OCR — text extraction, table parsing, document scanning. |
| qwen-vl-max | Qwen2.5-VL — best-performing in 2.5 series. |
| qwen-vl-plus | Qwen2.5-VL — faster, good balance of performance and cost, 11 languages. |

  1. User specified a model → use directly.
  2. Consult the qwencloud-model-selector skill when model choice depends on requirement, scenario, or pricing.
  3. No signal, clear task → qwen3.6-plus. Use qwen3-vl-plus for precise localization or 3D detection.

⚠️ Important: The model list above is a point-in-time snapshot and may be outdated. Model availability changes frequently. Always check the official model list for the authoritative, up-to-date catalog before making model decisions.

Model details: For more information about a specific model, direct the user to its detail page: https://www.qwencloud.com/models/<model-name> (replace <model-name> with the exact model ID, e.g. qwen3.6-plus → https://www.qwencloud.com/models/qwen3.6-plus). NEVER modify or guess the model name in the URL.

Execution

Prerequisites

  • API Key: Check that DASHSCOPE_API_KEY (or QWEN_API_KEY) is set using a non-plaintext check only (e.g. in shell: [ -n "$DASHSCOPE_API_KEY" ]; report only "set" or "not set", never the key value). If not set: run the *qwencloud-ops-auth* skill if available; otherwise guide the user to obtain a key from the QwenCloud Console and set it via a .env file (echo 'DASHSCOPE_API_KEY=sk-your-key-here' >> .env in the project root or current directory) or an environment variable. The script searches for .env in the current working directory and the project root. Skills may be installed independently — do not assume qwencloud-ops-auth is present.
  • Python 3.9+ (stdlib only, no pip install needed)

Environment Check

Before first execution, verify Python is available:

python3 --version  # must be 3.9+

If python3 is not found, try python --version or py -3 --version. If Python is unavailable or below 3.9, skip to Path 2 (curl) in execution-guide.md.
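The fallback order can be sketched in POSIX sh (a sketch; adjust for your shell):

```shell
# Probe interpreters in order; fall back to the curl path if none is 3.9+.
PYTHON=""
for py in python3 python "py -3"; do
  if $py -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)' 2>/dev/null; then
    PYTHON="$py"
    break
  fi
done
echo "${PYTHON:-none (use the curl path in execution-guide.md)}"
```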

Default: Run Script

Script path: Scripts are in the scripts/ subdirectory of this skill's directory (the directory containing this SKILL.md). You MUST first locate this skill's installation directory, then ALWAYS use the full absolute path to execute scripts. Do NOT assume scripts are in the current working directory. Do NOT use cd to switch directories before execution. Shared infrastructure lives in scripts/vision_lib.py.

Execution note: Run all scripts in the foreground — wait for stdout; do not background.

Discovery: Run python3 <this-skill-dir>/scripts/analyze.py --help (or reason.py, ocr.py) first to see all available arguments.

| Script | Purpose | Default Model |
| --- | --- | --- |
| scripts/analyze.py | Image understanding, multi-image, video, thinking mode, high-res | qwen3.6-plus |
| scripts/reason.py | Visual reasoning with chain-of-thought, video reasoning (always streaming) | qvq-max |
| scripts/ocr.py | OCR text extraction from documents, receipts, tables | qwen-vl-ocr |

Input type fields (use exactly one in --request JSON):

| Field | Use for | Example |
| --- | --- | --- |
| "image" | Single image (URL or local path) | "image": "photo.jpg" |
| "images" | Multi-image comparison (array) | "images": ["a.jpg", "b.jpg"] |
| "video" | Video file (URL or local path) | "video": "clip.mp4" |
| "video_frames" | Video as frame array | "video_frames": ["f1.jpg", "f2.jpg"] |

⚠️ Common mistake: Do NOT use "image" for video files — use "video" instead.

# Image analysis
python3 <this-skill-dir>/scripts/analyze.py \
  --request '{"prompt":"What is in this image?","image":"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}' \
  --output output/qwencloud-vision/result.json --print-response

# Video analysis (local file — add --upload-files for files >= 7 MB)
python3 <this-skill-dir>/scripts/analyze.py \
  --request '{"prompt":"Describe what happens in this video","video":"clip.mp4"}' \
  --upload-files --print-response

python3 <this-skill-dir>/scripts/reason.py \
  --request '{"prompt":"Solve this math problem step by step","image":"problem.png"}' \
  --print-response

python3 <this-skill-dir>/scripts/ocr.py \
  --request '{"image":"invoice.jpg"}' \
  --print-response

| Argument | Description |
| --- | --- |
| --request '{...}' | JSON request body |
| --file path.json | Load request from file |
| --output path | Save response JSON to path |
| --print-response | Print response to stdout |
| --stream | Enable streaming (auto for thinking/QVQ) |
| --upload-files | Upload local files to temp storage (for files >= 7 MB) |
| --schema path.json | JSON Schema for structured extraction |
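As an illustration of --schema, one might write a schema file and pass it to ocr.py like this sketch (the schema fields and temp path are hypothetical; confirm the expected schema format with --help before relying on it):

```shell
# 1. Write a JSON Schema describing the fields to extract (hypothetical fields):
cat > /tmp/receipt_schema.json <<'EOF'
{"type": "object", "properties": {"total": {"type": "string"}, "date": {"type": "string"}}}
EOF
# 2. Pass it alongside the request (replace <this-skill-dir> with the skill's real path):
#    python3 <this-skill-dir>/scripts/ocr.py \
#      --request '{"image":"invoice.jpg"}' \
#      --schema /tmp/receipt_schema.json --print-response
echo "schema written"
```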

Verify Result

  • Exit code 0 + output contains valid JSON with choices field → success
  • Non-zero exit, HTTP error, empty response, or JSON with "code"/"message" error → fail
  • Post-execution check: When --output is used, verify the response JSON file exists and contains expected content
  • MANDATORY — stderr signal check: After confirming the result, scan the command's stderr output for [ACTION_REQUIRED] or [UPDATE_AVAILABLE]. If either signal is present, you MUST follow the instructions in Update Check below before responding to the user.
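The checklist above can be sketched as a small helper (illustrative only, not shipped with the skill):

```python
import json

def verify_result(exit_code: int, stdout: str, stderr: str):
    """Classify a script run per the checklist: (status, stderr signals)."""
    ok = exit_code == 0
    if ok:
        try:
            body = json.loads(stdout)
            # Success needs a "choices" field and no API error payload.
            ok = "choices" in body and "code" not in body and "message" not in body
        except json.JSONDecodeError:
            ok = False
    # Mandatory stderr signal scan, regardless of success.
    signals = [s for s in ("[ACTION_REQUIRED]", "[UPDATE_AVAILABLE]") if s in stderr]
    return ("success" if ok else "fail", signals)
```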

On Failure

If scripts fail, match the error output against the diagnostic table below to determine the resolution. If no match, read execution-guide.md for alternative paths: curl commands (Path 2), code generation (Path 3), and autonomous resolution (Path 5).

If Python is not available at all → skip directly to Path 2 (curl) in execution-guide.md.

| Error Pattern | Diagnosis | Resolution |
| --- | --- | --- |
| command not found: python3 | Python not on PATH | Try python or py -3; install Python 3.9+ if missing |
| Python 3.9+ required | Script version check failed | Upgrade Python to 3.9+ |
| SyntaxError near type hints | Python < 3.9 | Upgrade Python to 3.9+ |
| QWEN_API_KEY/DASHSCOPE_API_KEY not found | Missing API key | Obtain key from QwenCloud Console; add to .env: echo 'DASHSCOPE_API_KEY=sk-...' >> .env; or run qwencloud-ops-auth if available |
| HTTP 401 | Invalid or mismatched key | Run qwencloud-ops-auth (non-plaintext check only); verify key is valid |
| SSL: CERTIFICATE_VERIFY_FAILED | SSL cert issue (proxy/corporate) | macOS: run Install Certificates.command; else set SSL_CERT_FILE env var |
| URLError / ConnectionError | Network unreachable | Check internet; set HTTPS_PROXY if behind proxy |
| HTTP 429 | Rate limited | Wait and retry with backoff |
| HTTP 5xx | Server error | Retry with backoff |
| PermissionError | Can't write output | Use --output to specify writable directory |

File Input

The API accepts: HTTP/HTTPS URL, Base64 data URI, and oss:// URL. Local file paths are NOT directly supported — scripts handle conversion automatically. Pass local paths directly; no manual upload step needed.

Large file rule: If the local file is >= 7 MB, always add --upload-files. Base64 encoding inflates size by ~33% and will exceed the 10 MB API limit. Small files (including short video clips < 7 MB) can use the default base64 path.
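The 7 MB threshold follows from base64's 4/3 expansion (a 7 MB file encodes to roughly 9.3 MB, close to the 10 MB limit). A quick sketch of the decision (the helper name is hypothetical):

```python
import os

API_LIMIT_BYTES = 10 * 1024 * 1024   # 10 MB API limit cited above
UPLOAD_THRESHOLD = 7 * 1024 * 1024   # >= 7 MB: use --upload-files

def needs_upload(path: str) -> bool:
    """Base64 emits 4 output bytes per 3 input bytes (~33% larger)."""
    size = os.path.getsize(path)
    encoded = (size + 2) // 3 * 4  # exact base64 length, before the data: URI prefix
    return size >= UPLOAD_THRESHOLD or encoded > API_LIMIT_BYTES
```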

| Method | When to use | How |
| --- | --- | --- |
| Online URL | File already hosted | Pass URL directly — preferred for large files |
| Base64 (default) | Local files < 7 MB (images or short video clips) | Script auto-converts to data: URI |
| Temp upload | Local files >= 7 MB | Add --upload-files flag → uploads to DashScope temp storage (oss:// URL, 48h TTL) |

Production: Default temp storage has a 48h TTL and a 100 QPS upload limit — not suitable for production, high-concurrency, or load-testing use. To use your own OSS bucket, set QWEN_TMP_OSS_BUCKET and QWEN_TMP_OSS_REGION in .env, install the SDK (pip install alibabacloud-oss-v2), and provide credentials via QWEN_TMP_OSS_AK_ID / QWEN_TMP_OSS_AK_SECRET or the standard OSS_ACCESS_KEY_ID / OSS_ACCESS_KEY_SECRET. Use a RAM user with least-privilege permissions (oss:PutObject + oss:GetObject on the target bucket only). The --upload-files flag is still required for vision scripts to trigger the upload. If qwencloud-ops-auth is installed, see its references/custom-oss.md for the full setup guide.
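Under the variable names listed above, a custom-bucket .env fragment might look like this (the bucket, region, and credential values are placeholders; do not commit real keys):

```shell
# .env — custom OSS bucket for temp uploads (placeholder values)
QWEN_TMP_OSS_BUCKET=my-vision-uploads
QWEN_TMP_OSS_REGION=cn-hangzhou
QWEN_TMP_OSS_AK_ID=your-ram-user-access-key-id
QWEN_TMP_OSS_AK_SECRET=your-ram-user-access-key-secret
```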

Input from Other Skills

When the input file comes from another skill's output (e.g., image-gen, video-gen):

  • Pass the URL directly (e.g., "image": "<image_url from image-gen>") — do NOT download the URL first
  • Downloading and re-passing as a local path wastes bandwidth and triggers unnecessary base64 encoding or OSS upload
  • All URL types are supported: https://, oss://, data:

Thinking Mode

| Model | Thinking Default | Notes |
| --- | --- | --- |
| qwen3.6-plus | On | Latest flagship. Disable with enable_thinking: false for simple tasks. |
| qwen3.5-plus / qwen3.5-flash | On | Disable with enable_thinking: false for simple tasks. |
| qwen3-vl-plus / qwen3-vl-flash | Off | Enable with enable_thinking: true. |
| qvq-max | Always on | Streaming output required. |

See visual-reasoning.md for details.

OCR (qwen-vl-ocr)

Optimized for text extraction. Supports multi-language, skewed images, tables, formulas. See ocr.md for parameters and examples.

Input Limits

Images: BMP/JPEG/PNG/TIFF/WEBP/HEIC. Min 10px sides, aspect ratio <= 200:1. Max 20 MB (URL, Qwen3.5) / 10 MB (others).

Videos: MP4/AVI/MKV/MOV/FLV/WMV. Duration 2s–2h (Qwen3.5) / 2s–10min (others). Max 2 GB (URL) / 10 MB (base64). fps range [0.1, 10], default 2.0.

Error Handling

| HTTP | Meaning | Action |
| --- | --- | --- |
| 401 | Invalid or missing API key | Run qwencloud-ops-auth if available; else prompt user to set key (non-plaintext check only) |
| 400 | Bad request (invalid format) | Verify messages format and image URL/format |
| 429 | Rate limited | Retry with exponential backoff |
| 5xx | Server error | Retry with exponential backoff |

Usage & billing: Use the qwencloud-usage skill to check usage, free tier quota, and billing directly. Alternatively, the user can visit the QwenCloud console: Usage Analytics | Pay-as-you-go Billing | Coding Plan Billing

NEVER fabricate, guess, or construct usage/billing/console URLs. Only provide the exact links listed in this skill. If a URL is not listed here, do not invent one.

Output Location

Prefer the current working directory. Default subdirectory: ./output/qwencloud-vision/.

Write prohibition: NEVER write output files into this skill's installation directory or any skills/ hierarchy. All generated content must go to output/ under the current working directory or a user-specified path.

Update Check (MANDATORY Post-Execution)

After script execution, check stderr for action signals. The script automatically detects update status and emits structured signals. You MUST act on them. This section also applies when the user explicitly asks to check for updates.

Signal: [ACTION_REQUIRED] qwencloud-update-check skill is not installed

When stderr contains this signal, you MUST present the following options to the user and wait for their response before proceeding. Do NOT silently skip this.

The update-check skill is not installed. It keeps you informed about new qwencloud/qwencloud-ai features and bug fixes.

  1. Install now — I'll run the install command for you
  2. Skip for now — do nothing this time
  3. Never remind me — permanently suppress this prompt

After the user responds:

  • User chooses 1 → run: npx skills add QwenCloud/qwencloud-ai --skill qwencloud-update-check -y, then run: python3 <this-skill-scripts-dir>/gossamer.py --dismiss
  • User chooses 2 → run: python3 <this-skill-scripts-dir>/gossamer.py --dismiss, then continue normally
  • User chooses 3 → run: python3 <this-skill-scripts-dir>/gossamer.py --never-install, then continue normally

Signal: [UPDATE_AVAILABLE]

When stderr contains this signal, you MUST append a brief update notice to your response, including the version info and the update command shown in the stderr output.

No signal in stderr

If stderr contains neither [ACTION_REQUIRED] nor [UPDATE_AVAILABLE], no action is needed — the skill is installed and up to date (or cached within 24h).

Explicit user request

When the user explicitly asks to check for updates (e.g. "check for updates", "check version"):

  1. Look for qwencloud-update-check/SKILL.md in sibling skill directories.
  2. If found — run: python3 <qwencloud-update-check-dir>/scripts/check_update.py --print-response and report the result.
  3. If not found — present the install options above.
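Step 1 of the lookup above might be sketched like this (illustrative; it assumes skills are installed as sibling directories, which may not hold for every agent):

```python
from pathlib import Path

def find_update_check(this_skill_dir: str):
    """Return the sibling qwencloud-update-check/SKILL.md path, or None."""
    candidate = Path(this_skill_dir).resolve().parent / "qwencloud-update-check" / "SKILL.md"
    return candidate if candidate.is_file() else None
```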

References
