Aliyun Qwen OCR
v1.0.0
Use when OCR-specialized extraction is needed with Alibaba Cloud Model Studio Qwen OCR models (`qwen-vl-ocr`, `qwen-vl-ocr-latest`, and snapshots), including...
MIT-0
Security Scan
OpenClaw
Suspicious (medium confidence)

Purpose & Capability
The name/description (Qwen OCR helper) matches the included artifacts: SKILL.md, an API reference, and a small Python script that prepares OCR request payloads. Requiring an Alibaba Cloud Dashscope API key is consistent with calling Model Studio endpoints. However, the skill metadata declares no required environment variables, while the SKILL.md explicitly asks the user to set DASHSCOPE_API_KEY or add dashscope_api_key to ~/.alibabacloud/credentials — an inconsistency in the declared requirements.
Instruction Scope
Runtime instructions are narrowly scoped: validate the Python file compiles, generate and save a normalized request payload to output/aliyun-qwen-ocr/request.json, and keep run metadata. The SKILL.md tells the agent how to format requests and which models/tasks to use. It does not instruct the agent to read unrelated system files or exfiltrate data. The only notable instruction beyond payload prep is to supply a DASHSCOPE_API_KEY (see environment_proportionality).
Install Mechanism
There is no install spec and the skill is instruction-only plus a small helper script — nothing is downloaded or written during install. This is low-risk from an install perspective.
Credentials
The SKILL.md requires DASHSCOPE_API_KEY (or a dashscope_api_key entry in ~/.alibabacloud/credentials) to call Alibaba endpoints, which is reasonable for the stated purpose. However, the registry metadata lists no required environment variables or primary credential — this mismatch is concerning because a user may not realize an API key is needed or that the skill expects it. Additionally, the SKILL.md recommends installing the 'requests' package, but the included Python script does not import or use requests (the script only constructs JSON payloads). These inconsistencies should be resolved so users understand what secrets and dependencies are actually required.
Persistence & Privilege
The skill does not request persistent or elevated platform privileges. always is false and disable-model-invocation is false (normal). The skill writes only to its own output directory and does not modify other skills or system-wide configs.
What to consider before installing
This skill is essentially a small helper that builds Qwen OCR request JSON; it does not itself send requests. Before installing, ask the publisher to fix two things: (1) declare DASHSCOPE_API_KEY (or an equivalent) in the skill metadata if the skill expects an API key, and (2) remove or justify the 'requests' dependency note (the included script does not use requests). If you do provide an API key, treat it like any cloud credential: limit its permissions, store it securely (not in shared shells), and audit its usage. If you don't trust the publisher, inspect or run prepare_ocr_request.py locally in a sandbox and withhold the API key until the metadata is corrected.
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Category: provider
Model Studio Qwen OCR
Validation
```shell
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo "py_compile_ok" > output/aliyun-qwen-ocr/validate.txt
```
Pass criteria: the command exits 0 and `output/aliyun-qwen-ocr/validate.txt` is generated.
Output And Evidence
- Save request payloads, the selected OCR task name, and normalized output expectations under `output/aliyun-qwen-ocr/`.
- Keep the exact model, image source, and task configuration with each saved run.
Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.
Critical model names
Use one of these exact model strings:
- `qwen-vl-ocr`
- `qwen-vl-ocr-latest`
- `qwen-vl-ocr-2025-11-20`
- `qwen-vl-ocr-2025-08-28`
- `qwen-vl-ocr-2025-04-13`
- `qwen-vl-ocr-2024-10-28`
Selection guidance:
- Use `qwen-vl-ocr` for the stable channel.
- Use `qwen-vl-ocr-latest` only when you explicitly want the newest OCR behavior.
- Pin `qwen-vl-ocr-2025-11-20` when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
Prerequisites
- Install dependencies (recommended in a venv):

```shell
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
```
- Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
Normalized interface (ocr.extract)
Request
- `image` (string, required): HTTPS URL, local path, or `data:` URL.
- `model` (string, optional): default `qwen-vl-ocr`.
- `prompt` (string, optional): use when you want custom extraction instructions.
- `task` (string, optional): built-in OCR task.
- `task_config` (object, optional): configuration for a built-in task, such as extraction fields.
- `enable_rotate` (bool, optional): default `false`.
- `min_pixels` (int, optional)
- `max_pixels` (int, optional)
- `max_tokens` (int, optional)
- `temperature` (float, optional): recommended to keep near default/low values.
Response
- `text` (string): extracted text or structured markdown/html-style output.
- `model` (string)
- `usage` (object, optional)
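The request schema above maps naturally onto a builder that drops unset optional fields. This is a minimal sketch of the normalized interface, not the actual `prepare_ocr_request.py` implementation (`build_ocr_request` and its defaults are assumptions):

```python
def build_ocr_request(image, model="qwen-vl-ocr", prompt=None, task=None,
                      task_config=None, enable_rotate=False,
                      min_pixels=None, max_pixels=None,
                      max_tokens=None, temperature=None):
    """Build a normalized ocr.extract request dict.

    Required and defaulted fields are always present; every other
    optional field is omitted when left as None.
    """
    payload = {"image": image, "model": model, "enable_rotate": enable_rotate}
    optional = {
        "prompt": prompt,
        "task": task,
        "task_config": task_config,
        "min_pixels": min_pixels,
        "max_pixels": max_pixels,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Keep only the optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_ocr_request("https://example.com/table.png", task="table_parsing")` yields a payload with `image`, `model`, `enable_rotate`, and `task` only, matching the "optional unless set" shape described above.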
Built-in OCR tasks
Use one of these values in `task`:

- `text_recognition`
- `key_information_extraction`
- `document_parsing`
- `table_parsing`
- `formula_recognition`
- `multi_lan`
- `advanced_recognition`
Quick start
Custom prompt:
```shell
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/invoice.png" \
  --prompt "Extract seller name, invoice date, amount, and tax number in JSON."
```
Built-in task:
```shell
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/table.png" \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20
```
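Since the skill only prepares `request.json` and never calls the API, actually running OCR requires a separate sender. The sketch below shows one plausible way to do that via the Dashscope OpenAI-compatible chat endpoint; the URL, the chat-message shape, and the `send`/`to_chat_body` helpers are all assumptions, so confirm them against `references/api_reference.md` before use:

```python
import json
import os

# Assumed endpoint for the OpenAI-compatible mode; verify before relying on it.
DASHSCOPE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def to_chat_body(payload: dict) -> dict:
    """Translate a normalized ocr.extract payload into a chat-style body
    (hypothetical mapping: image first, then any custom prompt text)."""
    content = [{"type": "image_url", "image_url": {"url": payload["image"]}}]
    if payload.get("prompt"):
        content.append({"type": "text", "text": payload["prompt"]})
    return {
        "model": payload.get("model", "qwen-vl-ocr"),
        "messages": [{"role": "user", "content": content}],
    }


def send(path="output/aliyun-qwen-ocr/request.json"):
    """Read a saved request payload and POST it with the bearer key."""
    import requests  # only needed when actually sending

    with open(path) as f:
        payload = json.load(f)
    headers = {"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}"}
    resp = requests.post(DASHSCOPE_URL, json=to_chat_body(payload),
                         headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()
```

Keeping the payload-to-body translation separate from the network call makes the mapping easy to inspect in a sandbox without ever supplying the API key.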
Operational guidance
- Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
- For critical business fields, add downstream validation rules after OCR.
- `qwen-vl-ocr` and older snapshots default to `4096` max output tokens unless higher limits are approved by Alibaba Cloud; `qwen-vl-ocr-2025-11-20` follows the model maximum.
- Increase `max_pixels` only when small text is missed; this raises token cost.
Output location
- Default output: `output/aliyun-qwen-ocr/request.json`
- Override the base directory with `OUTPUT_DIR`.
References
- references/api_reference.md
- references/sources.md
Files
5 total