Install
openclaw skills install aliyun-qwen-ocr
Use when OCR-specialized extraction is needed with Alibaba Cloud Model Studio Qwen OCR models (`qwen-vl-ocr`, `qwen-vl-ocr-latest`, and snapshots), including document parsing, table parsing, multilingual OCR, formula recognition, and key information extraction.
Category: provider
mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo "py_compile_ok" > output/aliyun-qwen-ocr/validate.txt
Pass criteria: command exits 0 and output/aliyun-qwen-ocr/validate.txt is generated.
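The same check can also be driven from Python rather than the shell. The sketch below is a hypothetical helper (the name `validate` and its boolean return convention are assumptions, not part of the skill) that byte-compiles the script with the standard-library `py_compile` module and writes the marker file only on success.

```python
import py_compile
from pathlib import Path

SCRIPT = "skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py"
OUT_DIR = "output/aliyun-qwen-ocr"


def validate(script: str = SCRIPT, out_dir: str = OUT_DIR) -> bool:
    """Byte-compile `script`; write validate.txt and return True if it compiles."""
    try:
        # doraise=True raises PyCompileError on a syntax error instead of
        # only printing the error to stderr.
        py_compile.compile(script, doraise=True)
    except (py_compile.PyCompileError, OSError):
        # OSError covers a missing or unreadable script file.
        return False
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    (out / "validate.txt").write_text("py_compile_ok\n")
    return True
```

Calling `validate()` from the repository root mirrors the two shell commands; a `False` result corresponds to a non-zero exit.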
Output directory: output/aliyun-qwen-ocr/
Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.
Use one of these exact model strings:
qwen-vl-ocr
qwen-vl-ocr-latest
qwen-vl-ocr-2025-11-20
qwen-vl-ocr-2025-08-28
qwen-vl-ocr-2025-04-13
qwen-vl-ocr-2024-10-28
Selection guidance:
Use qwen-vl-ocr for the stable channel.
Use qwen-vl-ocr-latest only when you explicitly want the newest OCR behavior.
Use qwen-vl-ocr-2025-11-20 when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
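With the environment in place, the model selection guidance given earlier can be written down as a small lookup. This is only a sketch: `choose_model` and `check_model` are hypothetical helper names, and the two boolean flags are an assumed way of expressing the three cases the guidance distinguishes.

```python
# The six exact model strings listed in this document.
KNOWN_MODELS = (
    "qwen-vl-ocr",
    "qwen-vl-ocr-latest",
    "qwen-vl-ocr-2025-11-20",
    "qwen-vl-ocr-2025-08-28",
    "qwen-vl-ocr-2025-04-13",
    "qwen-vl-ocr-2024-10-28",
)


def choose_model(want_newest: bool = False, reproducible: bool = False) -> str:
    """Map the selection guidance onto a model string (hypothetical helper)."""
    if reproducible:
        # Pinned snapshot for reproducible document parsing.
        return "qwen-vl-ocr-2025-11-20"
    if want_newest:
        # Only when the newest OCR behavior is explicitly wanted.
        return "qwen-vl-ocr-latest"
    # Stable channel by default.
    return "qwen-vl-ocr"


def check_model(name: str) -> str:
    """Reject strings that are not one of the exact model names above."""
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown model string: {name!r}")
    return name
```

Giving `reproducible` priority over `want_newest` is a design choice of this sketch: a pinned snapshot and a moving alias cannot both be honored, and pinning is the safer default when both are requested.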
Set DASHSCOPE_API_KEY in the environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
Request parameters:
image (string, required): HTTPS URL, local path, or data: URL.
model (string, optional): default qwen-vl-ocr.
prompt (string, optional): use when you want custom extraction instructions.
task (string, optional): built-in OCR task.
task_config (object, optional): configuration for the built-in task, such as extraction fields.
enable_rotate (bool, optional): default false.
min_pixels (int, optional)
max_pixels (int, optional)
max_tokens (int, optional)
temperature (float, optional): keep near the default or at a low value.
Response fields:
text (string): extracted text or structured markdown/html-style output.
model (string)
usage (object, optional)
Use one of these values in task:
text_recognition
key_information_extraction
document_parsing
table_parsing
formula_recognition
multi_lan
advanced_recognition
Custom prompt:
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
--image "https://example.com/invoice.png" \
--prompt "Extract seller name, invoice date, amount, and tax number in JSON."
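The example above passes an HTTPS URL, but the image parameter is also documented as accepting a local path or a data: URL. The sketch below shows one way a caller might tell the three forms apart and inline a local file as a data: URL. Both function names are hypothetical, and nothing here describes how prepare_ocr_request.py itself handles the value.

```python
import base64
import mimetypes
from pathlib import Path


def classify_image(image: str) -> str:
    """Return 'https_url', 'data_url', or 'local_path' for an image argument."""
    if image.startswith("https://"):
        return "https_url"
    if image.startswith("data:"):
        return "data_url"
    if image.startswith("http://"):
        # Only HTTPS URLs are listed as accepted in this document.
        raise ValueError("plain http:// URLs are not listed as supported")
    return "local_path"


def to_data_url(path: str) -> str:
    """Encode a local file as a base64 data: URL (assumed convenience step)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the type is unknown
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

For instance, `classify_image("https://example.com/invoice.png")` returns `"https_url"`, while a value such as `scans/invoice.png` falls through to `"local_path"` and could then be passed to `to_data_url`.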
Built-in task:
python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
--image "https://example.com/table.png" \
--task table_parsing \
--model qwen-vl-ocr-2025-11-20
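Before invoking the script, the documented parameters can be checked locally. The sketch below collects them into a plain dictionary and rejects task names that are not in the documented list. `build_request` is a hypothetical helper, the dictionary layout is an assumption rather than the script's actual output format, and the rule that task_config requires a task is inferred from its description.

```python
from typing import Any, Dict, Optional

# Built-in task names listed in this document.
VALID_TASKS = frozenset({
    "text_recognition",
    "key_information_extraction",
    "document_parsing",
    "table_parsing",
    "formula_recognition",
    "multi_lan",
    "advanced_recognition",
})


def build_request(
    image: str,
    model: str = "qwen-vl-ocr",
    prompt: Optional[str] = None,
    task: Optional[str] = None,
    task_config: Optional[Dict[str, Any]] = None,
    enable_rotate: bool = False,
    min_pixels: Optional[int] = None,
    max_pixels: Optional[int] = None,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate the documented parameters and return them as a dict."""
    if not image:
        raise ValueError("image is required")
    if task is not None and task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if task_config is not None and task is None:
        # Assumption: task_config only makes sense alongside a built-in task.
        raise ValueError("task_config requires task")
    request: Dict[str, Any] = {
        "image": image,
        "model": model,
        "enable_rotate": enable_rotate,
    }
    optional = {
        "prompt": prompt,
        "task": task,
        "task_config": task_config,
        "min_pixels": min_pixels,
        "max_pixels": max_pixels,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Leave out optional parameters the caller did not set.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request
```

Mirroring the built-in task command above, `build_request("https://example.com/table.png", task="table_parsing", model="qwen-vl-ocr-2025-11-20")` yields a dictionary with only the image, model, enable_rotate, and task keys.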
qwen-vl-ocr and older snapshots default to 4096 max output tokens unless higher limits are approved by Alibaba Cloud; qwen-vl-ocr-2025-11-20 follows the model maximum.
Increase max_pixels only when small text is missed; this raises token cost.
Output file: output/aliyun-qwen-ocr/request.json
Output directory variable: OUTPUT_DIR
References:
references/api_reference.md
references/sources.md
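The 4096-token default noted above matters mostly for long documents, where output may be cut short. A rough sketch of that check follows. The helper names are hypothetical, and reading "older snapshots" as the dated snapshots before 2025-11-20 is an assumption. qwen-vl-ocr-latest is not covered by the note, so the sketch reports no known cap for it.

```python
from typing import Optional

DEFAULT_CAP = 4096

# Assumption: "older snapshots" means the dated snapshots before 2025-11-20.
CAPPED_MODELS = frozenset({
    "qwen-vl-ocr",
    "qwen-vl-ocr-2025-08-28",
    "qwen-vl-ocr-2025-04-13",
    "qwen-vl-ocr-2024-10-28",
})


def default_output_cap(model: str) -> Optional[int]:
    """Return 4096 for models the note lists as capped, else None."""
    return DEFAULT_CAP if model in CAPPED_MODELS else None


def may_truncate(model: str, expected_tokens: int) -> bool:
    """True if the expected output exceeds the model's documented default cap."""
    cap = default_output_cap(model)
    return cap is not None and expected_tokens > cap
```

If `may_truncate` returns True for a planned job, the options suggested by the note are to switch to qwen-vl-ocr-2025-11-20 or to request a higher limit from Alibaba Cloud.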