Install
openclaw skills install huawei-cloud-ascend-models-deploy
Huawei Cloud Ascend model deployment and testing skill for large language models on Ascend DevServer (910B series). Supports single-machine and dual-machine deployment for LLM, VL (vision-language), Embedding, and Rerank models. Provides model inference testing, deployment log viewing, and status monitoring with automated model matching and deployment script generation.
Use this skill when the user wants to: (1) deploy a model on Ascend DevServer, (2) test model inference, (3) view deployment logs or status, (4) list supported models, (5) check deployment prerequisites.
Trigger: deploy, test, model list, deployment log, Ascend, DevServer, 910B, ModelArts, LLM, VL, Embedding, Rerank, multimodal, inference, model catalog, 昇腾, 部署模型, 测试模型, 模型列表, 部署日志, 模型部署, 推理测试
This skill deploys and tests large language models on Huawei Cloud Ascend DevServer (910B series). Supports single-machine and dual-machine deployment for LLM, VL, Embedding, and Rerank models.
Related Skills (Agent orchestrated, no direct call, Rule 3):
- huawei-cloud-ascend-remote-connect - SSH connection to DevServer (prerequisite for deployment)
- huawei-cloud-ascend-command - NPU status check and monitoring (prerequisite and post-deploy monitoring)
Capabilities:
Deployment Workflow (Agent orchestrated):
1. huawei-cloud-ascend-remote-connect to establish SSH connection
2. huawei-cloud-ascend-command to check NPU health and availability
3. This skill (huawei-cloud-ascend-models-deploy) to deploy model
4. huawei-cloud-ascend-command to monitor NPU status during deployment
┌─────────────────────────────────────────────────────────────────────┐
│ Agent Orchestration │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ 1. SSH connect (remote-connect) │ │
│ │ 2. NPU health check (ascend-command) │ │
│ │ 3. Deploy model (this skill) │ │
│ │ 4. Monitor NPU (ascend-command) │ │
│ └────────────────────────────┬────────────────────────────────┘ │
│ │ Explicit param passing (Rule 1) │
│ ▼ │
├─────────────────────────────────────────────────────────────────────┤
│ Huawei Cloud Ascend Models Deploy │
│ (Stateless, Rule 2) │
├─────────────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
│ │ Natural Language│ │ Deploy Helper │ │
│ │ Commands │───▶│ - Model Matching & Catalog │ │
│ └──────────────────┘ │ - Script Auto-Match │ │
│ │ - Command Generation │ │
│ └──────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌─────────────────┐ ┌────────┐ │
│ │ Model │ │ Inference │ │ Log │ │
│ │ Deployment │ │ Testing │ │ Status │ │
│ │ │ │ │ │ │ │
│ │ • Single-node │ │ • LLM Chat │ │ • View │ │
│ │ • Dual-node │ │ • VL Multimodal │ │ • Check│ │
│ │ • 910B Series │ │ • Embedding │ │ │ │
│ └───────────────┘ │ • Rerank │ └────────┘ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
User request: "Deploy Qwen2.5-72B on DevServer 116.204.23.145"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 1: SSH Connection │
│ → Call huawei-cloud-ascend-remote-connect │
│ → Pass: host, user, password (explicit, Rule 1) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 2: NPU Health Check │
│ → Call huawei-cloud-ascend-command │
│ → Check: NPU list, health, HBM availability │
│ → Fail if NPU not healthy or insufficient HBM │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 3: Deploy Model (this skill) │
│ → Match model from catalog │
│ → Generate deploy script │
│ → Execute deployment │
│ → Stateless execution (Rule 2) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 4: Monitor NPU │
│ → Call huawei-cloud-ascend-command │
│ → Monitor: HBM usage, temperature, processes │
└─────────────────────────────────────────────────────────────┘
│
▼
Deployment Complete
| Skill | Purpose | Orchestration Stage |
|---|---|---|
| huawei-cloud-ascend-remote-connect | SSH connection | Pre-deploy: Establish connection to DevServer |
| huawei-cloud-ascend-command | NPU management | Pre-deploy: Health check; Post-deploy: Monitoring |
Note: No direct calls between Skills. All orchestration by Agent based on user intent (Rule 3).
Prerequisite check: Ascend 910B series required
- Supported: 910B1, 910B2, 910B3, 910B4
- Unsupported: 910A, 310, 310P, etc.
- Check with:
npu-smi info
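The prerequisite rule above can be sketched as a small check. This is a minimal illustration, assuming the chip name has already been extracted from the `npu-smi info` output as a plain string; the function names are hypothetical and not part of this skill.

```python
# Supported chip variants, taken from the prerequisite list above.
SUPPORTED_CHIPS = ("910B1", "910B2", "910B3", "910B4")


def is_supported_chip(chip_name: str) -> bool:
    """Return True if the reported chip name contains a supported 910B variant."""
    upper = chip_name.upper()
    return any(chip in upper for chip in SUPPORTED_CHIPS)


def has_enough_cards(available: int, required: int) -> bool:
    """Return True if the number of free cards covers the model's minimum."""
    return available >= required


if __name__ == "__main__":
    for name in ("910B3", "910A", "310P3"):
        print(name, "supported" if is_supported_chip(name) else "rejected")
```

How the chip name appears in real `npu-smi info` output is not shown in this document, so the parsing step is deliberately left out.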
- deploy-large-models.sh
- deploy-qwen3-vl-model.sh
- deploy-ai-models.sh
Extract key information from the user's natural language and assemble commands accurately.
| Keywords | Operation |
|---|---|
| deploy / start / launch | Single-machine deployment |
| dual-machine / two-node / dual-node | Dual-machine deployment |
| test / inference / call | Test (execute) |
| write command / generate command | Write test command (generate only, no execute) |
| deployment log / view log | View deployment log |
| deployment status / is ready | View deployment status |
| model list / supported models | Show model catalog |
| parameter help / API parameters | Show parameter manual |
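The keyword table above can be read as an ordered rule list. The sketch below is one possible interpretation, not the skill's actual matcher: the operation identifiers are hypothetical, and the ordering (specific phrases before the generic "deploy" and "test") is an assumption made so that "deployment log" is not mistaken for a deployment request.

```python
import re

# (phrases, operation) pairs from the keyword table. More specific phrases
# come first so they win over the generic "deploy" and "test" keywords.
INTENT_RULES = [
    (["deployment log", "view log"], "view_log"),
    (["deployment status", "is ready"], "view_status"),
    (["model list", "supported models"], "model_catalog"),
    (["parameter help", "api parameters"], "parameter_manual"),
    (["write command", "generate command"], "write_test_command"),
    (["dual-machine", "two-node", "dual-node"], "dual_machine_deploy"),
    (["test", "inference", "call"], "test"),
    (["deploy", "start", "launch"], "single_machine_deploy"),
]


def detect_operation(text: str):
    """Return the first operation whose phrase words all occur in the text."""
    tokens = set(re.findall(r"[a-z0-9.\-]+", text.lower()))
    for phrases, operation in INTENT_RULES:
        for phrase in phrases:
            if all(word in tokens for word in phrase.split()):
                return operation
    return None


if __name__ == "__main__":
    print(detect_operation("write test command Qwen3-32B prompt:hello"))
    print(detect_operation("dual-machine deploy Qwen3-235B port 8000"))
```

Matching on whole tokens means a phrase such as "write command" still matches "write test command", while inflected forms like "deploying" are not recognised by this simple version.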
Model Name (fuzzy match, case-insensitive, supports card count filter):
Card Count:
Port:
Missing Parameters (check each, prompt what is missing):
Head/Worker IP (dual-machine deployment):
Prompt:
Image URL (VL test):
Multimodal Capability Auto-Detection:
Image URL Conversion (local image → data URL):
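# Base64-encode the local image on a single line (-w 0 disables line wrapping in GNU base64)
IMG_B64=$(base64 -w 0 "${local_image_path}")
# The MIME type is fixed to image/jpeg here; adjust it for other image formats
IMG_URL="data:image/jpeg;base64,${IMG_B64}"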
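For callers that assemble the request in Python rather than in the shell, the same conversion might look like the sketch below. It is an illustration only: the function name is hypothetical, and guessing the MIME type from the file extension is an addition that the shell snippet does not perform.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL usable in an image_url field."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Fall back to the type used by the shell snippet.
        mime_type = "image/jpeg"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The whole file is read into memory, which is acceptable for typical test images but worth keeping in mind for very large files.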
Advanced Parameters (optional):
Thinking Mode: Qwen3/Qwen3.6 models default to thinking mode and output their reasoning process before the final response. Set the following request parameter to false to disable it, or to true to enable it:
"chat_template_kwargs": {"enable_thinking": false}
Only the Ascend 910B series (910B1 / 910B2 / 910B3 / 910B4) is supported. Check the NPU model before deployment and reject non-910B series.
| Model | Min Cards |
|---|---|
| Qwen3-14B | 1 |
| Qwen3-30B-A3B-Instruct-2507 | 2 |
| Qwen3-32B | 2 |
| Qwen3-235B-A22B-Thinking-2507 | 16 |
| Qwen3-235B-A22B-Instruct-2507 | 16 |
| DeepSeek-R1-Distill-Llama-70B | 4 |
| Model | Min Cards |
|---|---|
| Qwen3-VL-30B-A3B-Instruct | 2 |
| Qwen3-VL-32B-Instruct | 2 |
| Qwen3-VL-235B-A22B-Instruct | 16 |
| Qwen3-VL-235B-A22B-Instruct-W8A8 | 8 |
| Model | Min Cards | Multi-card |
|---|---|---|
| Qwen3-Embedding-8B | 1 | No |
| bge-large-zh-v1.5 | 1 | No |
| bge-m3 | 1 | No |
| Model | Min Cards | Multi-card |
|---|---|---|
| Qwen3-Reranker-8B | 1 | No |
| bge-reranker-v2-m3 | 1 | No |
| Model | Min Cards | Capability |
|---|---|---|
| Qwen3.6-35B-A3B | 2 | Text + Image (MoE) |
| Qwen3.6-27B | 2 | Text + Image (MoE) |
| Qwen3-Next-80B-A3B-Instruct | 4 | Large language model |
| DeepSeek-V4-Flash-w8a8-mtp | 8 | Large language model |
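The fuzzy, case-insensitive model-name matching with a card count filter mentioned earlier could be sketched as follows. The real logic lives in scripts/deploy_helper.py and is not shown in this document, so everything here is an assumption: substring matching stands in for "fuzzy", the filter keeps models whose minimum card count fits the available cards, and only a few catalog rows are copied in.

```python
# A few rows copied from the catalog tables above: model name -> min cards.
CATALOG = {
    "Qwen3-14B": 1,
    "Qwen3-32B": 2,
    "Qwen3-VL-32B-Instruct": 2,
    "Qwen3-235B-A22B-Instruct-2507": 16,
    "bge-m3": 1,
}


def find_models(query: str, available_cards=None):
    """Return catalog names containing the query, ignoring case.

    If available_cards is given, drop models that need more cards than that.
    """
    needle = query.lower()
    return sorted(
        name
        for name, min_cards in CATALOG.items()
        if needle in name.lower()
        and (available_cards is None or min_cards <= available_cards)
    )


if __name__ == "__main__":
    print(find_models("32b"))
    print(find_models("qwen3", available_cards=1))
```

When a query matches more than one model, asking the user to choose is safer than picking the first hit.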
Script Path: scripts/deploy_helper.py
Match Rules (hardcoded, 100% accurate):
| Model Category | Deploy Script | Notes |
|---|---|---|
| LLM | deploy-large-models.sh | Shared with Embedding/Rerank |
| Embedding | deploy-large-models.sh | Same as above |
| Rerank | deploy-large-models.sh | Same as above |
| VL | deploy-qwen3-vl-model.sh | Multimodal specific |
| OpenSource | deploy-ai-models.sh | OpenSource specific |
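Because the match rules are fixed, they reduce to a lookup table. The sketch below mirrors the table above; the dictionary and function names are hypothetical and may not correspond to how deploy_helper.py is actually written.

```python
# Category -> deploy script, copied from the match rules table.
DEPLOY_SCRIPTS = {
    "LLM": "deploy-large-models.sh",
    "Embedding": "deploy-large-models.sh",
    "Rerank": "deploy-large-models.sh",
    "VL": "deploy-qwen3-vl-model.sh",
    "OpenSource": "deploy-ai-models.sh",
}


def script_for_category(category: str) -> str:
    """Return the deploy script for a category, or raise on unknown input."""
    try:
        return DEPLOY_SCRIPTS[category]
    except KeyError:
        raise ValueError(f"unknown model category: {category!r}") from None
```

Raising on an unknown category, rather than falling back to a default script, keeps the caller from deploying with a guessed script.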
Usage:
# Match model (returns category, script URL, min cards, etc.)
python3 scripts/deploy_helper.py match <model_name>
# Generate deploy command directly
python3 scripts/deploy_helper.py command <model_name> <cards> <port>
# List all models (optional category filter)
python3 scripts/deploy_helper.py list [LLM|VL|Embedding|Rerank|OpenSource]
AI must call deploy_helper.py match first to confirm category and script, then use returned deploy_url to assemble command. Never guess from memory!
Core commands for model deployment and testing. See Operation Flow for detailed steps.
| Command | Description |
|---|---|
| deploy <model> <port> | Deploy model on single machine |
| deploy <model> <port> <cards> | Deploy with specified card count |
| dual-machine deploy <model> head:<IP> worker:<IP> port:<PORT> | Deploy on dual-machine cluster |
| test <model> <port> | Test model inference |
| deployment log | View deployment log |
| deployment status | Check deployment status |
| model list | Show supported models |
Check in order, stop if any fails:
1. huawei-cloud-ascend-command to check chip model, reject non-910B series
2. huawei-cloud-ascend-command to check available cards, confirm >= required cards
3. huawei-cloud-ascend-remote-connect to run ss -tlnp | grep :port, notify if occupied
User says: "deploy model_name port XXXX" or "deploy model_name port XXXX N cards"
Before deploying, run mkdir -p /home/modelarts-agent over SSH to ensure the directory exists.
LLM / Embedding / Rerank Command Template:
nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/deploy-large-models.sh && chmod 755 /home/modelarts-agent/deploy-large-models.sh && sh /home/modelarts-agent/deploy-large-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
VL Multimodal Command Template:
nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/single-machine/deploy-qwen3-vl-model.sh && chmod 755 /home/modelarts-agent/deploy-qwen3-vl-model.sh && sh /home/modelarts-agent/deploy-qwen3-vl-model.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
OpenSource Command Template:
nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/open_source/deploy-ai-models.sh && chmod 755 /home/modelarts-agent/deploy-ai-models.sh && sh /home/modelarts-agent/deploy-ai-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
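The three single-machine templates differ only in the script URL, so filling them in can be sketched as one function. This is an illustration under assumptions: the function name is hypothetical, the URL is expected to be the deploy_url returned by `deploy_helper.py match`, and the input validation is an addition that the templates themselves do not perform.

```python
import posixpath
import re
import shlex
from urllib.parse import urlparse

WORK_DIR = "/home/modelarts-agent"
# Restrict model names to characters that are safe to place in a shell command.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def build_deploy_command(model: str, cards: int, port: int, deploy_url: str) -> str:
    """Fill the single-machine command template with validated values."""
    if not SAFE_NAME.match(model):
        raise ValueError(f"unsafe model name: {model!r}")
    if cards < 1 or not 1 <= port <= 65535:
        raise ValueError("cards must be >= 1 and port must be 1-65535")
    # The script file name is the last path segment of the download URL.
    script_path = f"{WORK_DIR}/{posixpath.basename(urlparse(deploy_url).path)}"
    inner = (
        f"export model_name={model} && export required_cards={cards} "
        f"&& export port={port} "
        f"&& wget -P {WORK_DIR}/ {shlex.quote(deploy_url)} "
        f"&& chmod 755 {script_path} "
        f"&& sh {script_path} {model} {cards} {port}"
    )
    return f"nohup bash -c {shlex.quote(inner)} > {WORK_DIR}/deploy_{model}.log 2>&1 &"
```

Substituting the values before the string reaches the remote shell matters because the templates wrap the command in single quotes, so `${model}` and the other placeholders would not be expanded by the outer shell.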
User says: "dual-machine deploy model_name head:IP worker:IP port XXXX"
Before a dual-machine deployment, run mkdir -p /home/modelarts-agent on both the head and worker nodes.
Head Node Command Template:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &
Worker Node Command Template:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &
VL Dual-machine Deployment:
For VL models (Qwen3-VL-235B-A22B-Instruct, etc.), use the following scripts:
VL Head Node Command:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &
VL Worker Node Command:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &
This is a sensitive operation: show the full command and wait for the user to reply "confirm" before executing.
After deploy command sent:
/home/modelarts-agent/deploy_${model}.log
Deployment successful! ${model} is ready
Service URL: http://${IP}:${PORT}/v1/chat/completions
Example request:
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"user","content":"hello"}],"max_tokens":256}'
Multimodal request (if supported):
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"describe the image"}]}],"max_tokens":512}'
User says: "deployment log model_name"
Agent uses huawei-cloud-ascend-remote-connect to execute:
tail -50 /home/modelarts-agent/deploy_${model}.log
User says: "deployment status port XXXX"
Agent uses huawei-cloud-ascend-remote-connect to execute:
ss -tlnp | grep :${port}
Port listening = Service ready for testing.
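The readiness check can also be done from the client side by attempting a TCP connection. Note that this is a different technique from the `ss` command above, which runs on the DevServer itself: the sketch below only shows that the port accepts connections from where the code runs. The function name is hypothetical.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A False result may also be caused by a firewall or security group between the client and the DevServer, so it does not by itself prove that the service is down.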
User says: "test model_name prompt:xxx" or "test model_name image:URL"
Test flow (strict compliance):
Test Result
| Field | Value |
|-------|-------|
| id | chatcmpl-xxx |
| model | Qwen3-VL-32B-Instruct |
| prompt_tokens | 93 |
| completion_tokens | 400 |
| total_tokens | 493 |
| finish_reason | stop |
Model Response:
[Extract full content, no truncation]
Raw Response:
[Full JSON, no truncation]
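Filling the result table from a response might look like the sketch below. It assumes the service returns the OpenAI-style chat completion shape (top-level id and model, a usage object, and a choices list), which the /v1/chat/completions path and the chatcmpl- id suggest but this document does not spell out. The function name is hypothetical.

```python
import json


def summarize_chat_response(raw_json: str) -> dict:
    """Extract the Test Result fields and the full reply from a raw response."""
    data = json.loads(raw_json)
    usage = data.get("usage", {})
    choice = data["choices"][0]
    return {
        "id": data.get("id"),
        "model": data.get("model"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
        "finish_reason": choice.get("finish_reason"),
        # Full model reply, not truncated.
        "content": choice["message"]["content"],
    }
```

The raw JSON string should still be shown alongside this summary, as the template above requires.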
LLM chat test:
curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"user","content":"${prompt}"}],"max_tokens":1024,"temperature":0.7}'
VL multimodal test:
curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"${prompt}"}]}],"max_tokens":512,"temperature":0.7}'
Embedding test:
curl -s -X POST http://${IP}:${PORT}/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","input":"${text}"}'
Rerank test:
curl -s -X POST http://${IP}:${PORT}/v1/rerank \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","query":"${query}","documents":["${doc1}","${doc2}"]}'
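Pasting a prompt straight into the `-d` string breaks as soon as it contains a double quote or a newline. One way around this is to build the request body with a JSON serializer and pass the result to curl. The sketch below does that for the chat, embedding, and rerank bodies shown above; the function names are hypothetical, and the optional enable_thinking argument simply maps to the chat_template_kwargs field described in this document.

```python
import json


def chat_payload(model, prompt, image_url=None, max_tokens=1024,
                 temperature=0.7, enable_thinking=None) -> str:
    """Build a /v1/chat/completions body; adds an image part when given."""
    if image_url is None:
        content = prompt
    else:
        content = [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ]
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if enable_thinking is not None:
        payload["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return json.dumps(payload, ensure_ascii=False)


def embedding_payload(model, text) -> str:
    """Build a /v1/embeddings body."""
    return json.dumps({"model": model, "input": text}, ensure_ascii=False)


def rerank_payload(model, query, documents) -> str:
    """Build a /v1/rerank body."""
    return json.dumps(
        {"model": model, "query": query, "documents": list(documents)},
        ensure_ascii=False,
    )
```

The returned string still needs shell quoting before it is placed after `-d`, for example with `shlex.quote`.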
User says: "write test command model_name prompt:xxx"
Same logic as "test", but only output command text, no execution.
| Parameter | Required | Default | Description |
|---|---|---|---|
| model | Yes | — | Model name, same as deployment |
| messages | Yes | — | Message list, each with role and content |
| max_tokens | No | 16 | Max generation tokens |
| temperature | No | 1.0 | Sampling randomness, 0=greedy |
| top_p | No | 1.0 | Nucleus sampling threshold |
| top_k | No | -1 | Only consider top-K tokens |
| stream | No | false | Streaming output (SSE) |
| chat_template_kwargs | No | {} | Template params, e.g. {"enable_thinking": false} |
| Parameter | Description |
|---|---|
| content[] | Array format: image_url object + text object |
| detail | Image precision: auto/high/low |
| Parameter | Required | Description |
|---|---|---|
| model | Yes | Model name |
| input | Yes | String or string list |
| encoding_format | No | float/base64 |
| Parameter | Required | Description |
|---|---|---|
| model | Yes | Model name |
| query | Yes | Query text |
| documents | Yes | Document list to rerank |
| top_n | No | Return top N |
This skill operates in stateless mode (Rule 2). All context (host, credentials, model info) must be explicitly passed by Agent (Rule 1).
Before calling this skill, Agent MUST:
- Establish SSH connection using huawei-cloud-ascend-remote-connect
- Check NPU status using huawei-cloud-ascend-command
This skill receives explicit parameters from Agent:
# Model matching (local operation)
python3 scripts/deploy_helper.py match <model_name>
# Script URL generation (local operation)
python3 scripts/deploy_helper.py script <model_name>
# Deploy command generation (local operation)
python3 scripts/deploy_helper.py command <model> <cards> <port>
Agent executes deployment commands on remote server:
# Agent uses SSH to execute deployment on DevServer
ssh root@<host> "cd /path/to/model && bash deploy.sh"
After deployment, Agent calls huawei-cloud-ascend-command to:
User Input Agent This Skill
│ │ │
│ host, password │ │
├─────────────────────────▶│ │
│ │ SSH connect │
│ ├───────────────────────────▶│
│ │ │ (remote-connect)
│ │◀───────────────────────────┤
│ │ │
│ │ NPU check │
│ ├───────────────────────────▶│
│ │ │ (ascend-command)
│ │◀───────────────────────────┤
│ │ │
│ model_name, cards │ │
├─────────────────────────▶│ │
│ │ match model │
│ ├───────────────────────────▶│
│ │ │ deploy_helper.py
│ │◀───────────────────────────┤
│ │ │
│ │ execute deploy │
│ ├───────────────────────────▶│
│ │ │ (via SSH)
│ │◀───────────────────────────┤
│ │ │
│ │ monitor NPU │
│ ├───────────────────────────▶│
│ │ │ (ascend-command)
│ │◀───────────────────────────┤
│ │ │
▼ ▼ ▼
Note: No direct skill-to-skill calls. All orchestration by Agent (Rule 3).
| Document | Description |
|---|---|
| task-deploy-model.md | Deployment task steps |
| task-test-model.md | Testing task steps |
| model-catalog.md | Complete model catalog |
| api-parameters.md | API parameter reference |
| prerequisites.md | Prerequisites checklist |
| verification-method.md | Verification steps |
| troubleshooting.md | Troubleshooting guide |
| scripts/deploy_helper.py | Model matching helper |