huawei-cloud-ascend-models-deploy

Huawei Cloud Ascend model deployment and testing skill for large language models on Ascend DevServer (910B series). Supports single-machine and dual-machine deployment for LLM, VL (vision-language), Embedding, and Rerank models. Provides model inference testing, deployment log viewing, and status monitoring with automated model matching and deployment script generation. Use this skill when the user wants to: (1) deploy a model on Ascend DevServer, (2) test model inference, (3) view deployment logs or status, (4) list supported models, (5) check deployment prerequisites. Trigger: deploy, test, model list, deployment log, Ascend, DevServer, 910B, ModelArts, LLM, VL, Embedding, Rerank, multimodal, inference, model catalog, 昇腾, 部署模型, 测试模型, 模型列表, 部署日志, 模型部署, 推理测试

Install

openclaw skills install huawei-cloud-ascend-models-deploy

Huawei Cloud Ascend Models Deploy

Deploy and test large language models on Huawei Cloud Ascend DevServer (910B series). Supports single-machine and dual-machine deployment, model inference testing, and deployment monitoring.

Overview

This skill deploys and tests large language models on Huawei Cloud Ascend DevServer (910B series). It supports single-machine and dual-machine deployment across the LLM, VL (vision-language), Embedding, and Rerank categories; Embedding and Rerank models run on a single card only.

Related Skills (Agent orchestrated, no direct call, Rule 3):

huawei-cloud-ascend-remote-connect - SSH connection to DevServer (prerequisite for deployment)
huawei-cloud-ascend-command - NPU status check and monitoring (prerequisite and post-deploy monitoring)

Capabilities:

Model deployment (single-node, dual-node)
Inference testing (LLM chat, VL multimodal, Embedding, Rerank)
Deployment log and status monitoring
Model catalog and script auto-matching

Deployment Workflow (Agent orchestrated):

Agent calls huawei-cloud-ascend-remote-connect to establish SSH connection
Agent calls huawei-cloud-ascend-command to check NPU health and availability
Agent calls this skill (huawei-cloud-ascend-models-deploy) to deploy model
Agent calls huawei-cloud-ascend-command to monitor NPU status during deployment

Architecture

System Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         Agent Orchestration                         │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  1. SSH connect (remote-connect)                             │    │
│  │  2. NPU health check (ascend-command)                        │    │
│  │  3. Deploy model (this skill)                                 │    │
│  │  4. Monitor NPU (ascend-command)                             │    │
│  └────────────────────────────┬────────────────────────────────┘    │
│                               │ Explicit param passing (Rule 1)    │
│                               ▼                                     │
├─────────────────────────────────────────────────────────────────────┤
│              Huawei Cloud Ascend Models Deploy                      │
│                      (Stateless, Rule 2)                            │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐    ┌──────────────────────────────────┐      │
│  │  Natural Language│    │          Deploy Helper           │      │
│  │     Commands     │───▶│  - Model Matching & Catalog      │      │
│  └──────────────────┘    │  - Script Auto-Match             │      │
│                          │  - Command Generation            │      │
│                          └──────────────────────────────────┘      │
│                                           │                         │
│          ┌─────────────────────────────────┼──────────────┐        │
│          ▼                                 ▼              ▼        │
│  ┌───────────────┐              ┌─────────────────┐ ┌────────┐    │
│  │ Model         │              │ Inference       │ │ Log    │    │
│  │ Deployment    │              │ Testing         │ │ Status │    │
│  │               │              │                 │ │        │    │
│  │ • Single-node │              │ • LLM Chat      │ │ • View │    │
│  │ • Dual-node   │              │ • VL Multimodal │ │ • Check│    │
│  │ • 910B Series │              │ • Embedding     │ │        │    │
│  └───────────────┘              │ • Rerank        │ └────────┘    │
│                                 └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────┘

Agent Orchestration Flow

User request: "Deploy Qwen2.5-72B on DevServer 116.204.23.145"
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 1: SSH Connection                                 │
│   → Call huawei-cloud-ascend-remote-connect                  │
│   → Pass: host, user, password (explicit, Rule 1)            │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 2: NPU Health Check                               │
│   → Call huawei-cloud-ascend-command                         │
│   → Check: NPU list, health, HBM availability                │
│   → Fail if NPU not healthy or insufficient HBM              │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 3: Deploy Model (this skill)                      │
│   → Match model from catalog                                 │
│   → Generate deploy script                                   │
│   → Execute deployment                                        │
│   → Stateless execution (Rule 2)                             │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 4: Monitor NPU                                    │
│   → Call huawei-cloud-ascend-command                         │
│   → Monitor: HBM usage, temperature, processes               │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
      Deployment Complete

Related Skills Table

Skill	Purpose	Orchestration Stage
`huawei-cloud-ascend-remote-connect`	SSH connection	Pre-deploy: Establish connection to DevServer
`huawei-cloud-ascend-command`	NPU management	Pre-deploy: Health check; Post-deploy: Monitoring

Note: No direct calls between Skills. All orchestration by Agent based on user intent (Rule 3).

Prerequisites

Prerequisite check: Ascend 910B series required

Supported: 910B1, 910B2, 910B3, 910B4

Unsupported: 910A, 310, 310P, etc.

Check with: npu-smi info

Mandatory Rules (AI Must Follow)

Never guess commands from memory: always read the "Deploy Script Auto-Match" section first.
Always call deploy_helper.py first to confirm the model category and script URL.
Different models use different scripts:
- LLM / Embedding / Rerank → deploy-large-models.sh
- VL → deploy-qwen3-vl-model.sh
- OpenSource → deploy-ai-models.sh
Validate before deployment: port, NPU, model, and card count.
Show the command and wait for confirmation: deployment is a sensitive operation and must never be executed directly.

Natural Language Understanding Rules

Extract key information from user natural language and assemble commands accurately.

Operation Type Detection

Keywords	Operation
deploy / start / launch	Single-machine deployment
dual-machine / two-node / dual-node	Dual-machine deployment
test / inference / call	Test (execute)
write command / generate command	Write test command (generate only, no execute)
deployment log / view log	View deployment log
deployment status / is ready	View deployment status
model list / supported models	Show model catalog
parameter help / API parameters	Show parameter manual
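
As a rough illustration of the table above, the keyword lookup could be sketched as follows. This is a minimal sketch, not the skill's actual implementation: the operation labels (`view_log`, `single_machine_deploy`, and so on) are hypothetical names, and the phrase "write test command" is added because the "Write Test Command" section uses it as the user utterance. Specific phrases are checked before generic ones so that "deployment log" is not read as a deploy request.

```python
import re

# Keyword table mirrored from "Operation Type Detection".
# Operation labels are hypothetical; specific phrases come first.
OPERATION_KEYWORDS = [
    ("write_test_command", ["write command", "write test command", "generate command"]),
    ("view_log", ["deployment log", "view log"]),
    ("view_status", ["deployment status", "is ready"]),
    ("model_catalog", ["model list", "supported models"]),
    ("parameter_help", ["parameter help", "api parameters"]),
    ("dual_machine_deploy", ["dual-machine", "two-node", "dual-node"]),
    ("test", ["test", "inference", "call"]),
    ("single_machine_deploy", ["deploy", "start", "launch"]),
]


def detect_operation(text):
    """Return the first operation whose keyword appears as a whole word or phrase."""
    lowered = text.lower()
    for operation, keywords in OPERATION_KEYWORDS:
        for keyword in keywords:
            # Word boundaries keep "latest" from matching the keyword "test".
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                return operation
    return None  # nothing recognised: ask the user what they want to do
```

Whole-word matching is still naive; a request that fits none of the keywords should be clarified with the user rather than guessed.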

Information Extraction Rules

Model Name (fuzzy match, case-insensitive, supports card count filter):

"qwen3-14b" → Qwen3-14B
"qwen3-235b" → Multiple matches, prefer Instruct version (Qwen3-235B-A22B-Instruct-2507), or ask user
"vl-32b" → Qwen3-VL-32B-Instruct
"bge-m3" → bge-m3
"qwen3-vl" + 2 cards → Match VL models with ≤2 cards, list for user to choose
"qwen3" + 2 cards → Match all Qwen3 models with ≤2 cards, list for user to choose
Multiple candidates → List all candidates (with card count and category), let user confirm
No match → Show full model catalog for user to select
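
The matching rules above can be approximated with a case-insensitive substring search plus an optional card-count filter. The sketch below is an assumption about how such matching might work, not the logic inside deploy_helper.py; the `CATALOG` list is copied from the Model Catalog section and `match_models` is a hypothetical name.

```python
# (model name, category, minimum cards), copied from the Model Catalog section.
CATALOG = [
    ("Qwen3-14B", "LLM", 1),
    ("Qwen3-30B-A3B-Instruct-2507", "LLM", 2),
    ("Qwen3-32B", "LLM", 2),
    ("Qwen3-235B-A22B-Thinking-2507", "LLM", 16),
    ("Qwen3-235B-A22B-Instruct-2507", "LLM", 16),
    ("DeepSeek-R1-Distill-Llama-70B", "LLM", 4),
    ("Qwen3-VL-30B-A3B-Instruct", "VL", 2),
    ("Qwen3-VL-32B-Instruct", "VL", 2),
    ("Qwen3-VL-235B-A22B-Instruct", "VL", 16),
    ("Qwen3-VL-235B-A22B-Instruct-W8A8", "VL", 8),
    ("Qwen3-Embedding-8B", "Embedding", 1),
    ("bge-large-zh-v1.5", "Embedding", 1),
    ("bge-m3", "Embedding", 1),
    ("Qwen3-Reranker-8B", "Rerank", 1),
    ("bge-reranker-v2-m3", "Rerank", 1),
    ("Qwen3.6-35B-A3B", "OpenSource", 2),
    ("Qwen3.6-27B", "OpenSource", 2),
    ("Qwen3-Next-80B-A3B-Instruct", "OpenSource", 4),
    ("DeepSeek-V4-Flash-w8a8-mtp", "OpenSource", 8),
]


def match_models(query, max_cards=None):
    """Return catalog entries whose name contains the query (case-insensitive)."""
    needle = query.strip().lower()
    hits = [entry for entry in CATALOG if needle in entry[0].lower()]
    # An exact name match wins over partial matches.
    exact = [entry for entry in hits if entry[0].lower() == needle]
    if exact:
        hits = exact
    # Optional filter: keep only models deployable on at most max_cards cards.
    if max_cards is not None:
        hits = [entry for entry in hits if entry[2] <= max_cards]
    return hits
```

One hit can be used directly, several hits are listed for the user to confirm, and an empty list means the full catalog should be shown.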

Card Count:

"2 cards" / "use 2 cards" / "2 npus" → 2
"16 cards" / "16 npus" → 16
"dual-machine" → 16
Not specified → Use minimum card count from model catalog

Port:

"port 8022" / "port:8022" → 8022
Not specified → Default 8080
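
The card-count and port rules above lend themselves to two small regular expressions. The following is a sketch under the assumption that the request arrives as one free-text string; `parse_cards` and `parse_port` are hypothetical helper names.

```python
import re


def parse_cards(text):
    """Extract a card count such as "2 cards" or "16 npus"; dual-machine implies 16."""
    lowered = text.lower()
    found = re.search(r"(\d+)\s*(?:cards?|npus?)\b", lowered)
    if found:
        return int(found.group(1))
    if "dual-machine" in lowered:
        return 16
    return None  # caller falls back to the catalog minimum for the model


def parse_port(text, default=8080):
    """Extract "port 8022" or "port:8022"; fall back to the default port."""
    found = re.search(r"\bport\s*:?\s*(\d{1,5})\b", text.lower())
    return int(found.group(1)) if found else default
```

Returning `None` for a missing card count keeps the "use the minimum from the catalog" decision with the caller, which already knows the matched model.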

Missing Parameters (check each, prompt what is missing):

Missing model name → "Please specify model name" + show model list
Missing card count → "Please specify card count, e.g.: 2 cards" + show minimum cards for this model
Missing port → "Please specify port (default 8080), e.g.: port 8001"
Dual-machine missing head IP → "Please specify head node IP, e.g.: head:192.168.1.1"
Dual-machine missing worker IP → "Please specify worker node IP, e.g.: worker:192.168.1.2"

Head/Worker IP (dual-machine deployment):

"head:1.1.1.1" / "head node 1.1.1.1" → Head node IP
"worker:2.2.2.2" / "worker node 2.2.2.2" → Worker node IP

Prompt:

"prompt:hello" / "ask:hello" → Prompt text
Not specified → LLM default "hello", VL default "describe the image", Embedding default "I love shanghai", Rerank default "What is the capital of France?"

Image URL (VL test):

"image:https://xxx.jpg" / direct URL → Image URL
User sends image attachment → Auto-convert to base64 data URL
Not specified and testing multimodal model → Prompt user for image URL

Multimodal Capability Auto-Detection:

VL category → Supports multimodal
OpenSource: Qwen3.6-35B-A3B, Qwen3.6-27B → Supports multimodal
LLM category → Text only
Embedding → Text only
Rerank → Text only

Image URL Conversion (local image → data URL):

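# Encode the local image as one unwrapped base64 line (GNU base64: -w 0 disables line wrapping)
IMG_B64=$(base64 -w 0 "${local_image_path}")
# Adjust the MIME type if the image is not a JPEG (e.g. image/png)
IMG_URL="data:image/jpeg;base64,${IMG_B64}"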
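
When the conversion runs on the Agent side rather than in a shell, the same data URL can be produced with the Python standard library. This is a minimal sketch; `image_to_data_url` is a hypothetical helper name, and unlike the shell snippet it guesses the MIME type from the file extension, falling back to image/jpeg.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Read a local image file and return it as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"  # same fallback the shell snippet hardcodes
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```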

Advanced Parameters (optional):

"max_tokens:64" → max_tokens=64
"temperature:0.7" → temperature=0.7
"stream" → stream=true
"system:You are assistant" → system_prompt
"disable thinking" / "no thinking" → chat_template_kwargs: {"enable_thinking": false}
(Default = thinking mode enabled)

Thinking Mode: Qwen3/Qwen3.6 models default to thinking mode, in which the model outputs its reasoning process before the final response.

Enable thinking: Higher quality, more token consumption
Disable thinking: Direct output, less token consumption, suitable for simple queries
Request-level control via "chat_template_kwargs": {"enable_thinking": false/true}
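
The advanced parameters and the thinking-mode switch above can be collected into request options with a few patterns. This is only a sketch: `parse_advanced` is a hypothetical name, and the `system:` prefix is deliberately left out because the text does not say where the system prompt ends.

```python
import re


def parse_advanced(text):
    """Collect optional request parameters from a free-text test request."""
    options = {}
    found = re.search(r"max_tokens:(\d+)", text)
    if found:
        options["max_tokens"] = int(found.group(1))
    found = re.search(r"temperature:(\d+(?:\.\d+)?)", text)
    if found:
        options["temperature"] = float(found.group(1))
    if re.search(r"\bstream\b", text, re.IGNORECASE):
        options["stream"] = True
    # Thinking stays enabled unless the user explicitly turns it off.
    if re.search(r"\b(?:disable|no) thinking\b", text, re.IGNORECASE):
        options["chat_template_kwargs"] = {"enable_thinking": False}
    return options
```

The returned dictionary can be merged into the request body; leaving `chat_template_kwargs` out keeps the default thinking behaviour.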

Supported Machine Types

Only the Ascend 910B series (910B1 / 910B2 / 910B3 / 910B4) is supported. Check the NPU model before every deployment and reject any non-910B chip.

Model Catalog

Large Language Models (LLM) — Endpoint: /v1/chat/completions

Model	Min Cards
Qwen3-14B	1
Qwen3-30B-A3B-Instruct-2507	2
Qwen3-32B	2
Qwen3-235B-A22B-Thinking-2507	16
Qwen3-235B-A22B-Instruct-2507	16
DeepSeek-R1-Distill-Llama-70B	4

Vision-Language (VL) — Endpoint: /v1/chat/completions

Model	Min Cards
Qwen3-VL-30B-A3B-Instruct	2
Qwen3-VL-32B-Instruct	2
Qwen3-VL-235B-A22B-Instruct	16
Qwen3-VL-235B-A22B-Instruct-W8A8	8

Embedding — Endpoint: /v1/embeddings (V0 backend only, single card only)

Model	Min Cards	Multi-card
Qwen3-Embedding-8B	1	No
bge-large-zh-v1.5	1	No
bge-m3	1	No

Rerank — Endpoint: /v1/rerank (single card only)

Model	Min Cards	Multi-card
Qwen3-Reranker-8B	1	No
bge-reranker-v2-m3	1	No

OpenSource (Multimodal)

Model	Min Cards	Capability
Qwen3.6-35B-A3B	2	Text + Image (MoE)
Qwen3.6-27B	2	Text + Image (MoE)
Qwen3-Next-80B-A3B-Instruct	4	Large language model
DeepSeek-V4-Flash-w8a8-mtp	8	Large language model

Deploy Script Auto-Match (Must use, never guess script URL)

Script Path: scripts/deploy_helper.py

Match Rules (hardcoded; the script is determined solely by the model category):

Model Category	Deploy Script	Notes
LLM	`deploy-large-models.sh`	Shared with Embedding/Rerank
Embedding	`deploy-large-models.sh`	Same as above
Rerank	`deploy-large-models.sh`	Same as above
VL	`deploy-qwen3-vl-model.sh`	Multimodal specific
OpenSource	`deploy-ai-models.sh`	OpenSource specific
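
For readers who want the table in executable form, the category-to-script mapping can be written as a plain lookup. This is an illustration of the table only, not a replacement for deploy_helper.py, which remains the required source for the script URL; `script_for` is a hypothetical name.

```python
# Category -> deploy script file name, copied from the match-rules table.
DEPLOY_SCRIPTS = {
    "LLM": "deploy-large-models.sh",
    "Embedding": "deploy-large-models.sh",
    "Rerank": "deploy-large-models.sh",
    "VL": "deploy-qwen3-vl-model.sh",
    "OpenSource": "deploy-ai-models.sh",
}


def script_for(category):
    """Return the deploy script for a category, refusing to guess for unknown ones."""
    if category not in DEPLOY_SCRIPTS:
        raise ValueError(f"unknown model category: {category!r}")
    return DEPLOY_SCRIPTS[category]
```

Raising on an unknown category mirrors the "never guess" rule: a missing entry should stop the flow instead of falling back to a default script.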

Usage:

# Match model (returns category, script URL, min cards, etc.)
python3 scripts/deploy_helper.py match <model_name>

# Generate deploy command directly
python3 scripts/deploy_helper.py command <model_name> <cards> <port>

# List all models (optional category filter)
python3 scripts/deploy_helper.py list [LLM|VL|Embedding|Rerank|OpenSource]

AI must call deploy_helper.py match first to confirm category and script, then use returned deploy_url to assemble command. Never guess from memory!

Core Commands

Core commands for model deployment and testing. See Operation Flow for detailed steps.

Command	Description
`deploy <model> <port>`	Deploy model on single machine
`deploy <model> <port> <cards>`	Deploy with specified card count
`dual-machine deploy <model> head:<IP> worker:<IP> port:<PORT>`	Deploy on dual-machine cluster
`test <model> <port>`	Test model inference
`deployment log`	View deployment log
`deployment status`	Check deployment status
`model list`	Show supported models

Operation Flow

I. Deployment

1. Pre-deployment Check (Must execute every time, cannot skip)

Check in order, stop if any fails:

NPU Model Check — Agent calls huawei-cloud-ascend-command to check chip model, reject non-910B series
NPU Card Count Check — Agent calls huawei-cloud-ascend-command to check available cards, confirm >= required cards
User Card Count Check — User-specified cards must be >= minimum and within supported range (1,2,4,8,16)
Embedding/Rerank Single Card Check — Embedding and Rerank only support single card, reject multi-card
Port Occupancy Check — Agent calls huawei-cloud-ascend-remote-connect to run ss -tlnp | grep :port, notify if occupied
SSH Connectivity Check — For dual-machine, verify both head and worker nodes are SSH accessible
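
The chip and card-count checks in this list can be sketched as two pure functions that stop at the first failure, in the order given above. The function names are hypothetical, and the chip string format (for example "910B3") is an assumption about what the Agent extracts from the npu-smi info output.

```python
import re

SUPPORTED_CARD_COUNTS = (1, 2, 4, 8, 16)
SINGLE_CARD_CATEGORIES = ("Embedding", "Rerank")


def is_supported_chip(chip_name):
    """Accept 910B1 to 910B4 only (an optional "Ascend" prefix is tolerated)."""
    pattern = r"(?:Ascend\s*)?910B[1-4]"
    return re.fullmatch(pattern, chip_name.strip(), re.IGNORECASE) is not None


def first_card_problem(category, min_cards, requested, available):
    """Return the first failed card check as a message, or None if all pass."""
    if available < requested:
        return f"only {available} cards available, {requested} requested"
    if requested < min_cards:
        return f"model needs at least {min_cards} cards, {requested} requested"
    if requested not in SUPPORTED_CARD_COUNTS:
        return f"card count must be one of {SUPPORTED_CARD_COUNTS}"
    if category in SINGLE_CARD_CATEGORIES and requested != 1:
        return f"{category} models support a single card only"
    return None
```

Port occupancy and SSH connectivity are not covered here because they require commands run on the remote host.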

2. Single-machine Deployment

User says: "deploy model_name port XXXX" or "deploy model_name port XXXX N cards"

Before deploying, run mkdir -p /home/modelarts-agent over SSH to make sure the directory exists.

LLM / Embedding / Rerank Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/deploy-large-models.sh && chmod 755 /home/modelarts-agent/deploy-large-models.sh && sh /home/modelarts-agent/deploy-large-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &

VL Multimodal Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/single-machine/deploy-qwen3-vl-model.sh && chmod 755 /home/modelarts-agent/deploy-qwen3-vl-model.sh && sh /home/modelarts-agent/deploy-qwen3-vl-model.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &

OpenSource Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/open_source/deploy-ai-models.sh && chmod 755 /home/modelarts-agent/deploy-ai-models.sh && sh /home/modelarts-agent/deploy-ai-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &

3. Dual-machine Deployment

User says: "dual-machine deploy model_name head:IP worker:IP port XXXX"

Before a dual-machine deployment, run mkdir -p /home/modelarts-agent on both the head node and the worker node.

Head Node Command Template:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &

Worker Node Command Template:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &

VL Dual-machine Deployment:

For VL models (Qwen3-VL-235B-A22B-Instruct, etc.), use the following scripts:

VL Head Node Command:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &

VL Worker Node Command:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &

4. Deployment Confirmation Flow

Deployment is a sensitive operation: show the full command and wait for the user to reply "confirm" before executing it.

After deploy command sent:

Notify user: Ready, starting deployment of ${model}, log at /home/modelarts-agent/deploy_${model}.log
Check log every 2 minutes, report progress (loading weights, Dynamo compiling, service starting, etc.)
When port is listening, notify deployment success
Deployment failure handling (strict compliance):
- Deployment failed = Report failure reason, no automatic retry
- Never auto-change image and retry
- Never auto-modify parameters and retry
- Never try other deployment methods
- Only report error, let user decide next step
Output API sample for user:

Deployment successful! ${model} is ready

Service URL: http://${IP}:${PORT}/v1/chat/completions

Example request:
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":"hello"}],"max_tokens":256}'

Multimodal request (if supported):
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"describe the image"}]}],"max_tokens":512}'

II. Deployment Log

User says: "deployment log model_name"

Agent uses huawei-cloud-ascend-remote-connect to execute:

tail -50 /home/modelarts-agent/deploy_${model}.log

III. Deployment Status

User says: "deployment status port XXXX"

Agent uses huawei-cloud-ascend-remote-connect to execute:

ss -tlnp | grep :${port}

Port listening = Service ready for testing.
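
The readiness test can also be approximated from the Agent side with a plain TCP connect. This sketch is a complement to the ss check, not an equivalent: it additionally requires the port to be reachable through any firewall or security group, and `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result therefore means "not reachable from here", which may still be a healthy service behind a closed port; the ss output on the server remains the authoritative check.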

IV. Test (Execute)

User says: "test model_name prompt:xxx" or "test model_name image:URL"

Test flow (strict compliance):

Show full curl command for user to review
Wait for user "confirm" or "send" before executing
Structured result output:

Test Result

| Field | Value |
|-------|-------|
| id | chatcmpl-xxx |
| model | Qwen3-VL-32B-Instruct |
| prompt_tokens | 93 |
| completion_tokens | 400 |
| total_tokens | 493 |
| finish_reason | stop |

Model Response:
[Extract full content, no truncation]

Raw Response:
[Full JSON, no truncation]
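
The fields in the result table can be pulled out of the raw response as sketched below. The sketch assumes the service returns an OpenAI-style chat completion body (`choices[0].message.content`, `choices[0].finish_reason`, `usage`), which the endpoint path and field names suggest but the text does not state; both function names are hypothetical.

```python
import json


def summarize_completion(raw_json):
    """Extract the table fields and the full model response from a raw JSON body."""
    response = json.loads(raw_json)
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "id": response.get("id"),
        "model": response.get("model"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
        "finish_reason": choice.get("finish_reason"),
        "content": choice["message"]["content"],  # full text, never truncated
    }


def render_table(summary):
    """Render the summary (without the content) as the Field/Value table."""
    rows = ["| Field | Value |", "|-------|-------|"]
    for field, value in summary.items():
        if field != "content":
            rows.append(f"| {field} | {value} |")
    return "\n".join(rows)
```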

LLM Chat Completions

curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":"${prompt}"}],"max_tokens":1024,"temperature":0.7}'

Multimodal VL

curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"${prompt}"}]}],"max_tokens":512,"temperature":0.7}'

Embedding

curl -s -X POST http://${IP}:${PORT}/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","input":"${text}"}'

Rerank

curl -s -X POST http://${IP}:${PORT}/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","query":"${query}","documents":["${doc1}","${doc2}"]}'
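
The four curl templates substitute the prompt straight into a JSON string, so a prompt containing quotes or newlines would produce invalid JSON. One way to avoid that is to build the body with a JSON encoder first, as in the sketch below. `build_request` is a hypothetical helper, the default prompts come from the "Prompt" rules, and the OpenSource category is left out because the text gives no separate endpoint for it.

```python
import json

ENDPOINTS = {
    "LLM": "/v1/chat/completions",
    "VL": "/v1/chat/completions",
    "Embedding": "/v1/embeddings",
    "Rerank": "/v1/rerank",
}
DEFAULT_PROMPTS = {
    "LLM": "hello",
    "VL": "describe the image",
    "Embedding": "I love shanghai",
    "Rerank": "What is the capital of France?",
}


def build_request(category, model, host, port, prompt=None, image_url=None, documents=None):
    """Return (url, json_body) for a test request, with safe JSON escaping."""
    if category not in ENDPOINTS:
        raise ValueError(f"unsupported category: {category!r}")
    text = prompt if prompt is not None else DEFAULT_PROMPTS[category]
    url = f"http://{host}:{port}{ENDPOINTS[category]}"
    if category == "LLM":
        body = {"model": model, "messages": [{"role": "user", "content": text}],
                "max_tokens": 1024, "temperature": 0.7}
    elif category == "VL":
        if not image_url:
            raise ValueError("a VL test needs an image URL")
        content = [{"type": "image_url", "image_url": {"url": image_url}},
                   {"type": "text", "text": text}]
        body = {"model": model, "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": content}],
                "max_tokens": 512, "temperature": 0.7}
    elif category == "Embedding":
        body = {"model": model, "input": text}
    else:  # Rerank
        if not documents:
            raise ValueError("a Rerank test needs at least one document")
        body = {"model": model, "query": text, "documents": list(documents)}
    return url, json.dumps(body, ensure_ascii=False)
```

The returned body can be passed to curl with `-d` after shell quoting, or sent directly by an HTTP client.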

V. Write Test Command (Generate Only)

User says: "write test command model_name prompt:xxx"

Same logic as "test", but only output command text, no execution.

API Parameter Manual

LLM Parameters (/v1/chat/completions)

Parameter	Required	Default	Description
model	Yes	—	Model name, same as deployment
messages	Yes	—	Message list, each with role and content
max_tokens	No	16	Max generation tokens
temperature	No	1.0	Sampling randomness, 0=greedy
top_p	No	1.0	Nucleus sampling threshold
top_k	No	-1	Only consider top-K tokens
stream	No	false	Streaming output (SSE)
chat_template_kwargs	No	{}	Template params, e.g. {"enable_thinking": false}

VL Extra Parameters

Parameter	Description
content[]	Array format: image_url object + text object
detail	Image precision: auto/high/low

Embedding Parameters (/v1/embeddings)

Parameter	Required	Description
model	Yes	Model name
input	Yes	String or string list
encoding_format	No	float/base64

Rerank Parameters (/v1/rerank)

Parameter	Required	Description
model	Yes	Model name
query	Yes	Query text
documents	Yes	Document list to rerank
top_n	No	Return top N

Execution Mode

This skill operates in stateless mode (Rule 2). All context (host, credentials, model info) must be explicitly passed by Agent (Rule 1).

Prerequisites (Agent orchestrated)

Before calling this skill, Agent MUST:

Establish SSH connection using huawei-cloud-ascend-remote-connect
- Agent receives: host, port, user, password from user
- Agent validates connection is successful
Check NPU status using huawei-cloud-ascend-command
- Agent checks: NPU health, HBM availability
- Agent validates: sufficient cards for model deployment

Skill Execution

This skill receives explicit parameters from Agent:

# Model matching (local operation)
python3 scripts/deploy_helper.py match <model_name>

# Script URL generation (local operation)
python3 scripts/deploy_helper.py script <model_name>

# Deploy command generation (local operation)
python3 scripts/deploy_helper.py command <model> <cards> <port>

Remote Deployment Execution

Agent executes deployment commands on remote server:

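# Agent uses SSH to execute the deployment on DevServer.
# Illustrative placeholder only: the real command is the one generated by
# "deploy_helper.py command" (see the templates in Operation Flow).
ssh root@<host> "cd /path/to/model && bash deploy.sh"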

Post-Deployment (Agent orchestrated)

After deployment, Agent calls huawei-cloud-ascend-command to:

Monitor NPU HBM usage
Check deployment process status
Verify model endpoint is responding

Parameter Flow

User Input                    Agent                      Skill (as noted)
    │                          │                            │
    │ host, password           │                            │
    ├─────────────────────────▶│                            │
    │                          │ SSH connect                │
    │                          ├───────────────────────────▶│
    │                          │                            │ (remote-connect)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ NPU check                  │
    │                          ├───────────────────────────▶│
    │                          │                            │ (ascend-command)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │ model_name, cards        │                            │
    ├─────────────────────────▶│                            │
    │                          │ match model                │
    │                          ├───────────────────────────▶│
    │                          │                            │ deploy_helper.py
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ execute deploy             │
    │                          ├───────────────────────────▶│
    │                          │                            │ (via SSH)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ monitor NPU                │
    │                          ├───────────────────────────▶│
    │                          │                            │ (ascend-command)
    │                          │◀───────────────────────────┤
    │                          │                            │
    ▼                          ▼                            ▼

Note: No direct skill-to-skill calls. All orchestration by Agent (Rule 3).

References

Document	Description
task-deploy-model.md	Deployment task steps
task-test-model.md	Testing task steps
model-catalog.md	Complete model catalog
api-parameters.md	API parameter reference
prerequisites.md	Prerequisites checklist
verification-method.md	Verification steps
troubleshooting.md	Troubleshooting guide
scripts/deploy_helper.py	Model matching helper