huawei-cloud-ascend-models-deploy


Huawei Cloud Ascend model deployment and testing skill for large language models on Ascend DevServer (910B series). Supports single-machine and dual-machine deployment for LLM, VL (vision-language), Embedding, and Rerank models. Provides model inference testing, deployment log viewing, and status monitoring with automated model matching and deployment script generation. Use this skill when the user wants to: (1) deploy a model on Ascend DevServer, (2) test model inference, (3) view deployment logs or status, (4) list supported models, (5) check deployment prerequisites. Trigger: deploy, test, model list, deployment log, Ascend, DevServer, 910B, ModelArts, LLM, VL, Embedding, Rerank, multimodal, inference, model catalog, 昇腾, 部署模型, 测试模型, 模型列表, 部署日志, 模型部署, 推理测试

Install

openclaw skills install huawei-cloud-ascend-models-deploy

Huawei Cloud Ascend Models Deploy

Deploy and test large language models on Huawei Cloud Ascend DevServer (910B series). Supports single-machine and dual-machine deployment, model inference testing, and deployment monitoring.

Overview

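This skill deploys and tests LLM, VL (vision-language), Embedding, and Rerank models on Huawei Cloud Ascend DevServer (910B series). Single-machine deployment covers every category; dual-machine deployment targets the 16-card models, because Embedding and Rerank models run on a single card only.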

Related Skills (Agent orchestrated, no direct call, Rule 3):

  • huawei-cloud-ascend-remote-connect - SSH connection to DevServer (prerequisite for deployment)
  • huawei-cloud-ascend-command - NPU status check and monitoring (prerequisite and post-deploy monitoring)

Capabilities:

  • Model deployment (single-node, dual-node)
  • Inference testing (LLM chat, VL multimodal, Embedding, Rerank)
  • Deployment log and status monitoring
  • Model catalog and script auto-matching

Deployment Workflow (Agent orchestrated):

  1. Agent calls huawei-cloud-ascend-remote-connect to establish SSH connection
  2. Agent calls huawei-cloud-ascend-command to check NPU health and availability
  3. Agent calls this skill (huawei-cloud-ascend-models-deploy) to deploy model
  4. Agent calls huawei-cloud-ascend-command to monitor NPU status during deployment

Architecture

System Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         Agent Orchestration                         │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  1. SSH connect (remote-connect)                             │    │
│  │  2. NPU health check (ascend-command)                        │    │
│  │  3. Deploy model (this skill)                                 │    │
│  │  4. Monitor NPU (ascend-command)                             │    │
│  └────────────────────────────┬────────────────────────────────┘    │
│                               │ Explicit param passing (Rule 1)    │
│                               ▼                                     │
├─────────────────────────────────────────────────────────────────────┤
│              Huawei Cloud Ascend Models Deploy                      │
│                      (Stateless, Rule 2)                            │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐    ┌──────────────────────────────────┐      │
│  │  Natural Language│    │          Deploy Helper           │      │
│  │     Commands     │───▶│  - Model Matching & Catalog      │      │
│  └──────────────────┘    │  - Script Auto-Match             │      │
│                          │  - Command Generation            │      │
│                          └──────────────────────────────────┘      │
│                                           │                         │
│          ┌─────────────────────────────────┼──────────────┐        │
│          ▼                                 ▼              ▼        │
│  ┌───────────────┐              ┌─────────────────┐ ┌────────┐    │
│  │ Model         │              │ Inference       │ │ Log    │    │
│  │ Deployment    │              │ Testing         │ │ Status │    │
│  │               │              │                 │ │        │    │
│  │ • Single-node │              │ • LLM Chat      │ │ • View │    │
│  │ • Dual-node   │              │ • VL Multimodal │ │ • Check│    │
│  │ • 910B Series │              │ • Embedding     │ │        │    │
│  └───────────────┘              │ • Rerank        │ └────────┘    │
│                                 └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────┘

Agent Orchestration Flow

User request: "Deploy Qwen2.5-72B on DevServer 116.204.23.145"
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 1: SSH Connection                                 │
│   → Call huawei-cloud-ascend-remote-connect                  │
│   → Pass: host, user, password (explicit, Rule 1)            │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 2: NPU Health Check                               │
│   → Call huawei-cloud-ascend-command                         │
│   → Check: NPU list, health, HBM availability                │
│   → Fail if NPU not healthy or insufficient HBM              │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 3: Deploy Model (this skill)                      │
│   → Match model from catalog                                 │
│   → Generate deploy script                                   │
│   → Execute deployment                                        │
│   → Stateless execution (Rule 2)                             │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 4: Monitor NPU                                    │
│   → Call huawei-cloud-ascend-command                         │
│   → Monitor: HBM usage, temperature, processes               │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
      Deployment Complete

Related Skills Table

| Skill | Purpose | Orchestration Stage |
|-------|---------|---------------------|
| huawei-cloud-ascend-remote-connect | SSH connection | Pre-deploy: Establish connection to DevServer |
| huawei-cloud-ascend-command | NPU management | Pre-deploy: Health check; Post-deploy: Monitoring |

Note: No direct calls between Skills. All orchestration by Agent based on user intent (Rule 3).

Prerequisites

Prerequisite check: Ascend 910B series required

  • Supported: 910B1, 910B2, 910B3, 910B4
  • Unsupported: 910A, 310, 310P, etc.
  • Check with: npu-smi info

Mandatory Rules (AI Must Follow)

  1. Never guess commands from memory — Must read "Deploy Script Auto-Match" section
  2. Must call deploy_helper.py first — Confirm model category and script URL
  3. Different models use different scripts:
    • LLM / Embedding / Rerank → deploy-large-models.sh
    • VL → deploy-qwen3-vl-model.sh
    • OpenSource → deploy-ai-models.sh
  4. Must validate before deployment — Port, NPU, model, card count
  5. Show command and wait for confirmation — Sensitive operation, never execute directly

Natural Language Understanding Rules

Extract key information from user natural language and assemble commands accurately.

Operation Type Detection

| Keywords | Operation |
|----------|-----------|
| deploy / start / launch | Single-machine deployment |
| dual-machine / two-node / dual-node | Dual-machine deployment |
| test / inference / call | Test (execute) |
| write command / generate command | Write test command (generate only, no execute) |
| deployment log / view log | View deployment log |
| deployment status / is ready | View deployment status |
| model list / supported models | Show model catalog |
| parameter help / API parameters | Show parameter manual |
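The keyword table can be read as an ordered rule list. The sketch below is one possible interpretation, not the skill's actual parser: the operation identifiers (`single_deploy`, `view_log`, and so on) are hypothetical names, and the ordering is an assumption made so that specific phrases such as "deployment log" are checked before the generic "deploy". The phrase "write test command" is taken from the "Write Test Command" section.

```python
# Ordered rules: more specific phrases come first so that, for example,
# "deployment log" is not mistaken for a plain "deploy" request.
# Operation names are hypothetical labels chosen for this sketch.
OPERATION_RULES = [
    ("write_test_command", ("write command", "generate command", "write test command")),
    ("view_log", ("deployment log", "view log")),
    ("view_status", ("deployment status", "is ready")),
    ("model_list", ("model list", "supported models")),
    ("parameter_help", ("parameter help", "api parameters")),
    ("dual_deploy", ("dual-machine", "two-node", "dual-node")),
    ("test", ("test", "inference", "call")),
    ("single_deploy", ("deploy", "start", "launch")),
]


def detect_operation(text):
    """Return the first operation whose keyword occurs in the text, else None."""
    lowered = text.lower()
    for operation, keywords in OPERATION_RULES:
        if any(keyword in lowered for keyword in keywords):
            return operation
    return None
```

Plain substring matching is deliberately simple here; a word like "latest" would also trigger the "test" rule, so a real parser would likely need word-boundary checks.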

Information Extraction Rules

Model Name (fuzzy match, case-insensitive, supports card count filter):

  • "qwen3-14b" → Qwen3-14B
  • "qwen3-235b" → Multiple matches, prefer Instruct version (Qwen3-235B-A22B-Instruct-2507), or ask user
  • "vl-32b" → Qwen3-VL-32B-Instruct
  • "bge-m3" → bge-m3
  • "qwen3-vl" + 2 cards → Match VL models with ≤2 cards, list for user to choose
  • "qwen3" + 2 cards → Match all Qwen3 models with ≤2 cards, list for user to choose
  • Multiple candidates → List all candidates (with card count and category), let user confirm
  • No match → Show full model catalog for user to select
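As a rough illustration of the matching rules above, the following sketch does a case-insensitive substring match over a small excerpt of the catalog, applies the optional card-count filter, and prefers a single Instruct variant when several models match. The functions `match_models` and `resolve` are hypothetical and are not the logic inside `scripts/deploy_helper.py`, which remains the authoritative matcher.

```python
# Excerpt of the Model Catalog: (name, category, minimum cards).
CATALOG = [
    ("Qwen3-14B", "LLM", 1),
    ("Qwen3-32B", "LLM", 2),
    ("Qwen3-235B-A22B-Thinking-2507", "LLM", 16),
    ("Qwen3-235B-A22B-Instruct-2507", "LLM", 16),
    ("Qwen3-VL-30B-A3B-Instruct", "VL", 2),
    ("Qwen3-VL-32B-Instruct", "VL", 2),
    ("Qwen3-VL-235B-A22B-Instruct", "VL", 16),
    ("bge-m3", "Embedding", 1),
]


def match_models(query, max_cards=None):
    """Case-insensitive substring match, optionally limited by card count."""
    needle = query.lower()
    hits = [entry for entry in CATALOG if needle in entry[0].lower()]
    if max_cards is not None:
        hits = [entry for entry in hits if entry[2] <= max_cards]
    return hits


def resolve(query, max_cards=None):
    """Return (chosen_name_or_None, candidates).

    A name is chosen only when the match is unique or when exactly one
    candidate is an Instruct variant; otherwise the caller should list the
    candidates (or the full catalog when there are none) and ask the user.
    """
    hits = match_models(query, max_cards)
    if len(hits) == 1:
        return hits[0][0], hits
    instruct = [entry for entry in hits if "instruct" in entry[0].lower()]
    if len(instruct) == 1:
        return instruct[0][0], hits
    return None, hits
```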

Card Count:

  • "2 cards" / "use 2 cards" / "2 npus" → 2
  • "16 cards" / "16 npus" → 16
  • "dual-machine" → 16
  • Not specified → Use minimum card count from model catalog

Port:

  • "port 8022" / "port:8022" → 8022
  • Not specified → Default 8080

Missing Parameters (check each, prompt what is missing):

  • Missing model name → "Please specify model name" + show model list
  • Missing card count → "Please specify card count, e.g.: 2 cards" + show minimum cards for this model
  • Missing port → "Please specify port (default 8080), e.g.: port 8001"
  • Dual-machine missing head IP → "Please specify head node IP, e.g.: head:192.168.1.1"
  • Dual-machine missing worker IP → "Please specify worker node IP, e.g.: worker:192.168.1.2"

Head/Worker IP (dual-machine deployment):

  • "head:1.1.1.1" / "head node 1.1.1.1" → Head node IP
  • "worker:2.2.2.2" / "worker node 2.2.2.2" → Worker node IP
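The card count, port, and head/worker IP rules lend themselves to a few regular expressions. The sketch below is an assumption about how such extraction could look; `extract_params` is a hypothetical helper, and it returns `None` for anything the user did not state so that the caller can apply the defaults or prompts described above.

```python
import re

IPV4 = r"(\d{1,3}(?:\.\d{1,3}){3})"


def extract_params(text):
    """Pull cards, port, and head/worker IPs out of a free-form request."""
    lowered = text.lower()
    params = {"cards": None, "port": None, "head_ip": None, "worker_ip": None}

    # "2 cards", "use 2 cards", "16 npus"
    cards = re.search(r"(\d+)\s*(?:cards?|npus?)\b", lowered)
    if cards:
        params["cards"] = int(cards.group(1))
    elif "dual-machine" in lowered:
        # Per the rules above, dual-machine implies 16 cards.
        params["cards"] = 16

    # "port 8022", "port:8022"
    port = re.search(r"\bport\s*:?\s*(\d{1,5})\b", lowered)
    if port:
        params["port"] = int(port.group(1))

    # "head:1.1.1.1", "head node 1.1.1.1" (same shape for worker)
    head = re.search(r"\bhead(?:\s+node)?\s*:?\s*" + IPV4, lowered)
    worker = re.search(r"\bworker(?:\s+node)?\s*:?\s*" + IPV4, lowered)
    if head:
        params["head_ip"] = head.group(1)
    if worker:
        params["worker_ip"] = worker.group(1)
    return params
```

The IPv4 pattern only checks the dotted shape, not that each octet is at most 255; stricter validation could use the standard `ipaddress` module.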

Prompt:

  • "prompt:hello" / "ask:hello" → Prompt text
  • Not specified → LLM default "hello", VL default "describe the image", Embedding default "I love shanghai", Rerank default "What is the capital of France?"

Image URL (VL test):

  • "image:https://xxx.jpg" / direct URL → Image URL
  • User sends image attachment → Auto-convert to base64 data URL
  • Not specified and testing multimodal model → Prompt user for image URL

Multimodal Capability Auto-Detection:

  • VL category → Supports multimodal
  • OpenSource: Qwen3.6-35B-A3B, Qwen3.6-27B → Supports multimodal
  • LLM category → Text only
  • Embedding → Text only
  • Rerank → Text only
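The capability rules above reduce to a small lookup. This is a minimal sketch with a hypothetical function name, assuming the category string comes from the model catalog:

```python
# OpenSource models listed above as supporting image input.
MULTIMODAL_OPENSOURCE = {"Qwen3.6-35B-A3B", "Qwen3.6-27B"}


def supports_multimodal(category, model_name):
    """Return True when the model accepts image input under the rules above."""
    if category == "VL":
        return True
    if category == "OpenSource":
        return model_name in MULTIMODAL_OPENSOURCE
    # LLM, Embedding, and Rerank are text only.
    return False
```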

Image URL Conversion (local image → data URL):

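# Base64-encode a local image on one line (-w 0 disables line wrapping in GNU base64)
IMG_B64=$(base64 -w 0 "${local_image_path}")
# The MIME type is fixed to image/jpeg here; adjust it for PNG or other formats
IMG_URL="data:image/jpeg;base64,${IMG_B64}"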
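Where the conversion runs on the agent side rather than in a shell, the same data URL can be built in Python. The sketch below is an alternative to the shell snippet, not part of the skill; `image_to_data_url` is a hypothetical helper that guesses the MIME type from the file extension and falls back to `image/jpeg`.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        # Fall back to the type used in the shell snippet.
        mime = "image/jpeg"
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```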

Advanced Parameters (optional):

  • "max_tokens:64" → max_tokens=64
  • "temperature:0.7" → temperature=0.7
  • "stream" → stream=true
  • "system:You are assistant" → system_prompt
  • "disable thinking" / "no thinking" → chat_template_kwargs: {"enable_thinking": false}
  • (Default = thinking mode enabled)

Thinking Mode: Qwen3/Qwen3.6 models default to thinking mode, outputting reasoning process before final response.

  • Enable thinking: Higher quality, more token consumption
  • Disable thinking: Direct output, less token consumption, suitable for simple queries
  • Request-level control via "chat_template_kwargs": {"enable_thinking": false/true}
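To show how the advanced parameters and the thinking switch combine in one request body, here is a small sketch. `build_chat_payload` is a hypothetical helper; it only assembles the JSON fields named above and omits `chat_template_kwargs` when thinking is left at its default.

```python
import json


def build_chat_payload(model, prompt, enable_thinking=True, max_tokens=None,
                       temperature=None, stream=False, system_prompt=None):
    """Assemble a /v1/chat/completions request body as a dict."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    payload = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    if not enable_thinking:
        # Thinking is on by default, so the field is sent only to disable it.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


def to_request_body(payload):
    """Serialize the payload for use with curl -d or an HTTP client."""
    return json.dumps(payload, ensure_ascii=False)
```

For example, "disable thinking max_tokens:64" would map to `build_chat_payload(model, prompt, enable_thinking=False, max_tokens=64)`.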

Supported Machine Types

Only the Ascend 910B series is supported (910B1 / 910B2 / 910B3 / 910B4). Check the NPU model before deployment and reject any non-910B chip.


Model Catalog

Large Language Models (LLM) — Endpoint: /v1/chat/completions

| Model | Min Cards |
|-------|-----------|
| Qwen3-14B | 1 |
| Qwen3-30B-A3B-Instruct-2507 | 2 |
| Qwen3-32B | 2 |
| Qwen3-235B-A22B-Thinking-2507 | 16 |
| Qwen3-235B-A22B-Instruct-2507 | 16 |
| DeepSeek-R1-Distill-Llama-70B | 4 |

Vision-Language (VL) — Endpoint: /v1/chat/completions

| Model | Min Cards |
|-------|-----------|
| Qwen3-VL-30B-A3B-Instruct | 2 |
| Qwen3-VL-32B-Instruct | 2 |
| Qwen3-VL-235B-A22B-Instruct | 16 |
| Qwen3-VL-235B-A22B-Instruct-W8A8 | 8 |

Embedding — Endpoint: /v1/embeddings (V0 backend only, single card only)

| Model | Min Cards | Multi-card |
|-------|-----------|------------|
| Qwen3-Embedding-8B | 1 | No |
| bge-large-zh-v1.5 | 1 | No |
| bge-m3 | 1 | No |

Rerank — Endpoint: /v1/rerank (single card only)

| Model | Min Cards | Multi-card |
|-------|-----------|------------|
| Qwen3-Reranker-8B | 1 | No |
| bge-reranker-v2-m3 | 1 | No |

OpenSource (Multimodal)

| Model | Min Cards | Capability |
|-------|-----------|------------|
| Qwen3.6-35B-A3B | 2 | Text + Image (MoE) |
| Qwen3.6-27B | 2 | Text + Image (MoE) |
| Qwen3-Next-80B-A3B-Instruct | 4 | Large language model |
| DeepSeek-V4-Flash-w8a8-mtp | 8 | Large language model |

Deploy Script Auto-Match (Must use, never guess script URL)

Script Path: scripts/deploy_helper.py

Match Rules (hardcoded, so the category-to-script mapping is deterministic):

| Model Category | Deploy Script | Notes |
|----------------|---------------|-------|
| LLM | deploy-large-models.sh | Shared with Embedding/Rerank |
| Embedding | deploy-large-models.sh | Same as above |
| Rerank | deploy-large-models.sh | Same as above |
| VL | deploy-qwen3-vl-model.sh | Multimodal specific |
| OpenSource | deploy-ai-models.sh | OpenSource specific |

Usage:

# Match model (returns category, script URL, min cards, etc.)
python3 scripts/deploy_helper.py match <model_name>

# Generate deploy command directly
python3 scripts/deploy_helper.py command <model_name> <cards> <port>

# List all models (optional category filter)
python3 scripts/deploy_helper.py list [LLM|VL|Embedding|Rerank|OpenSource]

AI must call deploy_helper.py match first to confirm category and script, then use returned deploy_url to assemble command. Never guess from memory!
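An agent written in Python could invoke the helper through `subprocess` instead of a shell string. The wrapper below is a sketch under assumptions: only the three subcommands shown in the usage above are accepted, and the helper's output is returned as raw text because its exact format is not described here.

```python
import subprocess

HELPER = "scripts/deploy_helper.py"
# Subcommands shown in the usage above.
SUBCOMMANDS = {"match", "command", "list"}


def helper_argv(subcommand, *args):
    """Build the argument vector for a deploy_helper.py call."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    return ["python3", HELPER, subcommand, *[str(arg) for arg in args]]


def run_helper(subcommand, *args):
    """Run the helper and return its standard output as text.

    Passing a list (not a shell string) keeps a user-supplied model name
    from being interpreted by the shell.
    """
    result = subprocess.run(
        helper_argv(subcommand, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A typical sequence would be `run_helper("match", model_name)` followed by `run_helper("command", model_name, cards, port)`.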


Core Commands

Core commands for model deployment and testing. See Operation Flow for detailed steps.

| Command | Description |
|---------|-------------|
| deploy <model> <port> | Deploy model on single machine |
| deploy <model> <port> <cards> | Deploy with specified card count |
| dual-machine deploy <model> head:<IP> worker:<IP> port:<PORT> | Deploy on dual-machine cluster |
| test <model> <port> | Test model inference |
| deployment log | View deployment log |
| deployment status | Check deployment status |
| model list | Show supported models |

Operation Flow

I. Deployment

1. Pre-deployment Check (Must execute every time, cannot skip)

Check in order, stop if any fails:

  1. NPU Model Check — Agent calls huawei-cloud-ascend-command to check chip model, reject non-910B series
  2. NPU Card Count Check — Agent calls huawei-cloud-ascend-command to check available cards, confirm >= required cards
  3. User Card Count Check — User-specified cards must be >= minimum and within supported range (1,2,4,8,16)
  4. Embedding/Rerank Single Card Check — Embedding and Rerank only support single card, reject multi-card
  5. Port Occupancy Check — Agent calls huawei-cloud-ascend-remote-connect to run ss -tlnp | grep :port, notify if occupied
  6. SSH Connectivity Check — For dual-machine, verify both head and worker nodes are SSH accessible
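The card-related checks (items 2 to 4) can be expressed as one function that stops at the first failure, mirroring the "check in order, stop if any fails" rule. This is a sketch only: `check_cards` is a hypothetical helper, and the available-card count is assumed to come from the NPU query performed by huawei-cloud-ascend-command.

```python
SUPPORTED_CARD_COUNTS = (1, 2, 4, 8, 16)
SINGLE_CARD_CATEGORIES = {"Embedding", "Rerank"}


def check_cards(category, min_cards, requested, available):
    """Return None when all card checks pass, else the first failure message."""
    # Item 2: enough free NPU cards on the server.
    if available < requested:
        return f"only {available} cards available, {requested} requested"
    # Item 3: the user-specified count meets the model minimum and the supported range.
    if requested < min_cards:
        return f"{requested} cards is below the model minimum of {min_cards}"
    if requested not in SUPPORTED_CARD_COUNTS:
        return f"{requested} is not a supported card count {SUPPORTED_CARD_COUNTS}"
    # Item 4: Embedding and Rerank run on a single card only.
    if category in SINGLE_CARD_CATEGORIES and requested != 1:
        return f"{category} models support single card only"
    return None
```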

2. Single-machine Deployment

User says: "deploy model_name port XXXX" or "deploy model_name port XXXX N cards"

Before deploying, run mkdir -p /home/modelarts-agent over SSH to ensure the directory exists.

LLM / Embedding / Rerank Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/deploy-large-models.sh && chmod 755 /home/modelarts-agent/deploy-large-models.sh && sh /home/modelarts-agent/deploy-large-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &

VL Multimodal Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/single-machine/deploy-qwen3-vl-model.sh && chmod 755 /home/modelarts-agent/deploy-qwen3-vl-model.sh && sh /home/modelarts-agent/deploy-qwen3-vl-model.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &

OpenSource Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/open_source/deploy-ai-models.sh && chmod 755 /home/modelarts-agent/deploy-ai-models.sh && sh /home/modelarts-agent/deploy-ai-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
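The three templates differ only in the script URL, so filling them can be sketched as a single function that takes the `deploy_url` returned by `deploy_helper.py match`. The function below is illustrative, not the helper's own `command` output; the validation patterns are assumptions added so that a model name or URL cannot break out of the single-quoted `bash -c` string.

```python
import re

WORKDIR = "/home/modelarts-agent"


def build_single_machine_command(deploy_url, model, cards, port):
    """Fill the single-machine template for a given script URL."""
    if not re.fullmatch(r"https://[A-Za-z0-9._/\-]+\.sh", deploy_url):
        raise ValueError("unexpected deploy script URL")
    if not re.fullmatch(r"[A-Za-z0-9._\-]+", model):
        raise ValueError("unexpected characters in model name")
    cards, port = int(cards), int(port)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")

    script = deploy_url.rsplit("/", 1)[-1]  # e.g. deploy-large-models.sh
    inner = (
        f"export model_name={model} && export required_cards={cards} "
        f"&& export port={port} "
        f"&& wget -P {WORKDIR}/ {deploy_url} "
        f"&& chmod 755 {WORKDIR}/{script} "
        f"&& sh {WORKDIR}/{script} {model} {cards} {port}"
    )
    return f"nohup bash -c '{inner}' > {WORKDIR}/deploy_{model}.log 2>&1 &"
```

The resulting string is what the agent would show to the user for confirmation before sending it over SSH.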

3. Dual-machine Deployment

User says: "dual-machine deploy model_name head:IP worker:IP port XXXX"

Before a dual-machine deployment, run mkdir -p /home/modelarts-agent on both the head and worker nodes.

Head Node Command Template:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &

Worker Node Command Template:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &

VL Dual-machine Deployment:

For VL models (Qwen3-VL-235B-A22B-Instruct, etc.), use the following scripts:

VL Head Node Command:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &

VL Worker Node Command:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &

4. Deployment Confirmation Flow

Sensitive operation, must show full command and wait for user "confirm" before executing.

After deploy command sent:

  1. Notify user: Ready, starting deployment of ${model}, log at /home/modelarts-agent/deploy_${model}.log
  2. Check log every 2 minutes, report progress (loading weights, Dynamo compiling, service starting, etc.)
  3. When port is listening, notify deployment success
  4. Deployment failure handling (strict compliance):
    • Deployment failed = Report failure reason, no automatic retry
    • Never auto-change image and retry
    • Never auto-modify parameters and retry
    • Never try other deployment methods
    • Only report error, let user decide next step
  5. Output API sample for user:
Deployment successful! ${model} is ready

Service URL: http://${IP}:${PORT}/v1/chat/completions

Example request:
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":"hello"}],"max_tokens":256}'

Multimodal request (if supported):
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"describe the image"}]}],"max_tokens":512}'
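The "check every 2 minutes until the port is listening" step can be sketched as a polling loop. This version probes the port directly over TCP from wherever the agent runs, which is an assumption; the skill itself checks via SSH. Both function names are hypothetical, and the loop only reports readiness, leaving failure handling to the user as required above.

```python
import socket
import time


def is_port_open(host, port, timeout=3.0):
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(host, port, interval_s=120, max_checks=30, on_check=None):
    """Poll the port at a fixed interval; return True once it accepts connections.

    on_check, if given, is called with the attempt number after each failed
    probe, which is where a progress report from the log could be issued.
    """
    for attempt in range(1, max_checks + 1):
        if is_port_open(host, port):
            return True
        if on_check is not None:
            on_check(attempt)
        if attempt < max_checks:
            time.sleep(interval_s)
    return False
```

With the defaults this waits up to about an hour; no retry or redeploy is attempted when it returns False.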

II. Deployment Log

User says: "deployment log model_name"

Agent uses huawei-cloud-ascend-remote-connect to execute:

tail -50 /home/modelarts-agent/deploy_${model}.log

III. Deployment Status

User says: "deployment status port XXXX"

Agent uses huawei-cloud-ascend-remote-connect to execute:

ss -tlnp | grep :${port}

Port listening = Service ready for testing.
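A plain `grep` on the port number can also match longer ports that contain the same digits (for example, 8080 inside 18080). If the agent captures the `ss -tlnp` output, an exact check is straightforward; the sketch below assumes the usual column layout (State, Recv-Q, Send-Q, Local Address:Port, ...) and uses a hypothetical function name.

```python
def port_is_listening(ss_output, port):
    """Return True when ss -tlnp output contains a LISTEN socket on exactly this port."""
    suffix = f":{port}"
    for line in ss_output.splitlines():
        fields = line.split()
        # Fourth column is the local address, e.g. 0.0.0.0:8080 or [::]:8080.
        if len(fields) >= 4 and fields[0] == "LISTEN" and fields[3].endswith(suffix):
            return True
    return False
```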


IV. Test (Execute)

User says: "test model_name prompt:xxx" or "test model_name image:URL"

Test flow (strict compliance):

  1. Show full curl command for user to review
  2. Wait for user "confirm" or "send" before executing
  3. Structured result output:
Test Result

| Field | Value |
|-------|-------|
| id | chatcmpl-xxx |
| model | Qwen3-VL-32B-Instruct |
| prompt_tokens | 93 |
| completion_tokens | 400 |
| total_tokens | 493 |
| finish_reason | stop |

Model Response:
[Extract full content, no truncation]

Raw Response:
[Full JSON, no truncation]
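Producing the result table from a chat completion response is a matter of picking a few fields. The sketch below assumes the response follows the usual OpenAI-style layout (`id`, `model`, `usage`, `choices[0].message.content`, `choices[0].finish_reason`); `summarize_chat_response` is a hypothetical helper.

```python
def summarize_chat_response(response):
    """Return (markdown_table, model_text) for a parsed chat completion response."""
    usage = response.get("usage") or {}
    choices = response.get("choices") or [{}]
    first = choices[0]
    rows = [
        ("id", response.get("id")),
        ("model", response.get("model")),
        ("prompt_tokens", usage.get("prompt_tokens")),
        ("completion_tokens", usage.get("completion_tokens")),
        ("total_tokens", usage.get("total_tokens")),
        ("finish_reason", first.get("finish_reason")),
    ]
    lines = ["| Field | Value |", "|-------|-------|"]
    lines += [f"| {field} | {value} |" for field, value in rows]
    # Full content, no truncation, as the output rules require.
    content = (first.get("message") or {}).get("content", "")
    return "\n".join(lines), content
```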

LLM Chat Completions

curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":"${prompt}"}],"max_tokens":1024,"temperature":0.7}'

Multimodal VL

curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"${prompt}"}]}],"max_tokens":512,"temperature":0.7}'

Embedding

curl -s -X POST http://${IP}:${PORT}/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","input":"${text}"}'

Rerank

curl -s -X POST http://${IP}:${PORT}/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","query":"${query}","documents":["${doc1}","${doc2}"]}'

V. Write Test Command (Generate Only)

User says: "write test command model_name prompt:xxx"

Same logic as "test", but only output command text, no execution.


API Parameter Manual

LLM Parameters (/v1/chat/completions)

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| model | Yes | | Model name, same as deployment |
| messages | Yes | | Message list, each with role and content |
| max_tokens | No | 16 | Max generation tokens |
| temperature | No | 1.0 | Sampling randomness, 0=greedy |
| top_p | No | 1.0 | Nucleus sampling threshold |
| top_k | No | -1 | Only consider top-K tokens |
| stream | No | false | Streaming output (SSE) |
| chat_template_kwargs | No | {} | Template params, e.g. {"enable_thinking": false} |

VL Extra Parameters

| Parameter | Description |
|-----------|-------------|
| content[] | Array format: image_url object + text object |
| detail | Image precision: auto/high/low |

Embedding Parameters (/v1/embeddings)

| Parameter | Required | Description |
|-----------|----------|-------------|
| model | Yes | Model name |
| input | Yes | String or string list |
| encoding_format | No | float/base64 |

Rerank Parameters (/v1/rerank)

| Parameter | Required | Description |
|-----------|----------|-------------|
| model | Yes | Model name |
| query | Yes | Query text |
| documents | Yes | Document list to rerank |
| top_n | No | Return top N |

Execution Mode

This skill operates in stateless mode (Rule 2). All context (host, credentials, model info) must be explicitly passed by Agent (Rule 1).

Prerequisites (Agent orchestrated)

Before calling this skill, Agent MUST:

  1. Establish SSH connection using huawei-cloud-ascend-remote-connect

    • Agent receives: host, port, user, password from user
    • Agent validates connection is successful
  2. Check NPU status using huawei-cloud-ascend-command

    • Agent checks: NPU health, HBM availability
    • Agent validates: sufficient cards for model deployment

Skill Execution

This skill receives explicit parameters from Agent:

# Model matching (local operation)
python3 scripts/deploy_helper.py match <model_name>

# Script URL generation (local operation)
python3 scripts/deploy_helper.py script <model_name>

# Deploy command generation (local operation)
python3 scripts/deploy_helper.py command <model> <cards> <port>

Remote Deployment Execution

Agent executes deployment commands on remote server:

# Agent uses SSH to execute deployment on DevServer
ssh root@<host> "cd /path/to/model && bash deploy.sh"

Post-Deployment (Agent orchestrated)

After deployment, Agent calls huawei-cloud-ascend-command to:

  • Monitor NPU HBM usage
  • Check deployment process status
  • Verify model endpoint is responding

Parameter Flow

User Input                    Agent                      Skills
    │                          │                            │
    │ host, password           │                            │
    ├─────────────────────────▶│                            │
    │                          │ SSH connect                │
    │                          ├───────────────────────────▶│
    │                          │                            │ (remote-connect)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ NPU check                  │
    │                          ├───────────────────────────▶│
    │                          │                            │ (ascend-command)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │ model_name, cards        │                            │
    ├─────────────────────────▶│                            │
    │                          │ match model                │
    │                          ├───────────────────────────▶│
    │                          │                            │ deploy_helper.py
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ execute deploy             │
    │                          ├───────────────────────────▶│
    │                          │                            │ (via SSH)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ monitor NPU                │
    │                          ├───────────────────────────▶│
    │                          │                            │ (ascend-command)
    │                          │◀───────────────────────────┤
    │                          │                            │
    ▼                          ▼                            ▼

Note: No direct skill-to-skill calls. All orchestration by Agent (Rule 3).


References

| Document | Description |
|----------|-------------|
| task-deploy-model.md | Deployment task steps |
| task-test-model.md | Testing task steps |
| model-catalog.md | Complete model catalog |
| api-parameters.md | API parameter reference |
| prerequisites.md | Prerequisites checklist |
| verification-method.md | Verification steps |
| troubleshooting.md | Troubleshooting guide |
| scripts/deploy_helper.py | Model matching helper |