huawei-cloud-ascend-command

Huawei Ascend NPU natural language management skill, supporting both local direct connection and SSH remote modes. Provides comprehensive npu-smi command capabilities including device queries, configuration management, firmware upgrade, vNPU virtualization, certificate management, and compute power testing (FLOPS). Enables remote management and monitoring of Ascend NPU devices with real-time metrics tracking. Use this skill when the user wants to: (1) query NPU device status and health, (2) monitor temperature, power, memory/HBM utilization, (3) configure ECC settings and fan modes, (4) perform firmware upgrades, (5) manage vNPU virtualization, (6) run compute power tests, (7) check certificate information. Trigger: the user mentions 'npu', 'ascend', 'NPU', 'Ascend', 'temperature', 'power', 'HBM', 'firmware', 'upgrade', 'vNPU', 'virtualization', 'certificate', 'FLOPS', 'compute', 'health', 'memory', 'utilization', 'ECC', 'fan', '昇腾', '昇腾卡', '昇腾状态', '显存', '算力', '设备查询', 'NPU监控'.

Install

openclaw skills install huawei-cloud-ascend-command

Provides Huawei Ascend NPU natural language command translation and execution. Supports full npu-smi commands: device queries, configuration management, firmware upgrade, vNPU virtualization, certificate management, and compute power testing.

Overview

This skill provides natural language management for Huawei Ascend NPU devices via npu-smi commands. It supports both local direct execution and SSH remote modes.

Related Skills:

  • huawei-cloud-ascend-remote-connect - General SSH remote operations (disk, Docker, logs, processes)
  • npu-smi-skill - Alternative NPU management implementation
  • model-deploy-test-skill - Model deployment and testing on Ascend DevServer

Capabilities:

  • Device queries (health, temperature, power, HBM, ECC, topology)
  • Configuration management (ECC, fan mode, fan speed)
  • Firmware upgrade and status
  • vNPU virtualization
  • Certificate management
  • Compute power testing (FLOPS)

Architecture

System Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         Agent Orchestration                         │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  1. Parse user intent → Match trigger keywords              │    │
│  │  2. Prepare params → host, user, password (explicit)        │    │
│  │  3. Invoke Skill → skill exec --name ascend-command         │    │
│  │  4. Return result → User                                    │    │
│  └────────────────────────────┬────────────────────────────────┘    │
│                               │ Explicit param passing (Rule 1)     │
│                               ▼                                     │
├─────────────────────────────────────────────────────────────────────┤
│                        User Interaction Layer                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Natural Language Commands / CLI Arguments                  │    │
│  │  (NPU health check, temperature, firmware upgrade)          │    │
│  └────────────────────────────┬────────────────────────────────┘    │
│                               │                                     │
│                               ▼                                     │
├─────────────────────────────────────────────────────────────────────┤
│              Skill Core Components (Stateless, Rule 2)              │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐      │
│  │ Command Executor│←→│   NPU Client    │←→│ Trend Recorder  │      │
│  │  - NL Parsing   │  │  - npu-smi cmd  │  │  - Metric Log   │      │
│  │  - Command Route│  │  - Local/SSH    │  │ - Trend Analysis│      │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘      │
│         │                      │                                    │
│         ▼                      ▼                                    │
├─────────────────────────────────────────────────────────────────────┤
│                    Huawei Cloud Ascend Infrastructure               │
│  Data Flow: Agent → Skill(host,user,pwd) → SSH → npu-smi → Response │
└─────────────────────────────────────────────────────────────────────┘

Agent Orchestration Flow

User request: "Query NPU status on 116.204.23.145"
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent parses intent                                         │
│   - Keyword "NPU" → Match ascend-command                    │
│   - IP address → SSH params needed                          │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent prepares params (explicit, Rule 1)                    │
│   --host 116.204.23.145                                     │
│   --user root                                               │
│   --password ***                                            │
│   --command "NPU list"                                      │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Skill executes (Stateless, Rule 2)                          │
│   - NpuClient(host, user, password), passed explicitly      │
│   - Execute npu-smi command                                 │
│   - Return result                                           │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
      Return to user
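
The intent-parsing step in the flow above can be illustrated with a minimal sketch. The keyword subset, the IPv4 pattern, and the function name parse_intent are assumptions made for illustration; the source does not show how the Agent actually matches triggers.

```python
import re

# Subset of the trigger keywords listed in the skill description.
TRIGGERS = ("npu", "ascend", "hbm", "vnpu", "flops", "昇腾", "算力")

# Loose IPv4 pattern; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def parse_intent(text):
    """Return (matched, host): trigger match flag and the first IPv4 address, if any."""
    lowered = text.lower()
    matched = any(keyword in lowered for keyword in TRIGGERS)
    found = IPV4.search(text)
    return matched, (found.group(0) if found else None)
```

A request that contains an IP address yields a host, which signals that SSH parameters are needed; a request without one falls back to local execution.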

Related Skills (Agent orchestrated, no direct call, Rule 3)

| Skill | Purpose | Orchestration Scenario |
| --- | --- | --- |
| huawei-cloud-ascend-remote-connect | SSH remote ops | Agent selects when user needs disk/Docker/logs |
| npu-smi-skill | NPU management | Alternative implementation; Agent selects by scenario |
| model-deploy-test-skill | Model deployment | Use ascend-command to monitor NPU after deployment |

Note: Skills never call each other directly; the Agent orchestrates them based on user intent.

Prerequisites

System Requirements

  • Python 3.8+
  • Pure Python standard library implementation, no additional dependencies required

Environment Check

Prerequisite check: Python3 required

python3 --version  # Python3 >= 3.8

Authentication

Security rules (must be followed):

  • Never read back, echo, or print password values
  • Never ask the user to type a password directly into the conversation
  • Read credentials only from command-line arguments
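
One way to honor the "never print passwords" rule when logging an invocation is to mask the value before it reaches any output. This is a sketch only; redact_args is a hypothetical helper, not a function the skill is known to contain.

```python
def redact_args(argv):
    """Return a copy of argv with the --password value replaced by ***."""
    redacted = []
    mask_next = False
    for arg in argv:
        if mask_next:
            # This element is the value that followed a bare --password flag.
            redacted.append("***")
            mask_next = False
        elif arg.startswith("--password="):
            redacted.append("--password=***")
        else:
            redacted.append(arg)
            mask_next = (arg == "--password")
    return redacted
```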

Parameter Confirmation

Input Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| command | Yes (one-shot) | Natural language command to execute |
| host | No | SSH remote host IP address |
| user | No | SSH username |
| password | No | SSH password |
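
The parameters above map naturally onto argparse from the standard library. The sketch below assumes that omitting --command selects interactive mode and that supplying --host selects SSH mode; build_parser and select_mode are illustrative names, and the real main.py may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser for the four documented parameters."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--command", help="natural language command (omit for interactive mode)")
    parser.add_argument("--host", help="SSH remote host IP address")
    parser.add_argument("--user", help="SSH username")
    parser.add_argument("--password", help="SSH password")
    return parser


def select_mode(args):
    """Return (interaction, transport) derived from the parsed arguments."""
    interaction = "one-shot" if args.command else "interactive"
    transport = "ssh" if args.host else "local"
    return interaction, transport
```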

Confirmation Requirements

The following operations require explicit user confirmation:

  • ECC enable/disable
  • Fan mode changes
  • Firmware upgrade
  • vNPU creation/deletion
  • Certificate management
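
A confirmation gate for these operations could be sketched as follows. The prefix list is an assumption derived from the command mapping tables in this document (set and upgrade commands), and the source only names confirm_sensitive() without showing its body, so this is not the skill's actual logic.

```python
# Assumed prefixes of commands that change device state.
SENSITIVE_PREFIXES = ("npu-smi set ", "npu-smi upgrade -t ")


def needs_confirmation(command):
    """True when the translated command matches a sensitive prefix."""
    return command.strip().startswith(SENSITIVE_PREFIXES)


def confirm(command, ask=input):
    """Return True if the command may run; sensitive commands need an explicit yes."""
    if not needs_confirmation(command):
        return True
    answer = ask(f"About to run a sensitive command:\n  {command}\nProceed? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing the prompt function as a parameter keeps the gate testable without a terminal.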

Core Workflow

Task 1: One-Shot Command Execution

python3 scripts/main.py --command "NPU health check"

Task 2: SSH Remote Execution

python3 scripts/main.py --command "NPU list" --host 192.168.1.100 --user root --password xxx

Task 3: Interactive Mode

python3 scripts/main.py

Usage Instructions

Device Queries

List all NPU devices
NPU health check
Get NPU temperature
Get NPU power usage
Get NPU memory info

Configuration Management

Set ECC mode
Set fan mode

Firmware Upgrade

Query firmware version
Upgrade firmware

Virtualization

Query vNPU info
Create vNPU

FLOPS Test

Run FLOPS test
Run FP32 FLOPS test
Run multi-device FLOPS test

Output Format

Standard Response Format

All command outputs follow a structured format:

{
  "status": "success",
  "command": "npu-smi info -l",
  "description": "List all NPU devices"
}
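
A response of this shape can be produced with the standard json module. The helper name make_response is hypothetical, and the sketch covers only the three fields shown above.

```python
import json


def make_response(command, description, status="success"):
    """Serialize the three documented fields as a JSON string."""
    payload = {"status": status, "command": command, "description": description}
    return json.dumps(payload, indent=2, ensure_ascii=False)
```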

Error Response Format

[ERROR] <error-code>
Message: <error-description>
Suggestion: <troubleshooting-tip>
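
The three-line error layout can be rendered by a small formatter. The function name and the sample error code in the usage note are illustrative; the source does not list the actual error codes.

```python
def format_error(code, message, suggestion):
    """Render an error in the documented three-line layout."""
    return f"[ERROR] {code}\nMessage: {message}\nSuggestion: {suggestion}"
```

For example, format_error("SSH_CONNECT_FAILED", "Connection refused", "Verify host and port") yields a block that starts with "[ERROR] SSH_CONNECT_FAILED".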

Verification Method

Basic Verification Steps

  1. Environment Check

    python3 --version  # Verify Python 3.8+
    
  2. Local Command Test

    python3 scripts/main.py --command "NPU list"
    
  3. Remote Command Test

    python3 scripts/main.py --host <ascend-ip> --user root --password <pwd> --command "NPU health check"
    

Expected Results

| Test Case | Expected Output |
| --- | --- |
| Environment check | Python version >= 3.8 |
| Local command | NPU device information or "npu-smi not found" |
| Remote command | NPU status from remote server |

Script Files

Entry File

  • main.py: Skill entry file (required)
    • Function: Provide interactive mode, one-shot mode, SSH remote mode
    • Supports: Command line arguments, interactive prompt

Core Scripts

  • executor.py: NPU command executor (main entry)

    • Function: Natural language parsing, command routing, sensitive operation confirmation
    • Supported commands: Device queries, configuration, firmware upgrade, virtualization, certificate management
    • Core methods: handle_command(text), confirm_sensitive()
  • npu_client.py: NPU client

    • Function: Wraps npu-smi command execution, supports local and SSH remote execution
    • Core methods: list_devices(), get_health(), get_temperature(), get_power(), get_memory(), set_ecc_enable(), upgrade_query(), create_vnpu(), get_tls_cert()
    • Optimization: SSH ControlMaster, batch execution, parallel FLOPS testing
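
To illustrate how one client can serve both local and SSH modes, the sketch below builds the argument vector for either case. NpuClientSketch and build_argv are hypothetical names, not the real npu_client.py API. Password handling is left out: OpenSSH does not take a password as a command-line option, and the source does not say how the skill supplies it.

```python
import shlex


class NpuClientSketch:
    """Builds argv for local or SSH execution; it does not run anything."""

    def __init__(self, host=None, user=None):
        self.host = host
        self.user = user

    def build_argv(self, command):
        argv = shlex.split(command)
        if self.host is None:
            # Local mode: run the npu-smi command directly.
            return argv
        target = f"{self.user}@{self.host}" if self.user else self.host
        # Remote mode: ssh receives the quoted command as a single argument.
        return ["ssh", target, shlex.join(argv)]
```

The resulting list can be handed to subprocess.run without shell=True, which avoids a second layer of shell quoting on the local side.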

Command Translation Mapping

Device Queries

| Natural Language | Command | Description |
| --- | --- | --- |
| List all NPU devices | npu-smi info -l | List all NPU devices |
| NPU card info | npu-smi info -m | Card and chip mapping |
| NPU info | npu-smi info | Full NPU information |

Health Check

| Natural Language | Command | Description |
| --- | --- | --- |
| NPU health check | npu-smi info -t health -i 0 | Check device health |
| Check NPU status | npu-smi info -t health -i 0 | Check device health |

Temperature/Power/Memory

| Natural Language | Command | Description |
| --- | --- | --- |
| NPU temperature | npu-smi info -t temp -i 0 -c 0 | Get temperature |
| NPU power | npu-smi info -t power -i 0 -c 0 | Get power usage |
| NPU memory | npu-smi info -t memory -i 0 -c 0 | Get memory info |
| NPU utilization | npu-smi info -t usages -i 0 -c 0 | Get utilization |
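
The four metric queries above differ only in the -t value, so a keyword table is enough to translate them. This is a sketch assuming simple substring matching; translate_metric and its defaults are illustrative, and the real executor.py may route commands differently.

```python
# Keyword in the user's request -> npu-smi "-t" query type (from the table above).
QUERY_TYPES = {
    "temperature": "temp",
    "power": "power",
    "memory": "memory",
    "utilization": "usages",
}


def translate_metric(text, device=0, chip=0):
    """Return the npu-smi command for a metric request, or None if nothing matches."""
    lowered = text.lower()
    for keyword, query_type in QUERY_TYPES.items():
        if keyword in lowered:
            return f"npu-smi info -t {query_type} -i {device} -c {chip}"
    return None
```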

Configuration

| Natural Language | Command | Description |
| --- | --- | --- |
| Set ECC | npu-smi set -t ecc-enable -i 0 -c 0 -d 1 | Enable ECC (requires confirmation) |
| Set fan mode | npu-smi set -t pwm-mode -d 0 | Set fan mode (requires confirmation) |

Firmware Upgrade

| Natural Language | Command | Description |
| --- | --- | --- |
| Firmware version | npu-smi upgrade -b mcu -i 0 | Query firmware version |
| Upgrade firmware | npu-smi upgrade -t mcu -i 0 -f file.hpm | Upload firmware (requires confirmation) |

Virtualization

| Natural Language | Command | Description |
| --- | --- | --- |
| vNPU info | npu-smi info -t info-vnpu -i 0 -c 0 | Query vNPU info |
| Template info | npu-smi info -t template-info -i 0 | Query template info |
| Create vNPU | npu-smi set -t create-vnpu -i 0 -c 0 -f vir04 | Create vNPU (requires confirmation) |

Certificate Management

| Natural Language | Command | Description |
| --- | --- | --- |
| Certificate info | npu-smi info -t tls-cert -i 0 -c 0 | View TLS certificate |
| Certificate expiration | npu-smi info -t tls-cert-period -i 0 -c 0 | Query certificate expiration threshold |

FLOPS Test (ascend-dmi)

Default behavior: single device FP16, fast return (~2 seconds)

| Natural Language | Command | Description |
| --- | --- | --- |
| FLOPS test | ascend-dmi -f -d 0 -t fp16 -q | Default: NPU 0, FP16 |
| NPU FLOPS | ascend-dmi -f -d 0 -t fp16 -q | Same as above |
| NPU 3 FLOPS | ascend-dmi -f -d 3 -t fp16 -q | Specify NPU 3 |
| FP32 FLOPS | ascend-dmi -f -d 0 -t fp32 -q | FP32 precision |
| All devices FLOPS | 8 devices async parallel | All NPUs tested simultaneously |

Execution strategy:

  • Single device: Direct execution, ~2 seconds return
  • Multi-device: Async parallel execution, ~2 seconds in total (versus ~16 seconds if the 8 devices ran serially)
  • Precision: Default fp16, supports fp32/bf16/int8/hf32
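
The multi-device strategy can be sketched with asyncio: one subprocess per device, all awaited together, so wall time is roughly that of the slowest device. The function names and the injectable runner are assumptions made for illustration; the source names test_flops_parallel() but does not show its implementation.

```python
import asyncio


def flops_command(device, precision="fp16"):
    """Build the ascend-dmi argv shown in the table above."""
    return ["ascend-dmi", "-f", "-d", str(device), "-t", precision, "-q"]


async def run_command(argv):
    """Run one command and return its combined stdout/stderr as text."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    output, _ = await proc.communicate()
    return output.decode(errors="replace")


async def flops_all(devices, precision="fp16", runner=run_command):
    """Start every device's test concurrently and map device id to output."""
    devices = list(devices)
    outputs = await asyncio.gather(
        *(runner(flops_command(device, precision)) for device in devices)
    )
    return dict(zip(devices, outputs))
```

On a machine with ascend-dmi installed, asyncio.run(flops_all(range(8))) would launch all eight tests at once. The runner parameter exists so the orchestration can be exercised without hardware.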

Performance Optimization

| Optimization | Description | Effect |
| --- | --- | --- |
| SSH persistent connection | ControlMaster reuse | Connection time ~0.1s |
| Batch execution | execute_batch() for multiple commands | Fewer SSH round trips |
| Parallel query | test_flops_parallel() 8-device async test | 16s -> 2s |
| Single parse | parse_full_info() gets all card info at once | O(N) -> O(1) command invocations |
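
The first two optimizations can be sketched as follows. ControlMaster, ControlPath, and ControlPersist are standard OpenSSH client options; the helper names, the control socket location, the 60-second persistence, and the output marker are assumptions, and the real execute_batch() may work differently.

```python
def ssh_base_argv(host, user, control_dir="~/.ssh", persist_seconds=60):
    """Build an ssh argv that reuses one master connection across calls."""
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={control_dir}/cm-%r@%h:%p",
        "-o", f"ControlPersist={persist_seconds}",
        f"{user}@{host}",
    ]


def batch_script(commands, marker="__NPU_SPLIT__"):
    """Join commands into one remote script, echoing a marker between them."""
    return f"; echo {marker}; ".join(commands)


def split_batch_output(output, marker="__NPU_SPLIT__"):
    """Split the combined output back into one chunk per command."""
    return [chunk.strip() for chunk in output.split(marker)]
```

With a persistent master connection, later ssh invocations skip the handshake, and batching sends several npu-smi commands in a single round trip.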

Troubleshooting

Command Not Recognized

  • Check if input contains supported keywords
  • Try using more explicit commands

SSH Connection Failed

  • Verify SSH parameters are correct
  • Ensure SSH service is running on target machine

npu-smi Not Available

  • Ensure CANN and npu-smi are installed on target machine
  • Check LD_LIBRARY_PATH environment variable
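
The two checks above can be automated on the target machine with the standard library. This is a sketch; diagnose_npu_smi is a hypothetical helper, and an empty LD_LIBRARY_PATH is only a hint, not proof of a broken installation.

```python
import os
import shutil


def diagnose_npu_smi(which=shutil.which, environ=os.environ):
    """Return a one-line diagnosis of npu-smi availability."""
    path = which("npu-smi")
    if path is None:
        return "npu-smi not found on PATH; check the CANN and driver installation"
    if not environ.get("LD_LIBRARY_PATH"):
        return f"npu-smi found at {path}, but LD_LIBRARY_PATH is empty"
    return f"npu-smi found at {path}"
```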

Notes

Security Warnings

  • Sensitive operations require user confirmation before execution
  • Passwords are held only in memory for the duration of the session
  • Blocked commands cannot be bypassed

Limitations

  • Requires npu-smi or ascend-dmi installed on target machine
  • SSH password authentication only (no interactive password prompt)

Best Practices

  1. Connection Mode Selection

    • Local direct: For NPU installed on local machine
    • SSH remote: For managing remote Ascend servers
  2. Command Execution Tips

    • Sensitive ops (firmware upgrade, config change) require user confirmation
    • Test on single card before batch operations
  3. Performance Monitoring

    • Regularly check NPU temperature and power
    • Monitor HBM usage to avoid OOM
  4. Troubleshooting

    • First check NPU health: npu-smi info
    • View detailed error log: npu-smi info -t board -i <npu_id>
    • See troubleshooting.md
  5. Trigger Keywords

    • Chinese: NPU, 昇腾, 显存, 算力, 温度, 功耗, 固件, 虚拟化
    • English: npu, ascend, temperature, power, HBM, firmware, vNPU, FLOPS

Directory Structure

huawei-cloud-ascend-command/
├── SKILL.md                    # Skill description (required)
├── scripts/                    # Scripts directory (required)
│   ├── __init__.py             # Module export
│   ├── main.py                 # Skill entry file (required)
│   ├── executor.py             # NPU command executor
│   └── npu_client.py           # NPU client
└── references/                 # Reference documentation
    ├── acceptance-criteria.md  # Acceptance criteria
    ├── verification-method.md  # Verification steps
    ├── troubleshooting.md      # Troubleshooting guide
    ├── device-queries.md       # Device queries reference
    ├── configuration.md        # Configuration reference
    ├── firmware-upgrade.md     # Firmware upgrade reference
    ├── virtualization.md       # Virtualization reference
    └── certificate-management.md # Certificate management

References

| Document | Description |
| --- | --- |
| acceptance-criteria.md | Acceptance criteria for skill validation |
| verification-method.md | Verification steps and test methods |
| troubleshooting.md | Troubleshooting guide and common issues |
| device-queries.md | Device queries command reference |
| configuration.md | Configuration management reference |
| firmware-upgrade.md | Firmware upgrade reference |
| virtualization.md | Virtualization management reference |
| certificate-management.md | Certificate management reference |

Technical Support