Install

```bash
openclaw skills install musa-torch-coding
```

Guide for generating PyTorch code that runs on Moore Threads (摩尔线程) MUSA GPUs using torch_musa.
MUSA (Metaverse Unified System Architecture) is Moore Threads' GPU computing platform. This skill helps generate PyTorch code that targets MUSA devices through the torch_musa extension.

CUDA-to-MUSA API mapping:

| CUDA | MUSA |
|---|---|
| `torch.cuda` | `torch.musa` |
| `torch.device("cuda")` | `torch.device("musa")` |
| `torch.cuda.is_available()` | `torch.musa.is_available()` |
| `backend='nccl'` | `backend='mccl'` |
| `torch.cuda.device_count()` | `torch.musa.device_count()` |
| `torch.cuda.get_device_name()` | `torch.musa.get_device_name()` |
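As a rough illustration, the mapping above can be applied mechanically to small snippets. `CUDA_TO_MUSA` and `port_to_musa` below are hypothetical helper names for this sketch, not part of torch_musa; ports done this way should still be reviewed by hand:

```python
# Hypothetical textual port based on the CUDA -> MUSA mapping table;
# pure string substitution, so review the result manually.
CUDA_TO_MUSA = {
    "torch.cuda": "torch.musa",  # module namespace
    '"cuda"': '"musa"',          # device strings
    "'cuda'": "'musa'",
    '"nccl"': '"mccl"',          # distributed backend
    "'nccl'": "'mccl'",
}

def port_to_musa(source: str) -> str:
    """Apply each substitution from the mapping table in order."""
    for cuda_name, musa_name in CUDA_TO_MUSA.items():
        source = source.replace(cuda_name, musa_name)
    return source
```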
DO NOT install PyTorch, vLLM, or related packages manually. MUSA environments are custom-built and ship MUSA-specific builds of these packages; installing standard packages from PyPI will break the environment.
MUSA provides pre-configured conda environments. Common environment names:
- `v1.2` - MUSA SDK v1.2 environment
- `v1.3` - MUSA SDK v1.3 environment (newer)

```bash
# List available MUSA environments
conda env list | grep -E "(v1\.2|v1\.3|musa)"

# Activate the appropriate environment
conda activate v1.2  # or v1.3

# Verify MUSA availability
python -c "import torch_musa; import torch; print(torch.musa.is_available())"
```
If no MUSA conda environment is detected:
Check if MUSA is installed:

```bash
which musaInfo       # Should show musaInfo path
ls /usr/local/musa/  # MUSA SDK location
```
If MUSA is not set up, use the musa-env-setup skill for complete environment installation.

Common conda environment locations:

- `/opt/conda/envs/`
- `~/conda/envs/`
- `/usr/local/conda/envs/`

Useful environment variables:

| Variable | Purpose |
|---|---|
| `MUSA_VISIBLE_DEVICES=0,1,2,3` | Control visible GPU IDs |
| `MUSA_LAUNCH_BLOCKING=1` | Synchronous kernel launch |
| `MUDNN_LOG_LEVEL=INFO` | Enable MUDNN logging |
| `TORCH_SHOW_CPP_STACKTRACES=1` | Show C++ stack traces |
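A debugging session might combine several of these variables; this is a sketch of one plausible configuration, assuming the variables behave as listed in the table above (the specific GPU IDs are placeholders):

```shell
# Restrict the job to two GPUs and make failures easier to diagnose
export MUSA_VISIBLE_DEVICES=0,1       # only GPUs 0 and 1 are visible
export MUSA_LAUNCH_BLOCKING=1         # report errors at the failing kernel launch
export TORCH_SHOW_CPP_STACKTRACES=1   # include C++ frames in crash traces
```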
When generating PyTorch code for MUSA:

Always import torch_musa:

```python
import torch_musa  # Must import before using torch.musa
```

Use torch.device("musa"):

```python
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")
tensor = torch.tensor([1.0, 2.0], device=device)
```

Use 'mccl' for distributed training:

```python
dist.init_process_group(backend='mccl', ...)
```

Mixed precision (AMP) is supported with the same API as CUDA:

```python
from torch.cuda.amp import autocast, GradScaler  # Same API
```

TensorCore optimization is available:
```python
torch.backends.musa.matmul.allow_tf32 = True  # enable TensorFloat32
```

For common model types, see templates in references/:

- reference.md - Complete MUSA API reference

Verify the setup:

```python
import torch
import torch_musa
print(f"MUSA available: {torch.musa.is_available()}")
print(f"Device count: {torch.musa.device_count()}")
print(f"Device name: {torch.musa.get_device_name(0)}")
```
Basic training loop:

```python
import torch
import torch_musa

# Device setup
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")

# Model and data to device
model = model.to(device)
inputs = inputs.to(device)

# Training (same as CUDA)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
```
Distributed training:

```python
import torch
import torch.distributed as dist
import torch_musa

# Initialize with the mccl backend
dist.init_process_group(backend='mccl', rank=rank, world_size=world_size)

# Bind this process to its GPU
torch.cuda.set_device(local_rank)  # torch_musa extends the torch.cuda API
```
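The `rank`, `local_rank`, and `world_size` values above are typically read from the environment variables that `torchrun` sets for each worker; `ddp_env` is a hypothetical helper name for this sketch:

```python
import os

def ddp_env():
    """Read torchrun's per-worker variables, with single-process defaults."""
    rank = int(os.environ.get("RANK", 0))              # global rank of this worker
    local_rank = int(os.environ.get("LOCAL_RANK", 0))  # GPU index on this node
    world_size = int(os.environ.get("WORLD_SIZE", 1))  # total number of workers
    return rank, local_rank, world_size
```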
When converting existing CUDA code to MUSA:
- Add import torch_musa at the top
- Replace cuda with musa in device strings
- Replace nccl with mccl for the distributed backend

Troubleshooting:

- Add your user to the render group: `sudo usermod -aG render $(whoami)`
- Ensure `LD_LIBRARY_PATH` includes `/usr/local/musa/lib/`
- Rebuild torch_musa if needed: `python setup.py clean && bash build.sh`
- In Docker, expose GPUs with `--env MTHREADS_VISIBLE_DEVICES=all`

For detailed API reference and examples, see references/reference.md.
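After applying the conversion steps above, a quick scan for leftover CUDA references can catch missed spots. `find_cuda_leftovers` is a hypothetical helper name, and note that some `torch.cuda` calls may legitimately remain, since torch_musa extends that API:

```python
def find_cuda_leftovers(source: str):
    """Return (line number, line) pairs that still mention cuda or nccl."""
    return [
        (lineno, line)
        for lineno, line in enumerate(source.splitlines(), start=1)
        if "cuda" in line or "nccl" in line
    ]
```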