Use when work targets NVIDIA GPUs for deep learning training, inference, distributed execution, CUDA/Triton kernels, or AI infrastructure tuning. Enforces GPU-aware code conventions for PyTorch on CUDA, covering dtype policy, host-device memory movement, NCCL/DDP/FSDP choices, profiling, benchmarking, and H100/H200/B200 optimization practices.

Install

openclaw skills install @kkellyoffical/nvidia-cuda