Install
```
openclaw skills install pytorch
```

Avoid common PyTorch mistakes: train/eval mode, gradient leaks, device mismatches, and checkpoint gotchas.
Train/eval mode
- `model.train()` enables dropout and BatchNorm updates; this is the default after init.
- `model.eval()` disables dropout and uses running stats; MUST call it for inference.
- `model.eval()` doesn't disable gradients; you still need `torch.no_grad()`.
- `torch.no_grad()` for inference: reduces memory and speeds up computation.

Gradients
- `loss.backward()` accumulates gradients; call `optimizer.zero_grad()` before backward.
- `zero_grad()` placement matters: put it before the forward pass, not after backward.
- `.detach()` stops gradient flow; prevents a memory leak when logging tensors.

(Both sections are tied together in the training-loop sketch at the end of this page.)

Devices
- `model.to(device)` moves parameters in place, but `tensor.to(device)` returns a new tensor, so reassign it.
- `.cuda()` vs `.to('cuda')`: both work; `.to(device)` is more flexible.
- `.cpu().numpy()` is required: CUDA tensors can't be converted to NumPy directly.
- `torch.device('cuda' if torch.cuda.is_available() else 'cpu')` for portable code (see the device sketch below).

DataLoader
- `num_workers > 0` uses multiprocessing; Windows needs the `if __name__ == '__main__':` guard.
- `pin_memory=True` with CUDA: faster transfer to the GPU.
- `worker_init_fn` to seed each worker (see the loader sketch below).
- Too many workers can cause memory issues; start with 2-4 and increase only if CPU-bound.

Checkpoints
- `torch.save(model.state_dict(), path)` is recommended: saves only the weights.
- `model.load_state_dict(torch.load(path))`: construct the model first, then restore the weights into it.
- `map_location` for cross-device loads: `torch.load(path, map_location='cpu')` if the checkpoint was saved on a GPU. (Sketch below.)

In-place ops and autograd
- A trailing `_` means in-place: `tensor.add_(1)` mutates, `tensor.add(1)` returns a new tensor.
- `tensor.data` bypasses autograd; it's legacy, prefer `.detach()` for safety.
- `.detach()` logged metrics (or use `.item()`) so stored values don't keep graphs alive. (Sketch below.)

GPU memory
- `torch.cuda.empty_cache()` releases cached memory, but doesn't fix leaks.
- `gc.collect()` before `empty_cache()` if needed (see the cleanup sketch below).
- `with torch.no_grad():` prevents graph storage; crucial for validation loops.

Other gotchas
- `batch_size=1` fails BatchNorm in train mode; use eval mode or `track_running_stats=False`.
- `cross_entropy` expects logits, not softmax output.
- `.item()` to get a Python scalar; `.numpy()` or `[0]` on a 0-dim tensor is deprecated or errors.
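A minimal train/validate loop tying the mode, gradient, and scalar rules together. `run_epoch`, `loss_fn`, and the two loaders are hypothetical names, not part of this skill:

```python
import torch

def run_epoch(model, optimizer, loss_fn, train_loader, val_loader, device):
    # --- training ---
    model.train()                              # dropout on, BatchNorm uses batch stats
    total_loss = 0.0
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)      # Tensor.to() returns new tensors: reassign
        optimizer.zero_grad()                  # grads accumulate; clear before backward()
        loss = loss_fn(model(x), y)            # e.g. F.cross_entropy, fed raw logits
        loss.backward()
        optimizer.step()
        total_loss += loss.item()              # .item() gives a Python float, graph freed

    # --- validation ---
    model.eval()                               # dropout off, BatchNorm running stats...
    correct = total = 0
    with torch.no_grad():                      # ...but eval() alone doesn't stop autograd
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return total_loss / max(len(train_loader), 1), correct / max(total, 1)
```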
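Device portability in one place; the layer and shapes are arbitrary stand-ins. The trap: `Module.to()` moves parameters in place, while `Tensor.to()` returns a copy you must reassign.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(8, 2).to(device)   # Module.to() moves params in place (and returns self)
x = torch.randn(4, 8)
x = x.to(device)                     # Tensor.to() returns a new tensor: reassignment required

out = model(x)
arr = out.detach().cpu().numpy()     # CUDA tensors need .cpu() (and .detach()) before .numpy()
```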
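A loader setup covering the multiprocessing gotchas. The worker-seeding function follows the recipe from PyTorch's reproducibility notes; `seed_worker` and the toy dataset are illustrative:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # each worker already gets a distinct torch seed; mirror it into NumPy/random,
    # otherwise augmentations using those libraries repeat across workers
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

if __name__ == '__main__':           # required on Windows once num_workers > 0
    dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))
    loader = DataLoader(
        dataset,
        batch_size=16,
        shuffle=True,
        num_workers=2,               # start small; raise only if loading is the bottleneck
        pin_memory=True,             # faster host-to-GPU copies when training on CUDA
        worker_init_fn=seed_worker,
    )
    for x, y in loader:
        pass                         # training step goes here
```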
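Saving and restoring weights through the `state_dict`, with `map_location` so a GPU-trained checkpoint loads on a CPU-only machine; the layer and `checkpoint.pt` path are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
torch.save(model.state_dict(), 'checkpoint.pt')          # weights only, not the pickled object

model = nn.Linear(8, 2)                                  # construct first, then restore
state = torch.load('checkpoint.pt', map_location='cpu')  # safe even if saved from a GPU
model.load_state_dict(state)
model.eval()                                             # and eval() before inference
```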
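A small demonstration of the in-place `_` suffix and of `.detach()` versus the legacy `.data`:

```python
import torch

t = torch.ones(3)
t.add_(1)                   # trailing underscore: in-place, t is now [2., 2., 2.]
u = t.add(1)                # out-of-place: new tensor, t is unchanged

x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()

losses = []
losses.append(y.detach())   # detach (or .item()) before storing, or each entry pins a graph

a = y.detach()              # cut from the graph; autograd still guards against misuse
b = y.data                  # legacy: also cut, but silently skips those safety checks
```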
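How the memory bullets compose, as a sketch: drop the Python references first, collect, then release the allocator cache. `empty_cache()` only hands cached blocks back to the driver; anything still referenced stays allocated:

```python
import gc
import torch

use_cuda = torch.cuda.is_available()
big = torch.randn(2048, 2048, device='cuda' if use_cuda else 'cpu')

del big                        # release the last reference first
gc.collect()                   # break any reference cycles still holding tensors
if use_cuda:
    torch.cuda.empty_cache()   # return cached blocks to the driver; not a leak fix
```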