PyTorch

Avoid common PyTorch mistakes — train/eval mode, gradient leaks, device mismatches, and checkpoint gotchas.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
3 · 815 · 3 current installs · 4 all-time installs
by Iván (@ivangdavila)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (PyTorch best-practices) align with the content of SKILL.md. The only declared binary dependency is python3, which is reasonable for a PyTorch-focused skill. The metadata and runtime instructions all focus on model training/inference pitfalls and do not request unrelated capabilities.
Instruction Scope
SKILL.md contains static guidance (train/eval, gradient control, device management, saving/loading, etc.). It does not instruct the agent to run system commands, read files, access environment variables, or transmit data to external endpoints. No scope creep detected.
Install Mechanism
No install spec and no code files are present (instruction-only). That is the lowest-risk model — nothing is downloaded or written to disk by the skill itself.
Credentials
The skill declares no environment variables, no credentials, and no config paths. This is proportionate to an advisory/reference skill that only provides textual recommendations.
Persistence & Privilege
The always flag is false and the skill is user-invocable; it does not request permanent presence or elevated privileges, and it does not modify other skills. Normal autonomous invocation is allowed by platform defaults, but there is no indication this skill requires special persistence.
Assessment
This skill is a read-only cheat sheet for common PyTorch mistakes and appears safe: it doesn't ask for credentials, install code, or read files. Because it's instruction-only, it will only return text guidance — it will not execute code by itself. Note the skill source/homepage is unknown; if you need higher assurance, prefer skills from a known publisher or ones that link to an official homepage. Also remember this is guidance only — the agent still needs PyTorch/environment set up to run real code, and you should avoid pasting secrets or private data into prompts when asking for debugging help.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest: vk972ewtvbmbf38w0rzmtwrc4as80wpvk

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🔥 Clawdis
OS: Linux · macOS · Windows
Bins: python3

SKILL.md

Train vs Eval Mode

  • model.train() enables dropout, BatchNorm updates — default after init
  • model.eval() disables dropout, uses running stats — MUST call for inference
  • Mode is sticky — train/eval persists until explicitly changed
  • model.eval() doesn't disable gradients — still need torch.no_grad()
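
The interaction of the last two bullets is worth seeing concretely: eval mode and gradient tracking are independent switches, so inference needs both. A minimal sketch with a toy model (the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

# A tiny model with dropout so train/eval behavior actually differs.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))

x = torch.randn(1, 4)

model.eval()               # disable dropout, use BatchNorm running stats
with torch.no_grad():      # eval() alone does NOT stop gradient tracking
    out = model(x)

print(out.requires_grad)   # False — no graph was built
print(model.training)      # False — mode stays sticky until model.train()
```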

Gradient Control

  • torch.no_grad() for inference — reduces memory, speeds up computation
  • loss.backward() accumulates gradients — call optimizer.zero_grad() before backward
  • zero_grad() placement matters — call it before the next backward(), never between backward() and step()
  • .detach() to stop gradient flow — prevents memory leak in logging
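
Put together, a training step looks like the sketch below (toy model and data; the point is the zero_grad/backward/step ordering and detaching logged losses):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))

losses = []
for _ in range(3):
    optimizer.zero_grad()          # clear grads accumulated by the last step
    loss = loss_fn(model(x), y)
    loss.backward()                # accumulates into each parameter's .grad
    optimizer.step()
    losses.append(loss.detach())   # detach so the graph isn't kept alive
```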

Device Management

  • Model AND data must be on same device — model.to(device) and tensor.to(device)
  • .cuda() vs .to('cuda') — both work, .to(device) more flexible
  • CUDA tensors can't convert to numpy directly — .cpu().numpy() required
  • torch.device('cuda' if torch.cuda.is_available() else 'cpu') — portable code
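
A short sketch of the portable pattern — model and data both follow the chosen device, and CUDA tensors come back through .cpu() before .numpy():

```python
import torch
import torch.nn as nn

# Portable device selection: falls back to CPU when no GPU is present.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2).to(device)
batch = torch.randn(8, 4).to(device)   # data must follow the model

out = model(batch)
arr = out.detach().cpu().numpy()       # .cpu() required before .numpy() on CUDA
print(arr.shape)                       # (8, 2)
```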

DataLoader

  • num_workers > 0 uses multiprocessing — Windows needs if __name__ == '__main__':
  • pin_memory=True with CUDA — faster transfer to GPU
  • Workers don't share state — random seeds differ per worker, set in worker_init_fn
  • Large num_workers can cause memory issues — start with 2-4, increase if CPU-bound
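
These points combine into the loader setup below — a sketch with a synthetic TensorDataset; seed_worker and main are illustrative names, not a PyTorch API:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Give each worker process its own deterministic seed.
    torch.manual_seed(1234 + worker_id)

def main():
    ds = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
    loader = DataLoader(
        ds,
        batch_size=16,
        shuffle=True,
        num_workers=2,                         # start small; raise only if CPU-bound
        pin_memory=torch.cuda.is_available(),  # only helps when copying to GPU
        worker_init_fn=seed_worker,
    )
    n_batches = 0
    for xb, yb in loader:
        n_batches += 1   # training step goes here
    return n_batches

if __name__ == '__main__':   # required on Windows when num_workers > 0
    main()
```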

Saving and Loading

  • torch.save(model.state_dict(), path) — recommended, saves only weights
  • Loading: create model first, then model.load_state_dict(torch.load(path))
  • map_location for cross-device — torch.load(path, map_location='cpu') if saved on GPU
  • Saving whole model pickles code path — breaks if code changes

In-place Operations

  • In-place ops end with _ — tensor.add_(1) vs tensor.add(1)
  • In-place on leaf variable breaks autograd — error about modified leaf
  • In-place on intermediate can corrupt gradient — avoid in computation graph
  • tensor.data bypasses autograd — legacy, prefer .detach() for safety
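
The leaf-variable error and the .detach() workaround can be demonstrated in a few lines:

```python
import torch

# In-place on a leaf that requires grad raises a RuntimeError.
leaf = torch.ones(3, requires_grad=True)
try:
    leaf.add_(1)
except RuntimeError as e:
    print('in-place on leaf:', type(e).__name__)

# .detach().clone() gives a safe copy outside the graph; modify that instead.
safe = leaf.detach().clone()
safe.add_(1)           # fine — no autograd bookkeeping
print(safe.tolist())   # [2.0, 2.0, 2.0]; leaf itself is untouched
```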

Memory Management

  • Accumulating live tensors leaks memory — log .detach() or .item() values, not graph-attached tensors
  • torch.cuda.empty_cache() releases cached memory — but doesn't fix leaks
  • Delete references and call gc.collect() — before empty_cache if needed
  • with torch.no_grad(): prevents graph storage — crucial for validation loop
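
A validation loop that applies these rules — no_grad() so no graph is stored, and .item() so only a Python float (not a tensor holding a graph) is accumulated. Synthetic data, for illustration only:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
loss_fn = nn.MSELoss()
data = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(5)]

model.eval()
total = 0.0
with torch.no_grad():        # no graph stored — validation stays cheap
    for xb, yb in data:
        loss = loss_fn(model(xb), yb)
        total += loss.item() # .item() copies out a plain float, keeps no tensor

print(total / len(data))
```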

Common Mistakes

  • BatchNorm with batch_size=1 fails in train mode — use eval mode or track_running_stats=False
  • Loss function reduction default is 'mean' — may want 'sum' for gradient accumulation
  • cross_entropy expects logits — not softmax output
  • .item() to get a Python scalar from a 0-dim tensor — indexing with [0] errors, and .numpy() returns an array
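
The logits-vs-softmax mistake is easy to show numerically: cross_entropy applies log_softmax internally, so feeding it softmax output double-applies the softmax and flattens the loss. The tensors here are made up for illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])

# Correct: pass raw logits — cross_entropy applies log_softmax itself.
right = F.cross_entropy(logits, target)

# Wrong: softmax is applied twice, understating the loss signal.
wrong = F.cross_entropy(logits.softmax(dim=1), target)

print(right.item(), wrong.item())   # .item() → plain Python floats
```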

Files

1 total