# GPU Selection

## Available GPUs

```python
# T4 (16GB, good for inference and small training)
@app.function(gpu="T4", ...)

# A10G (24GB, good for medium training)
@app.function(gpu="A10G", ...)

# A100 (40GB/80GB, fastest for large-scale training)
@app.function(gpu="A100", ...)
```

## GPU Comparison

| GPU | VRAM | Use Case | Cost |
|-----|------|----------|------|
| T4 | 16GB | Inference, small models | $ |
| A10G | 24GB | Medium training, fine-tuning | $$ |
| A100-40GB | 40GB | Large training, 100M+ models | $$$ |
| A100-80GB | 80GB | Very large models | $$$$ |

## Recommendations by Task

| Task | Recommended GPU |
|------|-----------------|
| Large-model training (100M+ params) | A100 |
| Fine-tuning small models | A10G or T4 |
| Inference | T4 |
| Memory-heavy variants | A100 (extra VRAM headroom) |

## GPU Features

### A100 Specific Features

```python
@app.function(gpu="A100", image=image, timeout=3600)
def train():
    import torch

    # bfloat16 support (2x speedup)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        output = model(input)

    # FlashAttention is enabled by default
    # Large VRAM allows bigger batch sizes
```

### Batch Size Guidelines

| GPU | Suggested Batch Size |
|-----|----------------------|
| T4 | 8 |
| A10G | 16 |
| A100-40GB | 32 |
| A100-80GB | 64 |

Reduce batch size for larger context windows or memory-heavy variants.

## Multi-GPU (Advanced)

```python
# Request multiple GPUs
@app.function(gpu="A100:2", ...)  # 2x A100

# Or specific count
@app.function(gpu="A100:4", ...)  # 4x A100
```
