Basics of running PyTorch on NAISS systems¶
The first sections cover how to get started and run this notebook on the NAISS system of your choice.
Getting the notebook¶
This notebook and the other demo material can be found in the GitHub repo for the course website at https://github.com/NAISS-Training/ai-intro/tree/main/docs/demos/.
You can download it with:
wget https://raw.githubusercontent.com/NAISS-Training/ai-intro/refs/heads/main/docs/demos/torch_basics.ipynb
Setting up the software environment¶
The software for this demo was set up using an Apptainer container. Apptainer only runs on Linux, but is available on most HPC clusters. The recipe for this container can be found among the demo material. To fetch and build it, do:
wget https://raw.githubusercontent.com/NAISS-Training/ai-intro/refs/heads/main/docs/demos/pytorch-bundle.def
apptainer build pytorch-bundle.sif pytorch-bundle.def
You can then launch a jupyter lab instance with:
apptainer exec pytorch-bundle.sif jupyter lab
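Note that for the container to see the GPU on a compute node, Apptainer needs the `--nv` flag, which exposes the host NVIDIA driver stack inside the container. A minimal sketch, assuming the image built above:

```shell
# --nv makes the host NVIDIA driver and GPU devices visible inside
# the container; without it, PyTorch will only see the CPU.
apptainer exec --nv pytorch-bundle.sif jupyter lab
```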
Alvis Open OnDemand Runtime¶
To use this with Alvis Open OnDemand you will want a custom runtime:
wget https://raw.githubusercontent.com/NAISS-Training/ai-intro/refs/heads/main/docs/demos/pytorch-bundle-container.sh
mkdir -p ~/portal/jupyter/
mv pytorch-bundle-container.sh ~/portal/jupyter/
Once you have done this, the runtime should appear as an option in the runtime drop-down for the Jupyter interactive app.
Preparing the demo¶
For this demo we will use the CIFAR-10 dataset and a VGG-style deep convolutional neural network. The motivation is to have a very quick and conceptually simple demo that still puts some load on the GPU.
Dataset set-up¶
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CIFAR10
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.4914, 0.4822, 0.4465),
        std=(0.2470, 0.2435, 0.2616),
    ),
])

dataset = CIFAR10(
    root="/mimer/NOBACKUP/Datasets/CIFAR/",
    train=True,
    download=False,
    transform=transform,
)

dataloader = DataLoader(
    dataset,
    batch_size=512,
    shuffle=True,
    num_workers=3,
)
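For reference, `transforms.Normalize` maps each channel value x to (x - mean) / std using the per-channel statistics given above. A minimal plain-Python sketch of that arithmetic (no torchvision needed; the statistics are copied from the cell above):

```python
# Per-channel CIFAR-10 statistics, as passed to transforms.Normalize above
mean = (0.4914, 0.4822, 0.4465)
std = (0.2470, 0.2435, 0.2616)

def normalize(value: float, channel: int) -> float:
    """Apply the same (x - mean) / std mapping as transforms.Normalize."""
    return (value - mean[channel]) / std[channel]

# A mid-gray pixel (0.5 after ToTensor) ends up close to zero in every channel
normalized = [normalize(0.5, c) for c in range(3)]
print(normalized)
```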
Model set-up¶
from torch.nn import Module, Sequential, Conv2d, ReLU, MaxPool2d, AdaptiveMaxPool2d, Flatten, Linear
def get_model(num_classes: int) -> Module:
    conv_kws = {
        "kernel_size": 3,
        "stride": 1,
        "padding": "same",
    }
    return Sequential(
        Conv2d(3, 64, **conv_kws), ReLU(),
        Conv2d(64, 64, **conv_kws), ReLU(),
        MaxPool2d(2),
        Conv2d(64, 128, **conv_kws), ReLU(),
        Conv2d(128, 128, **conv_kws), ReLU(),
        MaxPool2d(2),
        Conv2d(128, 256, **conv_kws), ReLU(),
        Conv2d(256, 256, **conv_kws), ReLU(),
        Conv2d(256, 256, **conv_kws), ReLU(),
        AdaptiveMaxPool2d((1, 1)),
        Flatten(start_dim=-3),
        Linear(256, 1024), ReLU(),
        Linear(1024, 1024), ReLU(),
        Linear(1024, num_classes),
    )
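To get a feel for the model size, the parameter count can be worked out by hand: a `Conv2d` layer has out_channels × (in_channels × k × k) weights plus out_channels biases, and a `Linear` layer has out_features × in_features weights plus out_features biases. A sketch in plain Python, with the layer shapes copied from `get_model` above and `num_classes=10`:

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    # out_channels * (in_channels * k * k) weights + out_channels biases
    return c_out * (c_in * k * k) + c_out

def linear_params(f_in: int, f_out: int) -> int:
    # out_features * in_features weights + out_features biases
    return f_out * f_in + f_out

# (in, out) shapes of the Conv2d and Linear layers in get_model above
convs = [(3, 64), (64, 64), (64, 128), (128, 128),
         (128, 256), (256, 256), (256, 256)]
linears = [(256, 1024), (1024, 1024), (1024, 10)]

total = (sum(conv_params(i, o) for i, o in convs)
         + sum(linear_params(i, o) for i, o in linears))
print(f"{total:,} parameters")  # → 3,058,506 parameters
```

So the model is small by modern standards (roughly 3 million parameters), which keeps the demo quick while still exercising the GPU.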
import torch
from torch.nn.functional import cross_entropy
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
torch.manual_seed(10037)
torch.set_float32_matmul_precision("high")
device = "cuda"
model = get_model(num_classes=10).to(device)
n_epochs = 5
optim = AdamW(model.parameters(), lr=1e-3)
model.train()
for epoch in range(n_epochs):
    train_loss = 0.0
    train_acc = 0
    for batch in dataloader:
        optim.zero_grad()
        x = batch[0].to(device)
        y = batch[1].to(device)
        y_pred = model(x)
        loss = cross_entropy(y_pred, y)
        with torch.no_grad():
            train_loss += loss.item() * x.size(0)
            train_acc += (y_pred.argmax(dim=1) == y).sum().item()
        loss.backward()
        optim.step()
    train_loss /= len(dataloader.dataset)
    train_acc /= len(dataloader.dataset)
    print(f"Epoch {epoch}, loss={train_loss}, acc={train_acc}")

    # Very basic checkpointing
    torch.save(
        {
            'epoch': epoch,
            'rng_state': torch.get_rng_state(),
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optim.state_dict(),
        },
        "cnn_cifar10_latest.pkl",
    )
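One detail worth noticing in the loop above: the running loss is accumulated as `loss.item() * x.size(0)` and divided by the dataset size at the end. Since `cross_entropy` returns a per-batch mean and the last batch is usually smaller, this weights each batch by its size instead of naively averaging per-batch means. A small plain-Python sketch with made-up batch losses:

```python
# Hypothetical per-batch mean losses and batch sizes; the last batch is
# smaller, as happens when the dataset size is not a multiple of the
# batch size.
batch_losses = [2.0, 2.0, 8.0]
batch_sizes = [512, 512, 256]

# Weighted accumulation, as in the training loop:
# sum of (mean batch loss * batch size) / total number of samples
total = sum(l * n for l, n in zip(batch_losses, batch_sizes))
weighted_mean = total / sum(batch_sizes)

# A naive mean of per-batch means would over-weight the small final batch
naive_mean = sum(batch_losses) / len(batch_losses)

print(weighted_mean, naive_mean)  # → 3.2 4.0
```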
Optional exercise: Resume training from a checkpoint https://docs.pytorch.org/tutorials/beginner/saving_loading_models.html.
Exercises¶
- Launch this notebook through the Jupyter Interactive App on Alvis OnDemand.
    - A single GPU and 1 or 2 hours should be enough
    - Use the runtime from when we set up the software environment
- Step through the notebook and take special notice of the usage of `device` and `.to()` to make it run on the GPU
- Monitor the GPU usage with nvtop while you rerun the notebook
    - Open up a terminal in the Jupyter Lab instance
    - Load the nvtop module: `ml nvtop`
    - Run `nvtop`
    - Rerun the notebook while checking the nvtop output
- Check the job_stats page for this job (hint: use the button next to the launch button where you launched this notebook from Alvis OnDemand)
- Optional: Restart the training from a saved checkpoint
Training set-up with PyTorch Lightning¶
PyTorch Lightning is a wrapper that removes a lot of the boilerplate surrounding the training set-up.
For a quick intro see: Lightning in 15 minutes
import lightning as L
import torch
from lightning.pytorch import seed_everything
from torch.nn.functional import cross_entropy
from torch.optim import AdamW, Optimizer
seed_everything(10037)
class VisionClassifier(L.LightningModule):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def training_step(self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> torch.Tensor:
        # N.B. we don't use .to(device) here, that is handled by Lightning
        x, y = batch
        y_pred = self.model(x)
        loss = cross_entropy(y_pred, y)
        with torch.no_grad():
            acc = (y_pred.argmax(dim=1) == y).float().mean().item()
        self.log("loss", loss, prog_bar=True)
        self.log("acc", acc, prog_bar=True)
        return loss

    def configure_optimizers(self) -> Optimizer:
        optimizer = AdamW(self.model.parameters(), lr=1e-3)
        return optimizer
trainer = L.Trainer(max_epochs=5, enable_progress_bar=True)
trainer.fit(
model=VisionClassifier(get_model(num_classes=10)),
train_dataloaders=dataloader,
)
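By default the Lightning Trainer auto-detects an available accelerator. If you want to be explicit about the GPU request rather than rely on auto-detection, the Trainer accepts `accelerator` and `devices` arguments; a configuration sketch:

```python
# Explicitly request a single GPU instead of relying on auto-detection
# (accelerator="auto" is the Trainer default).
trainer = L.Trainer(
    max_epochs=5,
    enable_progress_bar=True,
    accelerator="gpu",
    devices=1,
)
```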
Exercises¶
- Compare the Lightning set-up with the bare PyTorch set-up and check that both make use of the GPU as expected.
- Optional: Read the Lightning documentation on Checkpointing and:
    - Figure out where checkpoints are stored
    - Continue training from a checkpoint