Basics of running PyTorch on NAISS systems¶

The first sections cover how to get started and how to run this notebook on the NAISS system of your choice.

Getting the notebook¶

This notebook and the other demo material can be found in the GitHub repo for the course website at https://github.com/NAISS-Training/ai-intro/tree/main/docs/demos/.

You can download it with:

wget https://raw.githubusercontent.com/NAISS-Training/ai-intro/refs/heads/main/docs/demos/torch_basics.ipynb

Setting up the software environment¶

The software for this demo was set up using an Apptainer container. Apptainer only runs on Linux, but is available on most HPC clusters. The recipe for this container can be found among the demo material. To fetch and build it, run:

wget https://raw.githubusercontent.com/NAISS-Training/ai-intro/refs/heads/main/docs/demos/pytorch-bundle.def
apptainer build pytorch-bundle.sif pytorch-bundle.def

You can then launch a Jupyter Lab instance with the command below (on a GPU node you will likely also want the --nv flag, so that the NVIDIA drivers are available inside the container):

apptainer exec pytorch-bundle.sif jupyter lab

Alvis Open OnDemand Runtime¶

To use this environment with Alvis Open OnDemand, set up a custom runtime:

wget https://raw.githubusercontent.com/NAISS-Training/ai-intro/refs/heads/main/docs/demos/pytorch-bundle-container.sh
mkdir -p ~/portal/jupyter/
mv pytorch-bundle-container.sh ~/portal/jupyter/

Once this is done, the runtime should appear as an option in the runtime drop-down of the Jupyter interactive app.

Preparing the demo¶

For this demo we will use the CIFAR-10 dataset and a VGG-style deep convolutional neural network. The motivation is to have a very quick and conceptually simple demo that still puts some load on the GPU.

Dataset set-up¶

In [ ]:
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CIFAR10

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.4914, 0.4822, 0.4465),
        std=(0.2470, 0.2435, 0.2616),
    ),
])
dataset = CIFAR10(
    root="/mimer/NOBACKUP/Datasets/CIFAR/",
    train=True,
    download=False,
    transform=transform,
)
dataloader = DataLoader(
    dataset,
    batch_size=512,
    shuffle=True,
    num_workers=3,
)
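The Normalize transform standardizes each channel as (x - mean) / std, using per-channel statistics precomputed over the CIFAR-10 training set. A minimal sketch of the arithmetic (the helper function and pixel value are ours, for illustration only):

```python
# Per-channel normalization as done by transforms.Normalize:
# for each channel c, x_norm = (x - mean[c]) / std[c].
mean = (0.4914, 0.4822, 0.4465)  # CIFAR-10 training-set channel means
std = (0.2470, 0.2435, 0.2616)   # CIFAR-10 training-set channel stds

def normalize_pixel(value: float, channel: int) -> float:
    """Normalize a single pixel value (already scaled to [0, 1] by ToTensor)."""
    return (value - mean[channel]) / std[channel]

# Example: a mid-grey pixel in the red channel ends up close to zero,
# since 0.5 is near that channel's mean.
print(normalize_pixel(0.5, 0))
```

After this transform each channel has roughly zero mean and unit variance over the training set, which tends to make optimization better behaved.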

Model set-up¶

In [ ]:
from torch.nn import Module, Sequential, Conv2d, ReLU, MaxPool2d, AdaptiveMaxPool2d, Flatten, Linear


def get_model(num_classes: int) -> Module:
    conv_kws = {
        "kernel_size": 3,
        "stride": 1,
        "padding": "same",
    }
    return Sequential(
        Conv2d(3, 64, **conv_kws), ReLU(),
        Conv2d(64, 64, **conv_kws), ReLU(),
        MaxPool2d(2),
        Conv2d(64, 128, **conv_kws), ReLU(),
        Conv2d(128, 128, **conv_kws), ReLU(),
        MaxPool2d(2),
        Conv2d(128, 256, **conv_kws), ReLU(),
        Conv2d(256, 256, **conv_kws), ReLU(),
        Conv2d(256, 256, **conv_kws), ReLU(),
        AdaptiveMaxPool2d((1, 1)),
        Flatten(start_dim=-3),
        Linear(256, 1024), ReLU(),
        Linear(1024, 1024), ReLU(),
        Linear(1024, num_classes),
    )
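Each Conv2d layer with bias contributes out_channels * (in_channels * kernel_size² + 1) parameters (one bias per output channel). A small sketch of the count for the first two convolutions in the model above (the helper name is ours, not part of the model code):

```python
def conv2d_params(in_ch: int, out_ch: int, k: int) -> int:
    """Parameter count of a Conv2d with bias: k*k weights per input channel,
    plus one bias, for each output channel."""
    return out_ch * (in_ch * k * k + 1)

print(conv2d_params(3, 64, 3))   # first conv:  64 * (3*3*3 + 1)  = 1792
print(conv2d_params(64, 64, 3))  # second conv: 64 * (64*3*3 + 1) = 36928
```

Note how the channel count, not the spatial resolution, drives the parameter count; that is why the later 256-channel convolutions dominate the convolutional part of the model.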

Training set-up¶

Basic set-up with

  • Manual seed for reproducibility
  • Checkpointing
  • Logging to stdout
In [ ]:
import torch
from torch.nn.functional import cross_entropy
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

torch.manual_seed(10037)

torch.set_float32_matmul_precision("high")
device = "cuda"
model = get_model(num_classes=10).to(device)

n_epochs = 5
optim = AdamW(model.parameters(), lr=1e-3)

model.train()
for epoch in range(n_epochs):
    train_loss = 0.0
    train_acc = 0
    for batch in dataloader:
        optim.zero_grad()
        
        x = batch[0].to(device)
        y = batch[1].to(device)

        y_pred = model(x)
    
        loss = cross_entropy(y_pred, y)
        with torch.no_grad():
            train_loss += loss.item() * x.size(0)
            train_acc += (y_pred.argmax(dim=1) == y).sum().item()
    
        loss.backward()
        optim.step()
    
    train_loss /= len(dataloader.dataset)
    train_acc /= len(dataloader.dataset)
    print(f"Epoch {epoch}, loss={train_loss:.4f}, acc={train_acc:.4f}")
    
    # Very basic checkpointing
    torch.save(
        {
            'epoch': epoch,
            'rng_state': torch.get_rng_state(),
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optim.state_dict(),
        },
        "cnn_cifar10_latest.pkl",
    )

Optional exercise: resume training from a checkpoint; see https://docs.pytorch.org/tutorials/beginner/saving_loading_models.html.
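As a starting point for the optional exercise, here is a minimal sketch of loading such a checkpoint and restoring the saved states. It uses a tiny stand-in model rather than the CNN above, but the checkpoint keys match the training loop:

```python
import torch
from torch.nn import Linear
from torch.optim import AdamW

# Stand-in model/optimizer; in the exercise these would be the CNN and its AdamW.
model = Linear(4, 2)
optim = AdamW(model.parameters(), lr=1e-3)

# Save a checkpoint with the same layout as in the training loop above.
torch.save(
    {
        "epoch": 3,
        "rng_state": torch.get_rng_state(),
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optim.state_dict(),
    },
    "demo_ckpt.pkl",
)

# Restore: load the dict, then push each state back into place.
# weights_only=False because the checkpoint holds more than bare tensors.
checkpoint = torch.load("demo_ckpt.pkl", weights_only=False)
model.load_state_dict(checkpoint["model_state_dict"])
optim.load_state_dict(checkpoint["optimizer_state_dict"])
torch.set_rng_state(checkpoint["rng_state"])
start_epoch = checkpoint["epoch"] + 1  # continue from the next epoch
```

The training loop would then run `for epoch in range(start_epoch, n_epochs)` instead of starting from zero.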

Exercises¶

  1. Launch this notebook through the Jupyter Interactive App on Alvis OnDemand.
    1. A single GPU and 1 or 2 hours should be enough
    2. Use the runtime from when we set up the software environment
  2. Step through the notebook and take special note of how device and .to() are used to make the code run on the GPU
  3. Monitor the GPU usage with nvtop while you rerun the notebook
    1. Open up a terminal in the Jupyter Lab instance
    2. Load the nvtop module ml nvtop
    3. Run nvtop
    4. Rerun the notebook while checking the nvtop output
  4. Check the job_stats page for this job (hint: use the button next to the launch button where you launched this notebook from Alvis OnDemand)
  5. Optional: Restart the training from a saved checkpoint
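For exercise 2, the key pattern to look for is selecting a device once and moving both tensors and modules to it with .to(). A minimal sketch that falls back to the CPU when no GPU is visible (unlike the hard-coded device = "cuda" in the training cell):

```python
import torch

# Pick the GPU if one is visible, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.ones(2, 3).to(device)            # tensors are moved explicitly...
layer = torch.nn.Linear(3, 1).to(device)   # ...and so are module parameters
y = layer(x)                               # inputs and weights must share a device

print(device, y.shape)
```

Mixing devices (e.g. a CPU tensor fed to a GPU module) raises a runtime error, which is one of the first things to check when a notebook refuses to use the GPU.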

Training set-up with PyTorch Lightning¶

PyTorch Lightning is a wrapper that removes much of the boilerplate surrounding the training set-up.

For a quick intro see: Lightning in 15 minutes

In [ ]:
import lightning as L
import torch
from lightning.pytorch import seed_everything
from torch.nn.functional import cross_entropy
from torch.optim import AdamW, Optimizer

seed_everything(10037)

class VisionClassifier(L.LightningModule):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def training_step(self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> torch.Tensor:
        # N.B. we don't use .to(device) here, that is handled by Lightning
        x, y = batch
        y_pred = self.model(x)
        
        loss = cross_entropy(y_pred, y)
        with torch.no_grad():
            acc = (y_pred.argmax(dim=1) == y).float().mean().item()
        
        self.log("loss", loss, prog_bar=True)
        self.log("acc", acc, prog_bar=True)

        return loss

    def configure_optimizers(self) -> Optimizer:
        optimizer = AdamW(self.model.parameters(), lr=1e-3)
        return optimizer


trainer = L.Trainer(max_epochs=5, enable_progress_bar=True)
trainer.fit(
    model=VisionClassifier(get_model(num_classes=10)),
    train_dataloaders=dataloader,
)

Exercises¶

  1. Compare the Lightning set-up with the bare PyTorch set-up and check that both make use of the GPU as expected.
  2. Optional: Read the Lightning documentation on Checkpointing and
    1. Figure out where checkpoints are stored
    2. Continue training from a checkpoint
In [ ]: