Running AI/ML workloads on NAISS systems

Time: 9:00 - 12:00

This session introduces what is different about doing machine learning on a compute cluster, with a focus on performance considerations and best practices.

Code examples will focus on the PyTorch and TensorFlow frameworks, but will also cover other utilities.

The most relevant NAISS resource this first time is Alvis; later it will be the Arrhenius GPU partition, once that is in place.

Note

Click Home (at the top) to see the date.

Prerequisites

To be able to follow along you should have a basic understanding of machine learning and know how to:

Topics

  • Intro to NAISS AI/ML resources
  • Running PyTorch and TensorFlow on GPUs
  • Performance considerations
  • Mixed precision and GPUs
  • Filesystem performance
  • Profiling

Note

Link to the course material: Running AI/ML Workloads on NAISS Systems