In this course, we will learn about the architecture, design, and implementation of
software and hardware systems that play a central role in modern large-scale machine learning (ML) training
and inference.
Lectures will discuss various state-of-the-art frameworks for programming
distributed training and inference, and techniques for hardware acceleration,
compilation, and distributed execution that make large-scale training possible, easy to use, and performant. We will also cover the basic systems challenges brought on by the advent of large language models (LLMs) with respect to performance and resource efficiency, and the new techniques and systems building blocks that address these challenges.
Specifically, a tentative outline of the lecture topics we will cover is here. On the practice front, we will do 5 programming assignments; the planned assignments are here.

Administrative Details

Class time
Monday, Wednesday 9:30 am - 11:00 am

Class location
UTC 4.110

Pre-requisites (not strict, but useful)
CS429 and CS439 are useful but not required. A working knowledge of machine learning and computer systems will suffice to take this course, as background relevant to the lectures, programming assignments, and quizzes will be provided in class. Programming proficiency in Python is strongly encouraged.

Grading
The course will have 3 in-class quizzes (no midterms or finals) and 5 programming assignments that dovetail with in-class discussions of the topics above. Grading split is as follows:
Instructor: Aditya Akella
Email: akella@cs.utexas.edu
Office Hours: Monday 11:00am-1:00pm
Location: GDC 6.826
TA: Bodun Hu
Email: bodunhu@utexas.edu
Office Hours: Wednesday 11:00am-12:00pm
Location: GDC 1.302 Station Desk 3
TA: Brian Chang
Email: brianchang@utexas.edu
Office Hours: Tuesday 10:30am-11:30am
Location: GDC 1.302 Station Desk 2