CS 395T: Performance Analysis of Networked Systems (Spring 2024)

Instructor: Venkat Arun

Course topic: Normally, we are constantly tweaking computer systems in a never ending attempt to get them to perform well for the latest and greatest applications. On the one hand, it guarantees employment for systems people; something always needs fixing and innovation. On the other hand, it is frustrating to revisit the same questions repeatedly: Should we implement this in software or hardware? Should resource allocation be centralized or decentralized? What is the long list of "if conditions" that we need to make the OS CPU scheduler work? What network topology should we use? And so on...

Occasionally, we discover a unifying principle that allows us to escape this cycle for a part of the system. When we discover an algorithm or design that can be proven to consistently work well, we can move past endless tweaking of that component and direct our efforts elsewhere. As systems grow more complex and applications demand more reliable performance, it becomes crucial to stop revisiting issues we faced decades ago. This course will survey some success stories and try to extract unifying principlles to accelerate future progress.

Another goal of this course is to explore how to effectively bridge theory and practice. Theory provides deep insights that are otherwise difficult to obtain, but its value depends on understanding the practical constraints---making realistic assumptions and producing actionable insights.

Course structure: In the second half of each lecture, the instructor will provide the background for material that the students will discuss in the first half of the next lecture. In addition, students will conduct a research project combining theory and systems individually or in a pair. Our class sizes are usually small, which makes it interactive and (hopefully) fun.

Syllabus: Applications of queuing and control theory, randomized load balancing and routing algorithms, notions of fairness, beyond worst-case analysis for online scheduling and caching algorithms, performance verification, use of constraint solvers, and practical implications of complexity theory.

Canvas

Where? GDC 6.816

When? Mondays and Wednesdays at 2-2:30PM

Schedule:

Note: through a failure of version management on my part, I have lost parts of the schedule we followed in 2023. The topics were the same as what is listed below, but the specific details and the links have disappeard.

Introduction + Switch load balancing I
Types of theory

The switching/routing problem, PIM algorithm

Optional: How to read a paper

Switch load balancing II

iSLIP

Optional: PLB, Conga

Switch load balancing III

Valiant load balancing, birthday paradox

Application load balancing II

Power of two choices vs random vs full visibility

Optional: Queuing theoretic analysis of power of two choices, Survey of the theory, Small amount of memory helps load balance

Application load balancing III

Overload control (Breakwater)

Process scheduling I

Work stealing theoretical analysis, Cilk programming model, Empirical analysis that concludes work stealing is best

Process scheduling II

Decades of wasted cores in Linux
Optional: Scheduler in Linux v4.6.8.1, More bugs discovered through verification

Notions of fairness I

Linux (contd), wireless scheduling and alpha fairness (throughput, proportional, max-min)

Notions of fairness II

Multi resource fairness

Notions of fairness III

FairCloud + RCS

Caching replacement policies I
LRU analysis
Cache replacement policies II

Other practical concerns

Incorporating ML in systems I

Competitive caching

Incorporating ML in systems II

Statistical power of data

Incorporating ML in systems III

Safe ML

Computational complexity in practice I

P: traffic engineering

Computational complexity in practice II
NP: optimization problems in computer systems, Gurobi, Z3
Computational complexity in practice II

Beyond NP: Program synthesis, two-player games and beyond

Congestion control I

Classical: AIMD, Vegas, RCP

Congestion control II

Control theory: RCP, DCTCP

Congestion control III

Modern congestion control + CCAC

Performance verification I

fPerf, verilay

Performance verification II

MetaOpt

Bounding performance I

Performal, Hoffman’s work

Abstractions that aid performance I

PIFO

Abstractions that aid performance II

Network slicing

Project presentations I
Project presentations II