CS 395T: Performance Analysis of Networked Systems (Spring 2024)


Instructor: Venkat Arun

Course topic: Normally, we are constantly tweaking computer systems in a never ending attempt to get them to perform well for the latest and greatest applications. On the one hand, it guarantees employment for systems people; something always needs fixing and innovation. On the other hand, it is frustrating to revisit the same questions repeatedly: Should we implement this in software or hardware? Should resource allocation be centralized or decentralized? What is the long list of "if conditions" that we need to make the OS CPU scheduler work? What network topology should we use? And so on...

Occasionally, we discover a unifying principle that allows us to escape this cycle for a part of the system. When we discover an algorithm or design that can be proven to consistently work well, we can move past endless tweaking of that component and direct our efforts elsewhere. As systems grow more complex and applications demand more reliable performance, it becomes crucial to stop revisiting issues we faced decades ago. This course will survey some success stories and try to extract unifying principlles to accelerate future progress.

Another goal of this course is to explore how to effectively bridge theory and practice. Theory provides deep insights that are otherwise difficult to obtain, but its value depends on understanding the practical constraints---making realistic assumptions and producing actionable insights.

Course structure: In the second half of each lecture, the instructor will provide the background for material that the students will discuss in the first half of the next lecture. In addition, students will conduct a research project combining theory and systems individually or in a pair. Our class sizes are usually small, which makes it interactive and (hopefully) fun.

Syllabus: Applications of queuing and control theory, randomized load balancing and routing algorithms, notions of fairness, beyond worst-case analysis for online scheduling and caching algorithms, performance verification, use of constraint solvers, and practical implications of complexity theory.

Canvas

Where? GDC 6.816

When? Mondays and Wednesdays at 2-2:30PM

Schedule:

Note: through a failure of version management on my part, I have lost parts of the schedule we followed in 2023. The topics were the same as what is listed below, but the specific details and the links have disappeard.

  1. Introduction + Switch load balancing I
    Types of theory

The switching/routing problem, PIM algorithm

    Optional: How to read a paper

  1. Switch load balancing II

iSLIP

    Optional: PLB, Conga

  1. Switch load balancing III

Valiant load balancing, birthday paradox

  1. Application load balancing II

Power of two choices vs random vs full visibility

Optional: Queuing theoretic analysis of power of two choices, Survey of the theory, Small amount of memory helps load balance

  1. Application load balancing III

Overload control (Breakwater)

  1. Process scheduling I

Work stealing theoretical analysis, Cilk programming model, Empirical analysis that concludes work stealing is best

  1. Process scheduling II

Decades of wasted cores in Linux
Optional: Scheduler in Linux v4.6.8.1, More bugs discovered through verification

  1. Notions of fairness I

Linux (contd), wireless scheduling and alpha fairness (throughput, proportional, max-min)

  1. Notions of fairness II

Multi resource fairness

  1. Notions of fairness III

FairCloud + RCS

  1. Caching replacement policies I
    LRU analysis
  2. Cache replacement policies II

Other practical concerns

  1. Incorporating ML in systems I

Competitive caching

  1. Incorporating ML in systems II

Statistical power of data

  1. Incorporating ML in systems III

Safe ML

  1. Computational complexity in practice I

P: traffic engineering

  1. Computational complexity in practice II
    NP: optimization problems in computer systems, Gurobi, Z3
  2. Computational complexity in practice II

Beyond NP: Program synthesis, two-player games and beyond

  1. Congestion control I

Classical: AIMD, Vegas, RCP

  1. Congestion control II

Control theory: RCP, DCTCP

  1. Congestion control III

Modern congestion control + CCAC

  1. Performance verification I

fPerf, verilay

  1. Performance verification II

MetaOpt

  1. Bounding performance I

Performal, Hoffman’s work

  1. Abstractions that aid performance I

PIFO

  1. Abstractions that aid performance II

Network slicing

  1. Project presentations I
  2. Project presentations II