Instructor: Venkat Arun
Course topic: Normally, we are constantly tweaking computer systems in a never ending attempt to get them to perform well for the latest and greatest applications. On the one hand, it guarantees employment for systems people; something always needs fixing and innovation. On the other hand, it is frustrating to revisit the same questions repeatedly: Should we implement this in software or hardware? Should resource allocation be centralized or decentralized? What is the long list of "if conditions" that we need to make the OS CPU scheduler work? What network topology should we use? And so on...
Occasionally, we discover a unifying principle that allows us to escape this cycle for a part of the system. When we discover an algorithm or design that can be proven to consistently work well, we can move past endless tweaking of that component and direct our efforts elsewhere. As systems grow more complex and applications demand more reliable performance, it becomes crucial to stop revisiting issues we faced decades ago. This course will survey some success stories and try to extract unifying principlles to accelerate future progress.
Another goal of this course is to explore how to effectively bridge theory and practice. Theory provides deep insights that are otherwise difficult to obtain, but its value depends on understanding the practical constraints---making realistic assumptions and producing actionable insights.
Course structure: In the second half of each lecture, the instructor will provide the background for material that the students will discuss in the first half of the next lecture. In addition, students will conduct a research project combining theory and systems individually or in a pair. Our class sizes are usually small, which makes it interactive and (hopefully) fun.
Syllabus: Applications of queuing and control theory, randomized load balancing and routing algorithms, notions of fairness, beyond worst-case analysis for online scheduling and caching algorithms, performance verification, use of constraint solvers, and practical implications of complexity theory.
Where? GDC 6.816
When? Mondays and Wednesdays at 2-2:30PM
Schedule:
Note: through a failure of version management on my part, I have lost parts of the schedule we followed in 2023. The topics were the same as what is listed below, but the specific details and the links have disappeard.
The switching/routing problem, PIM algorithm
Optional: How to read a paper
Valiant load balancing, birthday paradox
Power of two choices vs random vs full visibility
Optional: Queuing theoretic analysis of power of two choices, Survey of the theory, Small amount of memory helps load balance
Work stealing theoretical analysis, Cilk programming model, Empirical analysis that concludes work stealing is best
Decades of wasted cores in Linux
Optional: Scheduler in Linux v4.6.8.1, More bugs discovered through verification
Linux (contd), wireless scheduling and alpha fairness (throughput, proportional, max-min)
Multi resource fairness
FairCloud + RCS
Other practical concerns
Competitive caching
Statistical power of data
Safe ML
P: traffic engineering
Beyond NP: Program synthesis, two-player games and beyond
Classical: AIMD, Vegas, RCP
Control theory: RCP, DCTCP
Modern congestion control + CCAC
fPerf, verilay
MetaOpt
Performal, Hoffman’s work
PIFO
Network slicing