This semester, the course will focus on (i) how to exploit parallelism for machine learning and big-data applications, and (ii) how to exploit approximation to reduce power and energy consumption. There is a lot of current research in both the systems and machine learning communities on these topics, and a variety of domain-specific languages (DSLs) and implementations for these domains have been proposed recently for both shared-memory and distributed-memory architectures.
Topics include the following:
- Structure of parallelism and locality in important algorithms in computational science and machine learning
- Algorithm abstractions: operator formulation of algorithms, dependence graphs
- Multicore architectures: interconnection networks, cache coherence, memory consistency models, synchronization
- Scheduling and load-balancing
- Parallel data structures: lock-free data structures, array/graph partitioning
- Memory hierarchies and locality, cache-oblivious algorithms
- Compiler analysis and transformations
- Performance models: PRAM, BPRAM, LogP
- Self-optimizing software, auto-tuning
- GPUs and GPU programming
- Case studies: Cilk, MPI, OpenMP, MapReduce, Galois, GraphLab (a minimal OpenMP sketch appears after this list)
- Approximate computing for power and energy optimization
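
To give a concrete flavor of the shared-memory programming models that appear in the case studies, the sketch below shows a data-parallel loop written with OpenMP in C++ (a language assumed by the prerequisites). The saxpy-style kernel, problem size, and build command are illustrative choices, not part of the course material.

```cpp
// A minimal sketch of shared-memory loop parallelism with OpenMP; the kernel
// and problem size are illustrative only.
// Build with: g++ -fopenmp saxpy.cpp -o saxpy
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    const long n = 1 << 20;                    // illustrative problem size
    std::vector<double> x(n, 1.0), y(n, 2.0);
    const double a = 3.0;

    // Each iteration is independent, so OpenMP can split the loop across threads.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }

    std::printf("y[0] = %.1f (using up to %d threads)\n", y[0], omp_get_max_threads());
    return 0;
}
```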
Students will present papers, participate in discussions, and do a substantial final project. The readings will include some of the classic papers in the field of parallel programming. In addition, there will be a small number of programming assignments and homework exercises at the beginning of the semester. Some of the lectures will be given by Inderjit Dhillon and Pradeep Ravikumar, who are experts in machine learning.
Prerequisites:
Programming maturity, knowledge of C/C++, and basic courses on modern computer architecture and compilers. For background on computer architecture, see the text by Hennessy & Patterson (Morgan Kaufmann Publishers); for background on compilers, read "Optimizing Compilers for Modern Architectures" by Allen and Kennedy.
Lecture schedule and notes