Course objectives: To obtain the high level of end-to-end performance needed in problem domains like graphics, computer games, and machine learning, it is necessary for programs to exploit many of the features of modern computer architectures. In this course, we will study the performance-critical features of modern computer architectures, and discuss how applications can take advantage of them to obtain high performance. This is not a course on software tricks; rather, the emphasis is on abstractions of computer architecture, understanding performance, and obtaining performance when you need it. CS377P will be co-taught by a team from Intel who will present lectures on Intel's performance tools like VTune and Advisor, and teach students how these tools can be used to analyze and improve program performance. Course assignments will require the use of these tools.
Topics covered in lecture include the following:
- Analysis of applications that need high end-to-end performance
- Understanding performance: performance models, Amdahl's law
- Measurement and design of computer experiments
- Micro-benchmarks for abstracting performance-critical aspects of computer systems
- Memory hierarchy: caches, virtual memory, exploiting spatial and temporal locality
- Vectors and vectorization
- GPUs and GPU programming
- Multi-core processors and shared-memory programming, OpenMP
- Distributed-memory machines and message-passing programming, MPI
- Self-optimizing software
Prerequisites: programming maturity, knowledge of C/C++, and a basic course on modern computer architecture.
Course work: There will be six substantial programming assignments (60% of grade), a mid-semester exam (15% of grade), and a final exam (25% of grade).
Discussion and assignments: Use Canvas and Piazza for discussion and for submitting assignments.