Lecture Schedule
Background
Course overview
Basics of computer architecture: pipelined and out-of-order (OOO) execution processors
Another useful set of slides on OOO processors
Lectures from the ECE architecture course
Measurement
Measurements: timing and PAPI counters
Compilers
x86 ISA and compilers
Sources of parallelism and locality in algorithms
Graph algorithms
Additional reading: The Tao of Parallelism in Algorithms, Pingali et al., PLDI 2011.
Computational science algorithms
Video
Caches
Cache architecture and memory hierarchy
Video
Locality, loop and data transformations
Video
Case study of locality enhancement: GEMM and ATLAS
Intel VTune (I) profiler for performance analysis
Shared-memory programming
Work and span
Shared-memory architectures: cache coherence
Pthreads programs (3 lectures)
Video
Memory consistency
OpenMP
Parallel prefix
Vectorization
Vectorization (2 lectures)
Dependence analysis
MPI (3 lectures)
GPUs