Lecture Schedule

Background
    Course overview
    Basics of computer architecture: pipelined and out-of-order (OOO) execution processors
        Another useful set of slides on OOO processors
        Lectures from the ECE architecture course
 
Measurement
    Measurements: timing and PAPI counters

Compilers
    x86 ISA and compilers

Sources of parallelism and locality in algorithms
    Graph algorithms
        Additional reading: The Tao of Parallelism in Algorithms, Pingali et al., PLDI 2011.
    Computational science algorithms
        Video

Caches
    Cache architecture and memory hierarchy
        Video
    Locality, loop and data transformations
        Video
    Case study of locality enhancement: GEMM and ATLAS
    Intel VTune (I) profiler for performance analysis

 
Shared-memory programming
    Work and span
    Shared-memory architectures: cache coherence
    Pthreads programs (3 lectures)
        Video
    Memory consistency
    OpenMP
    Parallel-prefix
 
Vectorization
    Vectorization (2 lectures)
    Dependence analysis

MPI (3 lectures)

GPUs