Unit 4.1.3 What you will learn
This week, we discover how to parallelize matrix-matrix multiplication across the multiple cores of a processor.
Upon completion of this week, we will be able to
Exploit multiple cores by multithreading our implementation.
Direct the compiler to parallelize code sections with OpenMP.
Parallelize the different loops and interpret the resulting performance.
Experience when loops can be more easily parallelized and when more care must be taken.
Apply the concepts of speedup and efficiency to implementations of matrix-matrix multiplication.
Analyze limitations on parallel efficiency due to Amdahl's law.
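To make these goals concrete, here is a minimal sketch in C of the kind of code they refer to. It is an illustration under stated assumptions, not the implementation developed in the course: the function names (gemm_jpi, speedup, efficiency, amdahl_bound), the column-major storage, and the choice to parallelize the outermost loop over columns of C are all assumptions made for this example. The formulas are the standard ones: speedup is T_1 / T_p, efficiency is speedup divided by the number of threads p, and Amdahl's law bounds the speedup by 1 / (f + (1 - f) / p) when a fraction f of the run time cannot be parallelized.

```c
#include <assert.h>

/* Hypothetical example: C := A B + C with all matrices stored in
   column-major order. The directive asks the compiler to divide the
   iterations of the j loop among threads. Each thread then updates its
   own columns of C, so no two threads write to the same entry.
   Compiled without OpenMP support, the directive is ignored and the
   code runs sequentially. */
void gemm_jpi(int m, int n, int k,
              const double *A, int ldA,
              const double *B, int ldB,
              double *C, int ldC)
{
    #pragma omp parallel for
    for (int j = 0; j < n; j++)
        for (int p = 0; p < k; p++)
            for (int i = 0; i < m; i++)
                C[j * ldC + i] += A[p * ldA + i] * B[j * ldB + p];
}

/* Speedup: time on one thread divided by time on p threads. */
double speedup(double t_one, double t_p)
{
    assert(t_one > 0.0 && t_p > 0.0);
    return t_one / t_p;
}

/* Efficiency: speedup divided by the number of threads used. */
double efficiency(double t_one, double t_p, int p)
{
    assert(p > 0);
    return speedup(t_one, t_p) / p;
}

/* Amdahl's law: upper bound on speedup with p threads when a fraction
   serial_fraction of the sequential run time cannot be parallelized. */
double amdahl_bound(double serial_fraction, int p)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0 && p > 0);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p);
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. As an example of the bound, if half of the run time is inherently sequential, amdahl_bound(0.5, 2) evaluates to 1 / 0.75, so two threads give a speedup of at most about 1.33, and no number of threads can push the speedup past 2.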
The enrichments introduce us to
The casting of other linear algebra operations in terms of matrix-matrix multiplication.
The benefits of having a family of algorithms for a specific linear algebra operation and where to learn how to systematically derive such a family.
Operations encountered in machine learning that resemble matrix-matrix multiplication, allowing the techniques to be extended to them.
Parallelizing matrix-matrix multiplication for distributed memory architectures.
Applying the learned techniques to the implementation of matrix-matrix multiplication on GPUs.