Unit 4.5.5 Matrix-matrix multiplication on distributed memory architectures
We had intended to also cover the implementation of matrix-matrix multiplication on distributed memory architectures. We struggled with how to give learners access to a distributed memory computer and for this reason decided not to tackle this topic yet. Here we point you to some papers that will fill the void for now. These papers were written to target a broad audience, much like the materials in this course. Thus, you should already be well-equipped to study the distributed memory implementation of matrix-matrix multiplication on your own.
Think of distributed memory architectures as a collection of processors, each with its own cores, caches, and main memory, that collaboratively solve problems by explicitly sending and receiving messages over a communication network. Such computers used to be called "multi-computers" to capture that each "node" in the system has its own processor and memory hierarchy.
Practical implementations of matrix-matrix multiplication on multi-computers are variations on the Scalable Universal Matrix-Multiplication Algorithm (SUMMA) [29]. While that paper should be easily understood upon completing this course, a systematic treatment of the subject that yields a large family of algorithms is given in [22]:
Martin D. Schatz, Robert A. van de Geijn, and Jack Poulson, Parallel Matrix Multiplication: A Systematic Journey, SIAM Journal on Scientific Computing, Volume 38, Issue 6, 2016.
which should also be easily understood by learners in this course.
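To give a flavor of what these papers describe, the following is a minimal sketch of the idea behind SUMMA, simulated inside a single Python process rather than run on a real multi-computer. Everything in it is an assumption made for illustration: the function names (`summa_simulated`, `block`, `matmul_acc`) are hypothetical, the nodes form a square r x r grid, the matrices are n x n with n divisible by r, and the panel width equals the local block size. Reading a block that belongs to another node stands in for receiving it through a broadcast within a row or column of the grid.

```python
def matmul_acc(C, A, B):
    """Local update C += A * B on small blocks stored as lists of lists."""
    for i in range(len(A)):
        for p in range(len(B)):
            a = A[i][p]
            for j in range(len(B[0])):
                C[i][j] += a * B[p][j]


def block(M, i, j, b):
    """Return a copy of the b x b block (i, j) of matrix M."""
    return [row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]


def summa_simulated(A, B, r):
    """Compute C = A * B as an r x r grid of simulated nodes would.

    Assumes A and B are n x n with n divisible by r (an illustrative
    restriction, not one imposed by SUMMA itself).
    """
    n = len(A)
    b = n // r
    grid = [(i, j) for i in range(r) for j in range(r)]

    # Initial distribution: node (i, j) owns block (i, j) of A, B, and C.
    A_loc = {(i, j): block(A, i, j, b) for (i, j) in grid}
    B_loc = {(i, j): block(B, i, j, b) for (i, j) in grid}
    C_loc = {(i, j): [[0] * b for _ in range(b)] for (i, j) in grid}

    for k in range(r):
        for (i, j) in grid:
            # Node (i, k) would broadcast its block of A within grid row i,
            # and node (k, j) its block of B within grid column j.
            # Here a dictionary lookup stands in for receiving the message.
            a_panel = A_loc[(i, k)]
            b_panel = B_loc[(k, j)]
            # Each node then updates only the block of C that it owns.
            matmul_acc(C_loc[(i, j)], a_panel, b_panel)

    # Gather the distributed blocks of C so the result can be inspected.
    C = [[0] * n for _ in range(n)]
    for (i, j) in grid:
        for ii in range(b):
            C[i * b + ii][j * b:(j + 1) * b] = C_loc[(i, j)][ii]
    return C
```

Calling `summa_simulated(A, B, 2)` on two 4 x 4 matrices should give the same result as an ordinary triple loop. The point of the sketch is only the communication pattern: in every step, data moves within rows and columns of the node grid, and all arithmetic is local.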
Much like it is important to understand how data moves between the layers in the memory hierarchy of a single processor, it is important to understand how to share data between the memories of a multi-computer. Although the above-mentioned papers give a brief overview of the data movements that are encountered when implementing matrix-matrix multiplication, orchestrated as "collective communication", it helps to look at this topic in depth. For this we recommend the paper [4]:
Ernie Chan, Marcel Heimlich, Avi Purkayastha, and Robert van de Geijn, Collective communication: theory, practice, and experience, Concurrency and Computation: Practice and Experience, Volume 19, Number 13, 2007.
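As a small taste of collective communication, the sketch below simulates a tree-based broadcast, in which the number of nodes holding the message roughly doubles in every round. It is an illustration under stated assumptions, not the specific algorithm from the paper: the function name `tree_broadcast` is hypothetical, the nodes are numbered 0 through p - 1, and copying a list entry stands in for a matched send and receive.

```python
def tree_broadcast(p, root, message):
    """Simulate broadcasting `message` from node `root` to all p nodes.

    Returns the per-node buffers and the number of communication rounds.
    """
    buffers = [None] * p
    buffers[root] = message
    rounds = 0
    mask = 1
    while mask < p:
        # Ranks are taken relative to the root, so the root has relative
        # rank 0. In this round, every node with relative rank below
        # `mask` already holds the message and forwards it once. On a
        # real machine these sends would all proceed at the same time.
        for rel in range(mask):
            dest_rel = rel + mask
            if dest_rel < p:
                src = (rel + root) % p
                dst = (dest_rel + root) % p
                buffers[dst] = buffers[src]  # stands in for send + receive
        mask *= 2
        rounds += 1
    return buffers, rounds
```

With this scheme, `tree_broadcast(8, 3, "panel")` delivers the message to all eight nodes in three rounds, whereas having the root send to each of the other nodes in turn would take seven. Trade-offs of this kind, and how they depend on message length, are what a careful study of collective communication examines.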
In the future, we may turn these materials into yet another Massive Open Online Course. Until then, enjoy the reading.