PfHP What you will learn

Skip to main content

\( \newcommand{\R}{\mathbb R} \newcommand{\Rm}{\mathbb R^m} \newcommand{\Rn}{\mathbb R^n} \newcommand{\Rnxn}{\mathbb R^{n \times n}} \newcommand{\Rmxn}{\mathbb R^{m \times n}} \newcommand{\C}{\mathbb C} \newcommand{\Cm}{\mathbb C^m} \newcommand{\Cmxm}{\mathbb C^{m \times m}} \newcommand{\Cnxn}{\mathbb C^{n \times n}} \newcommand{\Cmxn}{\mathbb C^{m \times n}} \newcommand{\Cn}{\mathbb C^n} \newcommand{\Null}{{\cal N}} \newcommand{\Col}{{\cal C}} \newcommand{\Rowspace}{{\cal R}} \newcommand{\Span}{{\cal Span}} \newcommand{\rank}{{\rm rank}} \newcommand{\FlaTwoByTwo}[4]{ \left( \begin{array}{c | c} #1 \amp #2 \\ \hline #3 \amp #4 \end{array} \right) } \newcommand{\FlaTwoByTwoSingleLine}[4]{ \left( \begin{array}{c c} #1 \amp #2 \\ #3 \amp #4 \end{array} \right) } \newcommand{\FlaTwoByTwoSingleLineNoPar}[4]{ \begin{array}{c c} #1 \amp #2 \\ #3 \amp #4 \end{array} } \newcommand{\FlaOneByTwo}[2]{ \left( \begin{array}{c | c} #1 \amp #2 \end{array} \right) } \newcommand{\FlaOneByTwoSingleLine}[2]{ \left( \begin{array}{c c} #1 \amp #2 \end{array} \right) } \newcommand{\FlaTwoByOne}[2]{ \left( \begin{array}{c} #1 \\ \hline #2 \end{array} \right) } \newcommand{\FlaTwoByOneSingleLine}[2]{ \left( \begin{array}{c} #1 \\ #2 \end{array} \right) } \newcommand{\FlaThreeByOneB}[3]{ \left( \begin{array}{c} #1 \\ \hline #2 \\ #3 \end{array} \right) } \newcommand{\FlaThreeByOneT}[3]{ \left( \begin{array}{c} #1 \\ #2 \\ \hline #3 \end{array} \right) } \newcommand{\FlaOneByThreeR}[3]{ \left( \begin{array}{c | c c} #1 \amp #2 \amp #3 \end{array} \right) } \newcommand{\FlaOneByThreeL}[3]{ \left( \begin{array}{c c | c} #1 \amp #2 \amp #3 \end{array} \right) } \newcommand{\FlaThreeByThreeBR}[9]{ \left( \begin{array}{c | c c} #1 \amp #2 \amp #3 \\ \hline #4 \amp #5 \amp #6 \\ #7 \amp #8 \amp #9 \end{array} \right) } \newcommand{\FlaThreeByThreeTL}[9]{ \left( \begin{array}{c c | c} #1 \amp #2 \amp #3 \\ #4 \amp #5 \amp #6 \\ \hline #7 \amp #8 \amp #9 \end{array} \right) } \newcommand{\diag}[1]{{\rm diag}( #1 )} \newcommand{\URt}{{\sc HQR}} \newcommand{\FlaAlgorithm}{ \begin{array}{|l|} \hline \routinename \\ \hline \partitionings \\ ~~~ \begin{array}{l} \partitionsizes \end{array} \\ {\bf \color{blue} {while}~} \guard \\ ~~~ \begin{array}{l} \repartitionings \end{array} \\ ~~~ \color{red} { \begin{array}{l} \hline \color{black} {\update} \\ \hline \end{array}} \\ ~~~ \begin{array}{l} \moveboundaries \end{array} \\ {\bf \color{blue} {endwhile}} \\ \hline \end{array} } \newcommand{\FlaAlgorithmWithInit}{ \begin{array}{|l|} \hline \routinename \\ \hline \initialize \\ \partitionings \\ ~~~ \begin{array}{l} \partitionsizes \end{array} \\ {\bf \color{blue} {while}~} \guard \\ ~~~ \begin{array}{l} \repartitionings \end{array} \\ ~~~ \color{red} { \begin{array}{l} \hline \color{black} {\update} \\ \hline \end{array}} \\ ~~~ \begin{array}{l} \moveboundaries \end{array} \\ {\bf \color{blue} {endwhile}} \\ \hline \end{array} } \newcommand{\FlaBlkAlgorithm}{ \begin{array}{|l|} \hline \routinename \\ \hline \partitionings \\ ~~~ \begin{array}{l} \partitionsizes \end{array} \\ {\bf \color{blue} {while}~} \guard \\ ~~~ {\bf choose~block~size~} \blocksize \\ ~~~ \begin{array}{l} \repartitionings \end{array} \\ ~~~ ~~~ \repartitionsizes \\ ~~~ \color{red} { \begin{array}{l} \hline \color{black} {\update} \\ \hline \end{array}} \\ ~~~ \begin{array}{l} \moveboundaries \end{array} \\ {\bf \color{blue} {endwhile}} \\ \hline \end{array} } \newcommand{\complexone}{ \begin{array}{|c|}\hline \!\pm\! \\ \hline \end{array}~ } \newcommand{\HQR}{{\rm HQR}} \newcommand{\QR}{{\rm QR}} \newcommand{\st}{{\rm \ s.t. }} \newcommand{\QRQ}{{\rm {\normalsize \bf Q}{\rm \tiny R}}} \newcommand{\QRR}{{\rm {\rm \tiny Q}{\bf \normalsize R}}} \newcommand{\deltaalpha}{\delta\!\alpha} \newcommand{\deltax}{\delta\!x} \newcommand{\deltay}{\delta\!y} \newcommand{\deltaz}{\delta\!z} \newcommand{\deltaw}{\delta\!w} \newcommand{\DeltaA}{\delta\!\!A} \newcommand{\meps}{\epsilon_{\rm mach}} \newcommand{\fl}[1]{{\rm fl( #1 )}} \newcommand{\becomes}{:=} \newcommand{\defrowvector}[2]{ \left(#1_0, #1_1, \ldots, #1_{#2-1}\right) } \newcommand{\tr}[1]{{#1}^T} \newcommand{\LUpiv}[1]{{\rm LU}(#1)} \newcommand{\maxi}{{\rm maxi}} \newcommand{\Chol}[1]{{\rm Chol}( #1 )} \newcommand{\lt}{<} \newcommand{\gt}{>} \newcommand{\amp}{&} \)

Unit 4.1.3 What you will learn

In this week, we discover how to parallelize matrix-matrix multiplication among multiple cores of a processor.

Upon completion of this week, we will be able to

Exploit multiple cores by multithreading your implementation.
Direct the compiler to parallelize code sections with OpenMP.
Parallelize the different loops and interpret the resulting performance.
Experience when loops can be more easily parallelized and when more care must be taken.
Apply the concepts of speedup and efficiency to implementations of matrix-matrix multiplication.
Analyze limitations on parallel efficiency due to Ahmdahl's law.

The enrichments introduce us to

The casting of other linear algebra operations in terms of matrix-matrix multiplication.
The benefits of having a family of algorithms for a specific linear algebra operation and where to learn how to systematically derive such a family.
Operations that resemble matrix-matrix multiplication that are encountered in Machine Learning, allowing the techniques to be extended.
Parallelizing matrix-matrix multiplication for distributed memory architectures.
Applying the learned techniques to the implementation of matrix-matrix multiplication on GPUs.