Forming #tex2html_wrap

Next: Forming Up: Example: Parallelizing Matrix-Matrix Multiplication Previous: Forming

Forming

For this operation to be well-defined, we require matrices, A , B , and C to have dimensions , , and , respectively.

If we partition

displaymath15226

then

for . Thus, a matrix-matrix multiply can be written as a sequence of matrix-vector multiplies

Basic Parallel Algorithm:

We can implement the above sequence of matrix-vector multiplies by using views, splitting off one column and row at a time: Let C and B be partitioned like

where and equal the first column of C and first row of B . Then

eqnarray9881

So the computation of C can be viewed matrix-vector multiply of A times the first row of B , after which the operation can be performed, using the same approach, recursively.

This leads to the following algorithm:

Scale
Let and
Do until done
If and are empty, the loop is done.
Partition

Compute
Let and

Again, the ``

'' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning.

Blocked Algorithm:

In order to get better performance, it is advantageous to set up the algorithm to perform a sequence of matrix-panel-of-vector operations: Let C and B be partitioned like

where and now equal a panel of s columns. Then

eqnarray9916

So the computation of C can be viewed as a matrix-times-panel-of-vectors multiply ( A times the first panel of rows of B ), followed by a recursive operation using the same approach.

This leads to the following algorithm:

Scale
Let and
Do until done
Determine width s of split. If and are empty, the loop is done.
Partition

and

Compute
Let and

Again, the ``

'' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning. The resulting code is given in Fig.

. We will often refer to this approach to parallel matrix-matrix multiplication as a matrix-panel variant, to indicate that the basic operation uses matrix-panel-of-vectors (rows or columns) multiplication.

PLACE BEGIN HR HERE

PLACE END HR HERE

Pipelined Algorithm:

For large n , the above algorithm can be improved by introducing pipelining. This can be accomplished by setting the natural flow of computation to be to the right or left, for communication within rows, or to the top or bottom, for communication within columns.

Next: Forming Up: Example: Parallelizing Matrix-Matrix Multiplication Previous: Forming

rvdg@cs.utexas.edu