For this operation to be well-defined, we require matrices,
A , B , and C to have dimensions
,
, and
,
respectively.
If we partition
then
for .
Thus, a matrix-matrix multiply can be written as
a sequence of matrix-vector multiplies
We can implement the above sequence of matrix-vector multiplies by using views, splitting off one column and row at a time: Let C and B be partitioned like
where and
equal the first column of C and
first row of B .
Then
So the computation of C can be viewed
matrix-vector multiply of A times the first row of B ,
after which the operation can
be performed, using the same approach, recursively.
This leads to the following algorithm:
Again, the ``
- Scale
![]()
- Let
and
![]()
- Do until done
- If
and
are empty, the loop is done.
- Partition
![]()
- Compute
![]()
- Let
and
![]()
In order to get better performance, it is advantageous to set up the algorithm to perform a sequence of matrix-panel-of-vector operations: Let C and B be partitioned like
where and
now equal a panel of s columns.
Then
So the computation of C can be viewed
as a matrix-times-panel-of-vectors multiply
( A times the first panel of rows of B ),
followed by a recursive operation
using the same approach.
This leads to the following algorithm:
Again, the ``
- Scale
![]()
- Let
and
![]()
- Do until done
- Determine width s of split. If
and
are empty, the loop is done.
- Partition
![]()
and
![]()
- Compute
![]()
- Let
and
![]()
PLACE BEGIN HR HERE
PLACE END HR HERE
For large n , the above algorithm can be improved by introducing pipelining. This can be accomplished by setting the natural flow of computation to be to the right or left, for communication within rows, or to the top or bottom, for communication within columns.