In the implementations in Section , an algorithmic blocking size is
passed as a parameter to the parallel matrix-matrix multiplication
routines. Thus, a natural question is what the value of this
parameter should be. Notice that the matrix-matrix multiplication
examples all used one of the following basic operations:
panel-panel update (rank-k update),
matrix-panel multiply, or panel-matrix multiply. Thus,
whatever blocking size makes these operations optimal can be
expected to yield fast implementations of matrix-matrix multiply.
In general, all level-3 BLAS can be implemented using these
basic operations and their counterparts that operate on only
the upper or lower triangular portion of the matrix:
symmetric panel-panel update (symmetric rank-k),
triangular matrix-panel multiply, and panel-triangular matrix multiply.
Thus, we provide an environment inquiry routine that,
given which of these operations underlies the algorithm
being implemented, returns a suggested algorithmic blocking size.
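
To make the role of the blocking size concrete, the following is a minimal sequential sketch in C. It is not the library's interface: the type basic_op, the routine suggested_nb_alg, its parameter operation, and the routine gemm_pan_pan are hypothetical names introduced only for illustration, and the returned sizes are placeholders rather than tuned values. The sketch shows an inquiry routine keyed on the six basic operations, and a matrix-matrix multiply written as a sequence of panel-panel updates whose panel width is the algorithmic blocking size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags, one per basic operation named in the text. */
typedef enum {
    OP_PAN_PAN,      /* panel-panel update (rank-k update)            */
    OP_MAT_PAN,      /* matrix-panel multiply                         */
    OP_PAN_MAT,      /* panel-matrix multiply                         */
    OP_SYM_PAN_PAN,  /* symmetric panel-panel update (symmetric rank-k) */
    OP_TRI_MAT_PAN,  /* triangular matrix-panel multiply              */
    OP_PAN_TRI_MAT   /* panel-triangular matrix multiply              */
} basic_op;

/* Hypothetical inquiry routine: returns a suggested algorithmic
   blocking size for the given basic operation. The numbers below are
   placeholders (an assumption), not measured or tuned values. */
int suggested_nb_alg(basic_op operation)
{
    switch (operation) {
    case OP_PAN_PAN:
    case OP_SYM_PAN_PAN:
        return 64;
    case OP_MAT_PAN:
    case OP_TRI_MAT_PAN:
    case OP_PAN_MAT:
    case OP_PAN_TRI_MAT:
        return 32;
    }
    return 1; /* fallback for an unrecognized tag */
}

/* C += A * B computed as a sequence of panel-panel updates.
   A is m x k, B is k x n, C is m x n, all stored row-major.
   Each outer iteration applies one update of rank at most nb. */
void gemm_pan_pan(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C, int nb)
{
    assert(nb > 0);
    for (size_t p = 0; p < k; p += (size_t)nb) {
        /* width of the current panel; the last one may be narrower */
        size_t b = (k - p < (size_t)nb) ? k - p : (size_t)nb;
        /* C += A(:, p:p+b-1) * B(p:p+b-1, :) */
        for (size_t i = 0; i < m; ++i) {
            for (size_t l = p; l < p + b; ++l) {
                double a = A[i * k + l];
                for (size_t j = 0; j < n; ++j)
                    C[i * n + j] += a * B[l * n + j];
            }
        }
    }
}
```

A caller would obtain the blocking size once, for example with suggested_nb_alg(OP_PAN_PAN), and pass it as nb to gemm_pan_pan. The result does not depend on nb; only the granularity of the updates, and hence the performance, changes.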
Currently, the input parameter operation can take on the values