General implementation


The above algorithm generalizes in a straightforward manner to the case where x and y can have any valid vector distribution, including projected and/or duplicated. Some care must be taken in creating xdup and ydup: xdup must be aligned with the columns of a (distributed like a row of A, but duplicated), while ydup must be aligned with the rows of a (distributed like a column of A, but duplicated). In the simple implementation a, x, and y were all nicely aligned, so this was not an issue. Creating xdup and ydup is now accomplished through the calls

PLA_Pvector_create_conf_to( a, PLA_PROJ_ONTO_ROW, PLA_ALL_ROWS, &xdup );
PLA_Pvector_create_conf_to( a, PLA_PROJ_ONTO_COL, PLA_ALL_COLS, &ydup );

After this, all required communication and alignment is hidden in the PLA_Copy and PLA_Reduce routines. A code that generalizes even further, implementing the full functionality of the sequential gemv operation, is given in the accompanying figure.
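To see why xdup must follow the columns of the matrix while ydup follows its rows, the data flow can be simulated sequentially. The sketch below is plain C and does not use PLAPACK. The mesh size, the simple block distribution, and the names seq_gemv and mesh_gemv are assumptions made for illustration only. The pointer into x and the final summation loop merely stand in for the copy and reduce steps that would involve real communication on a distributed machine.

```c
#include <assert.h>

/* Hypothetical sizes: a 6 x 6 matrix on a 2 x 3 logical process mesh. */
enum { N = 6, MESH_ROWS = 2, MESH_COLS = 3 };

/* Reference: sequential y = A x, with A stored row-major in a flat array. */
void seq_gemv(const double *A, const double *x, double *y)
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i * N + j] * x[j];
    }
}

/* Simulated mesh version: "process" (p,q) owns the block of A made of
 * row block p and column block q. */
void mesh_gemv(const double *A, const double *x, double *y)
{
    enum { MB = N / MESH_ROWS, NB = N / MESH_COLS };
    double ydup[MESH_ROWS][MESH_COLS][MB]; /* one partial result per process */

    assert(N % MESH_ROWS == 0 && N % MESH_COLS == 0);

    for (int p = 0; p < MESH_ROWS; p++) {
        for (int q = 0; q < MESH_COLS; q++) {
            /* "Copy" step: the piece of x aligned with column block q.
             * Every mesh row p sees the same piece, i.e. it is duplicated. */
            const double *xdup = &x[q * NB];

            /* Local multiply: the contribution is aligned with row block p. */
            for (int r = 0; r < MB; r++) {
                ydup[p][q][r] = 0.0;
                for (int c = 0; c < NB; c++)
                    ydup[p][q][r] += A[(p * MB + r) * N + (q * NB + c)] * xdup[c];
            }
        }
    }

    /* "Reduce" step: sum the contributions within each mesh row. */
    for (int p = 0; p < MESH_ROWS; p++) {
        for (int r = 0; r < MB; r++) {
            double sum = 0.0;
            for (int q = 0; q < MESH_COLS; q++)
                sum += ydup[p][q][r];
            y[p * MB + r] = sum;
        }
    }
}
```

Calling mesh_gemv and seq_gemv on the same A and x should give the same y. In the simulation the local multiply only ever touches the part of x that matches the owned columns and only ever produces entries of y that match the owned rows, which is the alignment requirement stated above.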


rvdg@cs.utexas.edu