We start by explaining the simple case of matrix-vector multiplication:
where A is an matrix. For our explanations, we will assume that matrix A and vectors x and y are created according to the driver given in Example 2.3. Thus, the template is created with a communicator and a blocking of . Given this template, vectors x and y are aligned with the first element of the template vector and matrix A with the upper-left element of the template matrix. This yields objects a, x, and y.
As explained in Section 1.5, the following steps will perform the matrix-vector multiply y = A x :
We will now show how to translate these operations into PLAPACK code.
- spread (collect) the entries of x within columns of nodes,
- perform the local matrix-vector multiply, and
- perform a distributed reduce (summation) of the local results within rows, leaving the global result in vector y .
The mechanism used by PLAPACK to communicate is to describe the initial and final distribution as objects, and perform a copy or reduce. Thus, the following statements will perform the spread of x within columns of nodes:
After this, all information is available locally to perform the local matrix-vector multiply. Before doing so, we need to create duplicated multiscalars to hold the constants ``0'' and ``1''. Also, a duplicated projected vector (column) must be created to hold the result:PLA_Obj_datatype( a, &datatype ); PLA_Pvector_create( datatype, PLA_PROJ_ONTO_ROW, PLA_ALL_ROWS, n, template, PLA_ALIGN_FIRST, &xdup ); PLA_Copy( x, xdup );
PLA_Mscalar_create( datatype, PLA_ALL_ROWS, PLA_ALL_COLS, 1, 1, template, &one ); PLA_Obj_set_to_one( one ); PLA_Mscalar_create( datatype, PLA_ALL_ROWS, PLA_ALL_COLS, 1, 1, template, &zero ); PLA_Obj_set_to_zero( zero ); PLA_Pvector_create( datatype, PLA_PROJ_ONTO_COL, PLA_ALL_COLS, m, template, PLA_ALIGN_FIRST, &ydup ); PLA_Local_gemv( PLA_NO_TRANSPOSE, one, a, xdup, zero, ydup );
Finally, the local results (in the different versions of the duplicated projected vector ydup) must be reduced into a single vector y :
PLA_Obj_set_to_zero( y ); PLA_Reduce( ydup, MPI_SUM, y );