SELECTED PLAPACK IMPLEMENTATIONS OF MATRIX OPERATIONS
Index
- Parallel Level-2 BLAS:
  - PLA_Gemv: Parallel General Matrix-Vector Multiplication
- Parallel Level-3 BLAS:
  - PLA_Gemm: Parallel General Matrix-Matrix Multiplication
- Parallel Factorization Routines:
  - PLA_Chol: Parallel Cholesky Factorization
PLA_Gemv: General Matrix-Vector Multiplication
The best way to justify the Abstract Programming Interface used by
PLAPACK is to examine how a parallel implementation looks when it is
coded in a more traditional fashion. Compare with the corresponding
ScaLAPACK code (a sketch of the PLAPACK calling style follows the
list):
- pdgemv_.c: ScaLAPACK Parallel BLAS (PBLAS) routine
- pbdgemv.f: ScaLAPACK Parallel Blocked BLAS (PBBLAS) routine
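For contrast, a PLAPACK call operates on distributed linear-algebra
objects rather than on explicit local arrays and descriptor tuples.
The following is a minimal sketch of a driver for y := alpha A x +
beta y, assuming the object-based creation routines and constants
described in the Users' Guide (PLA_Matrix_create, PLA_Mvector_create,
PLA_ALIGN_FIRST, etc.); names and argument orders should be checked
against the PLAPACK release at hand.

    /* Hedged sketch: y := 1.0 * A * x + 0.0 * y with PLAPACK objects.
       All creation calls and constants are assumed from the Users'
       Guide and may differ in detail from a given release. */
    #include <mpi.h>
    #include "PLA.h"              /* assumed PLAPACK header */

    void gemv_sketch( PLA_Template templ, int n )  /* hypothetical driver */
    {
      PLA_Obj A = NULL, x = NULL, y = NULL;
      PLA_Obj minus_one = NULL, zero = NULL, one = NULL;

      /* One template object describes the distribution; no per-process
         leading dimensions or index arithmetic appear in user code. */
      PLA_Matrix_create ( MPI_DOUBLE, n, n, templ,
                          PLA_ALIGN_FIRST, PLA_ALIGN_FIRST, &A );
      PLA_Mvector_create( MPI_DOUBLE, n, 1, templ, PLA_ALIGN_FIRST, &x );
      PLA_Mvector_create( MPI_DOUBLE, n, 1, templ, PLA_ALIGN_FIRST, &y );

      /* ... fill A and x ... */

      /* Scaling factors are themselves (duplicated) objects. */
      PLA_Create_constants_conf_to( A, &minus_one, &zero, &one );

      PLA_Gemv( PLA_NO_TRANSPOSE, one, A, x, zero, y );

      PLA_Obj_free( &A );         PLA_Obj_free( &x );
      PLA_Obj_free( &y );         PLA_Obj_free( &minus_one );
      PLA_Obj_free( &zero );      PLA_Obj_free( &one );
    }

Note how the call itself carries no information about the data
distribution; that is the point of the comparison with pdgemv_.c,
where descriptors and global indices appear explicitly in the
argument list.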
PLA_Gemm: General Matrix-Matrix Multiplication
- Main routine: PLA_Gemm.c
This routine chooses among three different routines, depending
on the shapes of the matrices involved:
- If matrix C contains the most data, it is left in place and
  A and B are communicated. The algorithm is implemented
  as a sequence of rank-k updates (sketched below).
- If matrix A contains the most data, it is left in place and
  B and C are communicated. The algorithm is implemented
  as a sequence of matrix-panel (of columns) multiplies.
- If matrix B contains the most data, it is left in place and
  A and C are communicated. The algorithm is implemented
  as a sequence of panel (of rows)-matrix multiplies.
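The "C stays in place" case has the loop structure of the following
sequential sketch: the k dimension is cut into panels of width nb, and
each panel pair contributes one rank-nb update of all of C. This is a
plain-C illustration of the loop structure only (the function name is
hypothetical); in the parallel routine the panels of A and B are
broadcast within process rows and columns, as in SUMMA.

    /* Sequential sketch of C := C + A*B as a sequence of rank-nb
       updates: C += A(:, k:k+nb) * B(k:k+nb, :).  Column-major storage. */
    void gemm_rank_k_sketch( int m, int n, int kdim, int nb,
                             const double *A,   /* m x kdim */
                             const double *B,   /* kdim x n */
                             double *C )        /* m x n    */
    {
      for ( int k = 0; k < kdim; k += nb ) {
        int kb = ( kdim - k < nb ) ? kdim - k : nb;  /* panel width */
        /* rank-kb update touching all of C; only the panels of A and
           B (the communicated data) change from iteration to iteration */
        for ( int j = 0; j < n; j++ )
          for ( int p = 0; p < kb; p++ ) {
            double b_kj = B[ ( k + p ) + j * kdim ];
            for ( int i = 0; i < m; i++ )
              C[ i + j * m ] += A[ i + ( k + p ) * m ] * b_kj;
          }
      }
    }

The other two cases have the analogous structure, with the outer loop
running over panels of columns of C (and B) or over panels of rows of
C (and A) instead.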
- Parameter checking: PLA_Gemm_enter_exit.c
The best way to justify the Abstract Programming Interface used by
PLAPACK is to examine how a parallel implementation looks when it is
coded in a more traditional fashion. Compare with the corresponding
ScaLAPACK code:
- pdgemm_.c: ScaLAPACK Parallel BLAS (PBLAS) routine
- pbdgemm.f: ScaLAPACK Parallel Blocked BLAS (PBBLAS) routine
References:
- R. van de Geijn, Using PLAPACK (Users' Guide), The MIT Press, 1997.
- Robert van de Geijn and Jerrell Watts,
  "SUMMA: Scalable Universal Matrix Multiplication Algorithm,"
  Concurrency: Practice and Experience, Vol. 9 (4), pp. 255-274,
  April 1997.
- John Gunnels, Calvin Lin, Greg Morrow, and Robert van de Geijn,
  "A Flexible Class of Parallel Matrix Multiplication Algorithms,"
  Proceedings of the First Merged International Parallel Processing
  Symposium and Symposium on Parallel and Distributed Processing
  (1998 IPPS/SPDP '98), pp. 110-116, 1998.
PLA_Chol: Cholesky Factorization
- Main routine: PLA_Chol.c
- Parameter checking: PLA_Chol_enter_exit.c
- A much simpler implementation, which really shows how a PLAPACK
  implementation is just a direct translation of the way an algorithm
  is naturally expressed (the algorithm itself is sketched after the
  ScaLAPACK listings below), is given by
The best way to justify the Abstract Programming Interface used by
PLAPACK is to examine how a parallel implementation looks when it is
coded in a more traditional fashion. Compare with the corresponding
ScaLAPACK code:
- pdpotrf.c: ScaLAPACK Blocked Cholesky Factorization
- pdpotf2.c: ScaLAPACK Unblocked Cholesky Factorization (needed by the
  blocked factorization)
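The natural expression being translated is the standard blocked
right-looking Cholesky factorization: factor the leading block, solve
a triangular system for the panel below it, update the trailing
matrix, and repeat. The sequential sketch below (lower-triangular,
column-major, no pivoting since A is assumed symmetric positive
definite) shows the three steps; the function names are hypothetical,
and it is assumed here that PLA_Chol.c follows this blocked variant,
applying the same steps to views of the distributed matrix.

    #include <math.h>

    /* Unblocked lower Cholesky of the nb x nb block at A (in place),
       leading dimension lda; the analogue of LAPACK's dpotf2. */
    static void chol_unb( int nb, double *A, int lda )
    {
      for ( int j = 0; j < nb; j++ ) {
        A[ j + j*lda ] = sqrt( A[ j + j*lda ] );
        for ( int i = j+1; i < nb; i++ )        /* scale column j   */
          A[ i + j*lda ] /= A[ j + j*lda ];
        for ( int k = j+1; k < nb; k++ )        /* rank-1 update of */
          for ( int i = k; i < nb; i++ )        /* trailing matrix  */
            A[ i + k*lda ] -= A[ i + j*lda ] * A[ k + j*lda ];
      }
    }

    /* Blocked right-looking Cholesky; the loop body is the natural
       three-step algorithm. */
    void chol_blocked( int n, double *A, int lda, int nb )
    {
      for ( int j = 0; j < n; j += nb ) {
        int jb = ( n - j < nb ) ? n - j : nb;
        int m2 = n - j - jb;                    /* rows below block */
        double *A11 = &A[  j       + j*lda ];
        double *A21 = &A[ (j + jb) + j*lda ];
        double *A22 = &A[ (j + jb) + (j + jb)*lda ];

        chol_unb( jb, A11, lda );               /* A11 := Chol(A11) */

        /* A21 := A21 * inv(L11)^T  (triangular solve, a trsm)      */
        for ( int k = 0; k < jb; k++ ) {
          for ( int i = 0; i < m2; i++ )
            A21[ i + k*lda ] /= A11[ k + k*lda ];
          for ( int c = k+1; c < jb; c++ )
            for ( int i = 0; i < m2; i++ )
              A21[ i + c*lda ] -= A21[ i + k*lda ] * A11[ c + k*lda ];
        }

        /* A22 := A22 - A21 * A21^T  (symmetric rank-jb update)     */
        for ( int c = 0; c < m2; c++ )
          for ( int k = 0; k < jb; k++ )
            for ( int i = c; i < m2; i++ )
              A22[ i + c*lda ] -= A21[ i + k*lda ] * A21[ c + k*lda ];
      }
    }

In PLAPACK each of the three steps would map onto one call on views of
the distributed matrix, which is what makes the parallel code read so
much like this sketch.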
References:
- R. van de Geijn, Using PLAPACK (Users' Guide), The MIT Press, 1997.
- Greg Morrow and Robert van de Geijn,
  "Zen and the Art of High-Performance Parallel Computing."
- Greg Baker, John Gunnels, Greg Morrow, Beatrice Riviere, and Robert
  van de Geijn, "PLAPACK: High Performance through High Level
  Abstraction," Proceedings of ICPP '98, 1998.
- Philip Alpatov, Greg Baker, Carter Edwards, John Gunnels, Greg
  Morrow, James Overfelt, Robert van de Geijn, and Yuan-Jye J. Wu,
  "PLAPACK: Parallel Linear Algebra Libraries Design Overview,"
  Proceedings of SC97, 1997.
- Philip Alpatov, Greg Baker, Carter Edwards, John Gunnels, Greg
  Morrow, James Overfelt, Robert van de Geijn, and Yuan-Jye J. Wu,
  "PLAPACK: Parallel Linear Algebra Package," Proceedings of the SIAM
  Parallel Processing Conference, 1997.
Send mail to plapack@cs.utexas.edu
Last Updated: Feb. 8, 2000