Parallel Implementation of BLAS:
General Techniques for Level 3 BLAS
-  Almadena Chtchelkanova 
-  John Gunnels 
-  Greg Morrow 
-  James Overfelt 
-  Robert A. van de Geijn 
-  University of Texas at Austin
-  Austin, TX 78712
 Abstract 
In this paper, we present straightforward
techniques for a highly efficient,
scalable implementation of common matrix-matrix
operations generally known as the Level 3 Basic Linear
Algebra Subprograms (BLAS).
This work builds on our recently developed
parallel matrix-matrix multiplication
implementation, which
yields superior performance and
requires little workspace.
We show that the techniques used for
the matrix-matrix multiplication extend naturally to
all important Level 3 BLAS, and thus this approach
becomes an enabling technology for the efficient parallel
implementation
of these routines and of libraries that use the BLAS.
Representative performance results
on the Intel Paragon system are given.
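
For readers unfamiliar with the Level 3 BLAS, the following is a minimal sequential sketch of the general matrix-matrix multiply, C := alpha*A*B + beta*C, written once as a plain triple loop and once as an accumulation of panel updates. It is an illustration only and not the parallel algorithm of the paper: the function names, the list-of-lists matrix layout, and the panel width nb are all assumptions made for this example.

```python
# Illustrative sketch only: names, data layout, and the panel width `nb`
# are hypothetical and are not taken from the paper.

def gemm_reference(alpha, A, B, beta, C):
    """Return alpha*A*B + beta*C using a plain triple loop."""
    m, k, n = len(A), len(B), len(B[0])
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def gemm_by_panels(alpha, A, B, beta, C, nb=2):
    """Return the same result, accumulated one panel at a time.

    Each step multiplies a block of up to `nb` columns of A by the
    matching block of rows of B and adds the product into the result.
    """
    m, k, n = len(A), len(B), len(B[0])
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for p0 in range(0, k, nb):
        p1 = min(p0 + nb, k)  # the last panel may be narrower than nb
        for i in range(m):
            for j in range(n):
                out[i][j] += alpha * sum(A[i][p] * B[p][j] for p in range(p0, p1))
    return out


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [2, 2]]
C = [[1, 1], [1, 1]]
result = gemm_by_panels(2, A, B, 3, C, nb=2)
```

Both functions compute the same matrix. The panel form is shown only because it makes the operation a sequence of smaller, independent updates.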
Almadena Chtchelkanova, John Gunnels, Greg Morrow,
James Overfelt,
Robert A. van de Geijn,
"Parallel Implementation of BLAS:
General Techniques for Level 3 BLAS," 
TR-95-40, Department of Computer Sciences, University of
Texas, Oct. 1995. Submitted to Concurrency: Practice and Experience.