SUMMA: Scalable Universal Matrix Multiplication Algorithm

Robert A. van de Geijn
Department of Computer Sciences
University of Texas
Austin, TX 78712

Jerrell Watts
Scalable Concurrent Programming Laboratory
California Institute of Technology
Pasadena, California 91125


We give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less workspace. MPI implementations are given, as are performance results on the Intel Paragon system.