On the Efficiency of Global Combine Algorithms for 2-D Meshes With Wormhole Routing

M. Barnett
Computer Science Department
University of Idaho
Moscow, Idaho 83843
mbarnett@cs.uidaho.edu
R. Littlefield
Pacific Northwest Laboratory
P.O. Box 999
Richland, Washington 99352
rj_littlefield@pnl.gov
D. Payne
Intel SSD
Intel Corporation
15201 N.W. Greenbrier Pkwy
Beaverton, Oregon 97006
payne@ssd.intel.com
Robert A. van de Geijn
Department of Computer Sciences
University of Texas
Austin, TX 78712
rvdg@cs.utexas.edu

Abstract

The problem of performing a global combine (summation) operation on distributed memory computers using a two-dimensional mesh interconnect with wormhole routing is considered. We present algorithms that are asymptotically optimal for short vectors (O(\log(p)) for p processing nodes) and for long vectors (O(n) for n data elements per node), as well as hybrid algorithms that are superior for intermediate n . The algorithms are analyzed using detailed performance models that include the effects of link conflicts and other characteristics of the underlying communication system. The models are validated using experimental data from the Intel Touchstone DELTA computer. We show that no one algorithm is optimal for all vector lengths; rather, each of the presented algorithms is superior under some circumstances.

M. Barnett, R. Littlefield, D. Payne, and R. van de Geijn, ``On the Efficiency of Global Combine Algorithms for 2-D Meshes With Wormhole Routing,'' Journal of Parallel and Distributed Computing 24, pp. 191-201 (1995).

M. Barnett, R. Littlefield, D. Payne, and R. van de Geijn, `On the Efficiency of Global Combine Algorithms for 2-D Meshes With Wormhole Routing,'' TR-93-05, Department of Computer Sciences , University of Texas, March 1993.