Exploiting Modern GPU Architectural Features for Distributed Multi-GPU Graph Analytics

 
Project contacts: Vishwesh Jatala (vishwesh.jatala@austin.utexas.edu)
Roshan Dathathri (roshan@cs.utexas.edu)

Project description: GPUs have become a popular platform for accelerating graph analytical systems. However, real-world graphs are very large, so they cannot be processed within the memory capacity of a single GPU. Researchers have therefore focused on developing distributed multi-host multi-GPU graph analytical frameworks. D-IrGL is one such system that supports multi-host multi-GPU architectures: it uses IrGL[1]-generated code to perform computation on each GPU and uses Gluon's[2] communication optimizations for synchronization among GPUs.

Currently, D-IrGL does not use the features of modern GPU architectures to optimize its communication phase. In this project, your objective is to improve the performance of the D-IrGL framework by exploiting the following modern GPU architectural features:

[1] Asynchronous data transfers and streams within a single GPU.
[2] Virtual memory to support large graphs on a single GPU.
[3] Inter-GPU communication without CPU intervention, both among GPUs located in a single machine and among GPUs located across multiple machines. This removes the overhead associated with redundant data transfers through the CPU. You can achieve this with the NVLink and GPUDirect RDMA features.
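To illustrate feature [1], the sketch below overlaps host-device transfers with kernel execution by splitting work into chunks, each processed in its own CUDA stream. The kernel `update_nodes` and the buffer sizes are hypothetical placeholders, not D-IrGL code:

```cuda
// Minimal sketch: overlapping asynchronous copies with computation via streams.
// `update_nodes` is a hypothetical placeholder kernel.
#include <cuda_runtime.h>

__global__ void update_nodes(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1.0f;  // placeholder computation
}

int main() {
  const int N = 1 << 20, CHUNKS = 4, CHUNK = N / CHUNKS;
  float *h_data, *d_data;
  cudaMallocHost(&h_data, N * sizeof(float));  // pinned memory enables async copies
  cudaMalloc(&d_data, N * sizeof(float));

  cudaStream_t streams[CHUNKS];
  for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&streams[c]);

  // Each chunk's H2D copy, kernel launch, and D2H copy are issued in its own
  // stream, so transfers for one chunk overlap with computation on another.
  for (int c = 0; c < CHUNKS; ++c) {
    int off = c * CHUNK;
    cudaMemcpyAsync(d_data + off, h_data + off, CHUNK * sizeof(float),
                    cudaMemcpyHostToDevice, streams[c]);
    update_nodes<<<(CHUNK + 255) / 256, 256, 0, streams[c]>>>(d_data + off, CHUNK);
    cudaMemcpyAsync(h_data + off, d_data + off, CHUNK * sizeof(float),
                    cudaMemcpyDeviceToHost, streams[c]);
  }
  cudaDeviceSynchronize();

  for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(streams[c]);
  cudaFreeHost(h_data);
  cudaFree(d_data);
  return 0;
}
```

Note that asynchronous copies require pinned (page-locked) host memory; copies from pageable memory fall back to synchronous staging.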
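One way to realize feature [2] is CUDA unified (managed) memory, which on Pascal and later GPUs allows allocations larger than device memory, with pages migrated on demand. The sketch below is an assumption about how a large graph buffer might be allocated; the allocation size and advice hints are illustrative only:

```cuda
// Sketch: managed memory lets a buffer exceed GPU memory (oversubscription);
// pages migrate between host and device on access (Pascal and later).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  size_t free_b, total_b;
  cudaMemGetInfo(&free_b, &total_b);

  // Deliberately request more than total device memory (illustrative size).
  size_t n = total_b / sizeof(int) + (1 << 20);
  int *edges;  // hypothetical edge array of a large graph
  if (cudaMallocManaged(&edges, n * sizeof(int)) != cudaSuccess) {
    printf("managed allocation failed\n");
    return 1;
  }
  // Optional hint: keep a frequently accessed range resident on device 0.
  cudaMemAdvise(edges, (1 << 20) * sizeof(int),
                cudaMemAdviseSetPreferredLocation, 0 /* device 0 */);
  cudaFree(edges);
  return 0;
}
```

Prefetching hot regions with `cudaMemPrefetchAsync` before a kernel launch can reduce page-fault overhead when oversubscribing.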
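For the intra-node part of feature [3], CUDA peer-to-peer access lets one GPU copy to another directly over NVLink (or PCIe) without staging through host memory. A minimal sketch, assuming two peer-capable devices 0 and 1:

```cuda
// Sketch: direct GPU-to-GPU copy within a node, bypassing the CPU.
// NVLink (or PCIe P2P) is used automatically once peer access is enabled.
#include <cuda_runtime.h>

int main() {
  int can01 = 0, can10 = 0;
  cudaDeviceCanAccessPeer(&can01, 0, 1);
  cudaDeviceCanAccessPeer(&can10, 1, 0);
  if (!can01 || !can10) return 1;  // no P2P path between devices 0 and 1

  cudaSetDevice(0);
  cudaDeviceEnablePeerAccess(1, 0);
  cudaSetDevice(1);
  cudaDeviceEnablePeerAccess(0, 0);

  const size_t BYTES = 1 << 20;
  float *buf0, *buf1;
  cudaSetDevice(0); cudaMalloc(&buf0, BYTES);
  cudaSetDevice(1); cudaMalloc(&buf1, BYTES);

  // Device-to-device copy with no host staging buffer.
  cudaMemcpyPeer(buf1, 1, buf0, 0, BYTES);
  cudaDeviceSynchronize();

  cudaFree(buf1);
  cudaSetDevice(0); cudaFree(buf0);
  return 0;
}
```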

You will implement this project in D-IrGL (https://github.com/IntelligentSoftwareSystems/Galois). You will be provided with the following hardware resources.
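For communication across machines, a CUDA-aware MPI library (e.g. MVAPICH2-GPU [3] or a GPUDirect-enabled build, see [4][5]) accepts device pointers directly; with GPUDirect RDMA the InfiniBand NIC reads GPU memory without a host staging copy. A sketch under that assumption, with an illustrative buffer size:

```c
// Sketch: CUDA-aware MPI with device pointers passed straight to MPI calls.
// Requires a CUDA-aware MPI build; run with at least 2 ranks.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int N = 1 << 20;  // illustrative message size
  float *d_buf;
  cudaMalloc(&d_buf, N * sizeof(float));

  // No explicit cudaMemcpy to a host buffer: the MPI library (with
  // GPUDirect RDMA) moves data directly from/to GPU memory.
  if (rank == 0)
    MPI_Send(d_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
  else if (rank == 1)
    MPI_Recv(d_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  cudaFree(d_buf);
  MPI_Finalize();
  return 0;
}
```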

Hardware:

Project deliverables and deadlines:

Papers:

[1] Sreepathi Pai, Keshav Pingali: A compiler for throughput optimization of graph algorithms on GPUs. OOPSLA 2016: 1-19

[2] Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, Keshav Pingali: Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics. PLDI 2018: 752-768

[3] Hao Wang, Sreeram Potluri, Miao Luo, Ashish Kumar Singh, Sayantan Sur, Dhabaleswar K. Panda: MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Computer Science - R&D 26(3-4): 257-266 (2011)

[4] Sreeram Potluri, Khaled Hamidouche, Akshay Venkatesh, Devendar Bureddy, Dhabaleswar K. Panda: Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs. ICPP 2013: 80-89

[5] https://devblogs.nvidia.com/introduction-cuda-aware-mpi/