Hochan Lee (hochan@utexas.edu), Roshan Dathathri (roshan@cs.utexas.edu)
GPUs have become a popular platform for accelerating many applications, but the memory available on the device is limited. To address this, modern GPU architectures provide Unified Memory (UM) [1][2][3]. A UM allocation is placed in either CPU or GPU memory and migrated automatically by the runtime, and it can be accessed by both the CPU and the GPU. UM therefore gives GPU code access to a much larger memory pool.
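For reference, the following is a minimal sketch of how UM is typically used in CUDA (the kernel name and problem size are illustrative): cudaMallocManaged returns a single pointer that both host code and device kernels can dereference, and the runtime migrates pages between CPU and GPU memory on demand.

    // Minimal UM sketch: one managed allocation touched by both host and device.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;                // device writes managed memory
    }

    int main() {
        const int n = 1 << 20;
        float *data = nullptr;
        cudaMallocManaged(&data, n * sizeof(float)); // single pointer visible to CPU and GPU
        for (int i = 0; i < n; ++i) data[i] = 1.0f;  // host initializes the same pointer
        scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
        cudaDeviceSynchronize();                     // wait for the kernel before host reads
        printf("data[0] = %f\n", data[0]);           // host reads the result, no explicit copies
        cudaFree(data);
        return 0;
    }

Note that no cudaMemcpy calls are needed: the same pointer is valid on both sides, which is what makes UM attractive for dynamic data structures whose size is not known up front.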
Dynamic data structures such as hash tables are easy to express in languages like C++ on host CPUs because CPUs support efficient dynamic memory allocation. Libraries like Galois also provide concurrent and scalable memory allocators; they use a memory pool per thread and manage memory by always handing out (padded) allocations of a fixed chunk size (for example, a power of 2) [4]. On the other hand, there are no efficient dynamic memory allocators for GPUs. In this project, you will build a dynamic memory allocator for a GPU using UM. This includes surveying the literature for existing memory-management solutions for CPUs and GPUs [5][6]. The goal is to implement a dynamic data structure, such as a dynamic hash table, using the allocator. Recent work [7] implements a dynamic hash table on the GPU, but it is limited and does not support UM. Your implementation should be more general and more efficient than that solution.
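As a starting point, here is a deliberately simplistic sketch of a fixed-chunk pool allocator backed by UM: a managed buffer of equally sized chunks handed out from device code with an atomic bump pointer. All names (ChunkPool, pool_alloc, CHUNK_BYTES, POOL_CHUNKS) are illustrative assumptions, not part of any existing library; a real allocator would also need freeing, multiple size classes, and per-thread pools as in Galois [4].

    // Sketch of a fixed-chunk-size pool allocator in Unified Memory.
    #include <cstdio>
    #include <cuda_runtime.h>

    constexpr size_t CHUNK_BYTES = 64;            // fixed (padded) allocation size
    constexpr size_t POOL_CHUNKS = 1 << 20;       // capacity of the pool

    struct ChunkPool {
        char *base;                  // managed buffer holding all chunks
        unsigned long long next;     // index of the next unused chunk
    };

    // Device-side allocation: grab the next chunk with an atomic bump.
    __device__ void *pool_alloc(ChunkPool *pool) {
        unsigned long long idx = atomicAdd(&pool->next, 1ULL);
        if (idx >= POOL_CHUNKS) return nullptr;   // pool exhausted
        return pool->base + idx * CHUNK_BYTES;
    }

    __global__ void fill(ChunkPool *pool) {
        int *slot = static_cast<int *>(pool_alloc(pool));
        if (slot) *slot = threadIdx.x;            // each thread writes into its own chunk
    }

    int main() {
        ChunkPool *pool = nullptr;
        cudaMallocManaged(&pool, sizeof(ChunkPool));               // pool metadata in UM
        cudaMallocManaged(&pool->base, POOL_CHUNKS * CHUNK_BYTES); // chunk storage in UM
        pool->next = 0;
        fill<<<1, 256>>>(pool);
        cudaDeviceSynchronize();
        printf("first chunk holds thread id %d\n", *reinterpret_cast<int *>(pool->base));
        cudaFree(pool->base);
        cudaFree(pool);
        return 0;
    }

The atomic bump pointer keeps allocation cheap and entirely on the device, but it never reuses memory; adding free lists, multiple chunk sizes, and per-thread (or per-warp) pools on top of UM is the core of the project.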
Bridges cluster P100 nodes: 1 machine with 2 NVIDIA P100 GPUs.
Internal machine at UT called Tuxedo: 1 machine with 4 K80 GPUs and 2 GTX 1080 GPUs.
(Nov 6) A clear description of your planned project and a brief summary of your understanding of existing memory allocators.
(Nov 13) A survey and understanding of GPU memory technologies, including their performance aspects.
(Dec 6) An implementation of the GPU memory allocator.
(Dec 6) A project report, written like an ACM conference paper, that summarizes your work.
[1] Unified Memory Programming (link)
[2] Everything you need to know about unified memory (link)
[3] Memory Management on Modern GPU Architectures (link)
[4] Galois manual (link)
[5] Performance Evaluation of Data Migration Methods Between the Host and the Device in CUDA-Based Programming (link)
[6] Overlapping Host-to-Device Copy and Computation using Hidden Unified Memory (link)
[7] A dynamic hash table for the GPU, IPDPS 2018 (link)