You can do this assignment alone or with someone else from class.
Each group can have a maximum of two students.
Each group should turn in one submission.
Given a directed, weighted graph and a source vertex, compute the shortest-path distance from the source to every vertex using the Bellman-Ford algorithm. The goal of this week's assignment is to learn to use CUDA, so you need to implement the single-source shortest path (SSSP) algorithm in CUDA. You can use the graphs from the last assignment, namely RandomGraph and the USA road network.
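A sequential reference can be handy for checking the distances your GPU code produces. The following is a minimal C++ sketch, not part of the assignment itself: the Edge struct, the kInf sentinel and the function name bellman_ford_reference are assumptions made for illustration, and the code assumes there is no negative-weight cycle reachable from the source.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical edge type for this sketch; the assignment does not prescribe a layout.
struct Edge {
    int src;
    int dst;
    int weight;
};

// Sentinel distance for vertices not (yet) reached from the source.
constexpr std::int64_t kInf = std::numeric_limits<std::int64_t>::max();

// Sequential Bellman-Ford: relax every edge until a full pass changes nothing.
// The pass counter caps the loop at num_vertices passes, which only matters
// if the input contains a negative-weight cycle.
std::vector<std::int64_t> bellman_ford_reference(int num_vertices,
                                                 const std::vector<Edge>& edges,
                                                 int source) {
    std::vector<std::int64_t> dist(num_vertices, kInf);
    dist[source] = 0;
    bool changed = true;
    for (int pass = 0; changed && pass < num_vertices; ++pass) {
        changed = false;
        for (const Edge& e : edges) {
            if (dist[e.src] == kInf) continue;  // source end unreached; also avoids overflow
            const std::int64_t candidate = dist[e.src] + e.weight;
            if (candidate < dist[e.dst]) {
                dist[e.dst] = candidate;  // found a shorter path to e.dst
                changed = true;
            }
        }
    }
    return dist;
}
```

Comparing this output element by element with the distances copied back from the GPU is a simple correctness check on small inputs before timing the large graphs.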
The main function reads a graph from a file (say, the USA road
network) into CPU memory, copies the graph to GPU memory,
calls the GPU kernel and waits for it to finish computing the
shortest paths. The final distances are then copied back from the GPU
to the CPU.
main {
    // read graph from file.
    // allocate memory for the graph on device.
    // copy graph from host to device.
    do {
        changed = false;
        sssp<<<...>>>(graph, distance, changed);
    } while (changed);
    // copy distance from device to host.
}
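The "copy graph from host to device" step tends to be simpler when the graph is stored in a few flat arrays rather than in pointer-based adjacency lists. One common choice is compressed sparse row (CSR). The sketch below is host-side C++ and only one possible layout, not something the assignment requires; the names Edge, CsrGraph and build_csr are hypothetical.

```cpp
#include <vector>

// Hypothetical edge type for this sketch.
struct Edge {
    int src;
    int dst;
    int weight;
};

// CSR graph: three flat arrays. The outgoing edges of vertex v occupy
// positions row_offsets[v] .. row_offsets[v + 1] - 1 of the other two arrays.
struct CsrGraph {
    std::vector<int> row_offsets;  // size num_vertices + 1
    std::vector<int> col_indices;  // destination of each edge, grouped by source
    std::vector<int> weights;      // weight of each edge, same order as col_indices
};

CsrGraph build_csr(int num_vertices, const std::vector<Edge>& edges) {
    CsrGraph g;
    g.row_offsets.assign(num_vertices + 1, 0);
    g.col_indices.resize(edges.size());
    g.weights.resize(edges.size());

    // Count the out-degree of each vertex, stored one slot to the right.
    for (const Edge& e : edges) ++g.row_offsets[e.src + 1];

    // A prefix sum turns the counts into starting offsets.
    for (int v = 0; v < num_vertices; ++v) g.row_offsets[v + 1] += g.row_offsets[v];

    // Scatter each edge into its row using a moving cursor per vertex.
    std::vector<int> cursor(g.row_offsets.begin(), g.row_offsets.end() - 1);
    for (const Edge& e : edges) {
        const int slot = cursor[e.src]++;
        g.col_indices[slot] = e.dst;
        g.weights[slot] = e.weight;
    }
    return g;
}
```

Each of the three vectors is contiguous, so its data() pointer and size can be passed to a single cudaMemcpy call when copying the graph to the device.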
You should experiment with the number of blocks.
The number of threads per block can be in the range of 256-1024 (in
multiples of 32).
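When sweeping launch configurations, a few small host-side helpers can keep the arithmetic in one place. This is a sketch in C++ under the constraints stated above; the function names are hypothetical, and work_items stands for whatever your kernel assigns to one thread (for example one vertex or one edge).

```cpp
#include <vector>

// True if threads_per_block lies in the stated range:
// 256 to 1024 inclusive, in multiples of 32 (the warp size).
bool is_valid_block_size(int threads_per_block) {
    return threads_per_block >= 256 && threads_per_block <= 1024 &&
           threads_per_block % 32 == 0;
}

// Smallest number of blocks such that blocks * threads_per_block >= work_items
// (ceiling division), i.e. one thread per work item.
int blocks_to_cover(long long work_items, int threads_per_block) {
    return static_cast<int>((work_items + threads_per_block - 1) / threads_per_block);
}

// Every allowed block size, in increasing order: 256, 288, ..., 1024.
std::vector<int> candidate_block_sizes() {
    std::vector<int> sizes;
    for (int t = 256; t <= 1024; t += 32) sizes.push_back(t);
    return sizes;
}
```

blocks_to_cover gives the one-thread-per-item block count. Launching fewer blocks than that is also a valid experiment, provided each thread then loops over several work items.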
Lonestar has 8 GPU nodes, each with two NVIDIA M2090 GPUs (Fermi).
Refer to Rupesh Nasre's paper GPU_Optimizations.
Read Section 6.1 and then the referenced subsections from Sections
3, 4, 5.
You should find Table 1 and Figure 3 useful to judge your
implementation's performance.
Submit your source and a short write-up. Evaluate the
performance of your implementation
as you vary the number of blocks and the number of threads.
Kernel unrolling, as described in the paper, is a required
optimization for this assignment.
(Optional) Using shared memory as described in the paper can
be considered for extra credit. (Note: If you include this in
your submission, make sure you highlight it explicitly in your
write-up.)
The write-up should include the conclusions and observations drawn
from varying the number of blocks and the number of threads.
Perform all measurements on the USA road network and the random graph.