You can do this assignment with another student in the course. Make sure you put both names on your
report.
Late submission policy: Submissions may be at most 2 days late.
There will be a 10% penalty for each day after the due date (cumulative).
Clarifications to the assignment will be posted at the bottom of the page.
In this assignment, you will implement a sequential program in C++ for the
page-rank problem. In later assignments, you will implement parallel algorithms
for page-rank and other graph problems. Read the entire assignment before you
start coding. You may use library routines from the STL and Boost libraries.
Graph formats
We will provide three files with the following graphs: (i) power-law graph
rmat15, (ii) road network road-NY (New York road network), and (iii) the
Wikipedia graph discussed in lecture. Graphs will be given to you in DIMACS
format, which is described at the end of this assignment.
Links:
rmat15.dimacs
road-NY.dimacs
wiki.dimacs
Coding
Experiments
Node degree histograms
Page-rank
1. (90 points) Submit (in Canvas) your code and all the items listed in the
experiments above. Also submit a makefile so that the code can be compiled with
make [PARAMETER]. Describe how to compile and run the program in a README.txt.
Experimental results should be submitted in PDF format.
2. (10 points) In lecture, I mentioned that the page-rank algorithm computes
the solution to a system of linear equations in which the unknowns are the
page-ranks of each node and in which there is one equation for each node that
defines the page-rank of that node in terms of the page-ranks of its
in-neighbors. Demonstrate this with the graph from Wikipedia used in lecture,
as follows.
a. Write down the system of linear equations for the example.
b. Using MATLAB or any other system, compute the solution to this system of
equations.
c. Does your solution match the page-ranks shown in the diagram (you may need
to scale all your computed page-ranks so their sum is one)?
Turn in the answers to each of these questions.
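As a reminder of the general shape such a system takes, here is one common way to write the page-rank equations. This is a sketch under assumptions, not necessarily the exact form used in lecture: the damping factor d is not specified in this assignment, N is the number of nodes, in(v) is the set of in-neighbors of v, and outdeg(u) is the number of outgoing edges of u.

```latex
PR(v) \;=\; \frac{1 - d}{N} \;+\; d \sum_{u \in \mathrm{in}(v)} \frac{PR(u)}{\mathrm{outdeg}(u)},
\qquad v = 1, \dots, N
```

There is one such equation per node, and each is linear in the unknowns PR(1), ..., PR(N). Nodes with no outgoing edges need a convention of their own, which this sketch does not cover.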
DIMACS format for graphs
One popular format for representing directed graphs as text files is the DIMACS format. An undirected graph is represented as a directed graph in which each undirected edge becomes two directed edges. Files are assumed to be well-formed and internally consistent, so it is not necessary to do any error checking. Each line in a file must be one of the following.
c This is an example of a comment line.
p FORMAT NODES EDGES
The lower-case character p signifies that this is the problem line. The FORMAT
field should contain a mnemonic for the problem such as sssp. The NODES field
contains an integer value specifying n, the number of nodes in the graph. The
EDGES field contains an integer value specifying m, the number of edges in the
graph.
a s d w
The lower-case character "a" signifies that this is an edge descriptor line
(the "a" stands for arc, in case you are wondering). The fields s and d are the
source and destination nodes of the edge, and w is its weight. Edges may occur
in any order in the file. For graphs with unweighted edges, we will use an
arbitrary edge weight like 1.
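The three line types described above can be read with a short loop. The following is a minimal sketch, not a required design: the names Edge, EdgeList and readDimacs are hypothetical, edge weights are assumed to be integers, and in keeping with the note above no error checking is done.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; the assignment does not prescribe them.
struct Edge {
    int src;
    int dst;
    int weight;  // assumption: weights are integers
};

struct EdgeList {
    std::string format;       // FORMAT field of the problem line
    int numNodes = 0;         // NODES field (n)
    int numEdges = 0;         // EDGES field (m), as declared in the file
    std::vector<Edge> edges;  // edges in file order, duplicates included
};

// Reads a DIMACS stream line by line and collects the edges.
EdgeList readDimacs(std::istream& in) {
    EdgeList g;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        char tag = 'c';  // blank lines fall through as if they were comments
        fields >> tag;
        if (tag == 'p') {
            fields >> g.format >> g.numNodes >> g.numEdges;
            g.edges.reserve(static_cast<std::size_t>(g.numEdges));
        } else if (tag == 'a') {
            Edge e{0, 0, 0};
            fields >> e.src >> e.dst >> e.weight;
            g.edges.push_back(e);
        }
        // Lines starting with 'c' are comments and are skipped.
    }
    return g;
}
```

To read one of the provided files, open it with std::ifstream and pass the stream to readDimacs. Node numbers are stored exactly as they appear in the file, so they start at 1.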
Edges for rmat graphs: Special care is needed when reading in rmat graphs.
Because of the generator used for rmat graphs, the files for some rmat graphs
may have multiple edges between the same pair of nodes, violating the DIMACS
spec. When building the CSR representation in memory, keep only the edge with
the largest weight. For example, if you find edges (s d 1) and (s d 4) from
source s to destination d, keep only the edge with weight 4. In principle, you
could keep the smallest-weight edge or follow some other rule, but I want
everyone to follow the same rule to make grading easier.
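One possible way to apply the largest-weight rule before building the CSR arrays is sketched below. It is an illustration only: Edge and keepHeaviestEdges are hypothetical names, and using std::map is a simplicity choice rather than a recommendation for performance.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical edge type for illustration.
struct Edge {
    int src;
    int dst;
    int weight;
};

// Collapses parallel edges, keeping the largest weight seen for each
// (src, dst) pair. The result is ordered by source, then destination,
// because std::map iterates over its keys in sorted order.
std::vector<Edge> keepHeaviestEdges(const std::vector<Edge>& edges) {
    std::map<std::pair<int, int>, int> best;
    for (const Edge& e : edges) {
        // emplace inserts only if the key is new; otherwise compare weights.
        auto result = best.emplace(std::make_pair(e.src, e.dst), e.weight);
        if (!result.second && e.weight > result.first->second) {
            result.first->second = e.weight;
        }
    }

    std::vector<Edge> unique;
    unique.reserve(best.size());
    for (const auto& entry : best) {
        unique.push_back(Edge{entry.first.first, entry.first.second, entry.second});
    }
    return unique;
}
```

Because the output is grouped by source node, it can be scanned once to fill per-node edge ranges. Sorting the edge list and dropping duplicates in a single pass is an alternative that avoids the map.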
Hints for constructing CSR format graphs from DIMACS files
· Nodes are numbered starting from 1 in DIMACS format
but C++ arrays start at 0. To keep things simple and to make grading easier,
your data structures and code should ignore node position 0 in your arrays.
· To construct the CSR representation of a graph, you can use the following steps: