You can do this assignment with another student in the
course. If you do,
make sure you put both names on your report. Only one student
needs to submit on Canvas.
Late submission policy: Submissions can be at most 2
days late. There will be a 10% penalty for each day after the
due date (cumulative).
Clarifications to the assignment will be posted on Piazza.
In this assignment, you will implement a sequential program in
C++ for the page-rank problem. In later assignments, you will
implement parallel algorithms for page-rank and other graph
problems. Read the entire assignment before you start
coding. You may use library routines from the STL and Boost.
Graph formats
We will provide three files with the following graphs: (i) the
power-law graph rmat15, (ii) the road network road-NY (the New York
road network), and (iii) the Wikipedia graph discussed in lecture.
Graphs will be given to you in DIMACS format, which is described
at the end of this assignment.
rmat15.dimacs road-NY.dimacs wiki.dimacs
Node degree histograms
(90 points) Submit (on Canvas), as a .tar /
.tar.gz archive, your code and all the items listed in the
experiments above. Inside the archive, include a makefile so that the code can be
compiled with make [PARAMETER]. Describe how to compile
and run the program in a README.txt. Separately from the
archive, submit a .pdf with the experimental results.
(10 points) In lecture, I mentioned that the
page-rank algorithm computes the solution to a system of
linear equations in which the unknowns are the page-ranks of
each node and in which there is one equation for each node
that defines the page-rank of that node in terms of the
page-ranks of its in-neighbors. Demonstrate this with the
graph from Wikipedia used in lecture, as follows.
Write down the system of linear equations for
the example.
Using MATLAB or any other system, compute the
solution to this system of equations.
Does your solution match the page-ranks shown
in the diagram (you may need to scale all your computed page-ranks so their sum is one)?
Turn in the answers to each of these questions.
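As a reminder of the general shape such a system takes, here is one common damped formulation of page-rank, written for a graph with N nodes. This is a sketch under assumptions, not necessarily the exact form used in lecture: the damping factor d, the symbol x_v for the page-rank of node v, and the matrix M are notation introduced here for illustration.

```latex
% One equation per node v, in terms of its in-neighbors u:
\[
  x_v \;=\; \frac{1-d}{N} \;+\; d \sum_{u \in \mathrm{In}(v)} \frac{x_u}{\mathrm{outdeg}(u)},
  \qquad v = 1, \dots, N .
\]
% Moving the unknowns to the left-hand side gives a linear system:
\[
  (I - dM)\,x \;=\; \frac{1-d}{N}\,\mathbf{1},
  \qquad
  M_{vu} =
  \begin{cases}
    1/\mathrm{outdeg}(u) & \text{if there is an edge } u \to v,\\
    0 & \text{otherwise.}
  \end{cases}
\]
```

A node with no out-edges needs a convention of its own (for example, treating it as if it linked to every node); check which convention and which value of d the lecture example uses before comparing numbers.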
DIMACS format for graphs
One popular format for representing directed graphs as text files is the DIMACS format (an undirected graph is represented as a directed graph by replacing each undirected edge with two directed edges). Files are assumed to be well-formed and internally consistent, so no error checking is necessary. Each line in a file must be one of the following.
c This is an example of a comment line.
p FORMAT NODES EDGES
The lower-case character p signifies that this is the
problem line. The FORMAT field should contain a mnemonic
for the problem such as sssp. The NODES
field contains an integer value specifying n, the
number of nodes in the graph. The EDGES field contains an integer value
specifying m, the number of edges in the graph.
a s d w
The lower-case character "a" signifies that this is an edge
descriptor line. The "a" stands for arc, in case you are
wondering. The fields s, d, and w give the source node, the
destination node, and the weight of the edge. Edges may occur
in any order in the file. For graphs with unweighted edges, we
will use an arbitrary edge weight like 1.
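As a rough illustration of the three line types described above, the following C++ sketch reads a DIMACS stream into a plain edge list. The names Edge, EdgeList, and readDimacs are hypothetical, integer weights are an assumption, and no error checking is done because the files are assumed to be well-formed.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of the assignment.
struct Edge {
    int src;     // source node id, 1-based as in the file
    int dst;     // destination node id, 1-based as in the file
    int weight;  // assumed to be an integer
};

struct EdgeList {
    int numNodes = 0;         // n from the problem line
    long long numEdges = 0;   // m from the problem line
    std::vector<Edge> edges;  // arcs in file order
};

// Reads the stream line by line and dispatches on the first character.
EdgeList readDimacs(std::istream& in) {
    EdgeList g;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        char kind = '\0';
        fields >> kind;
        if (kind == 'p') {
            std::string format;  // mnemonic such as "sssp"; unused here
            fields >> format >> g.numNodes >> g.numEdges;
            g.edges.reserve(static_cast<std::size_t>(g.numEdges));
        } else if (kind == 'a') {
            Edge e{};
            fields >> e.src >> e.dst >> e.weight;
            g.edges.push_back(e);
        }
        // Comment lines ('c') and blank lines are skipped.
    }
    return g;
}
```

To read one of the provided files, open it with std::ifstream and pass the stream to readDimacs. Keeping the 1-based node ids exactly as they appear in the file matches the numbering used throughout this assignment.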
Note for rmat graphs: Special care is needed when
reading in rmat graphs. Because of the generator used for rmat
graphs, the files for some rmat graphs may have multiple edges
between the same pair of nodes, violating the DIMACS spec.
When building the CSR representation in memory, keep only the
edge with the largest weight. For example, if you find edges
(s d 1) and (s d 4) from source s to destination d, keep only
the edge with weight 4. In principle, you could keep the
smallest-weight edge or follow some other rule, but I want
everyone to follow the same rule to make grading easier.
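One possible way to apply the largest-weight rule is sketched below: sort the edges so that duplicates of the same (source, destination) pair are adjacent with the heaviest first, then drop all but the first of each run. The Edge struct and the function name keepLargestWeight are hypothetical, and this is only one of several reasonable approaches; the duplicates could equally be merged while filling the CSR arrays.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical edge record for illustration; not part of the assignment.
struct Edge {
    int src;
    int dst;
    int weight;
};

// Collapses parallel edges, keeping the largest weight per (src, dst) pair.
std::vector<Edge> keepLargestWeight(std::vector<Edge> edges) {
    // Order by source, then destination, then weight descending, so the
    // heaviest copy of each (src, dst) pair comes first in its run.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) {
                  if (a.src != b.src) return a.src < b.src;
                  if (a.dst != b.dst) return a.dst < b.dst;
                  return a.weight > b.weight;
              });
    // std::unique keeps the first element of each run of equal pairs.
    auto last = std::unique(edges.begin(), edges.end(),
                            [](const Edge& a, const Edge& b) {
                                return a.src == b.src && a.dst == b.dst;
                            });
    edges.erase(last, edges.end());
    return edges;
}
```

A side effect of this approach is that the result is grouped by source node, which is the order a CSR representation stores edges in.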
Hints for constructing CSR format graphs from DIMACS files
Nodes are numbered starting from 1 in DIMACS format, but C++ arrays start at 0. To
keep things simple and to make grading easier, your data
structures and code should ignore node position 0.
To construct a CSR representation of a graph, you can use the
following steps: