CS 377P: Programming for Performance
Assignment 1: Performance counters
Due date: February 7, 2019, 9:00PM
Late submission policy: Submissions can be at most 1 day
late. There will be a 10% penalty for late submissions.
Description
1) Write C code for the 6 variants of matrix-matrix multiply
(MMM) that you can generate by permuting the loops in the standard
triply nested loop version of MMM. The matrix elements should be
of type double.
Hint: To check cache sizes on the
machine, run: lscpu
2) Answer the following questions, using a few sentences for each
one.
- What are data and control dependences? Give simple
examples to illustrate these concepts.
- Explain out-of-order execution and in-order
retirement/commit. Why do high-performance processors
execute instructions out of order but retire them in order?
What hardware structure(s) are used to implement in-order
retirement?
- Consider the out-of-order execution processor described in
lecture. Come up with a simple rule for determining when a
physical register can be reused. You may find it useful to
study the simple example on slides 23/24 and ask yourself when
pr10 can be reused.
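For part 1, the six loop orders can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes square n x n matrices stored row-major in flat arrays, and the function names (mmm_ijk and so on) are made up for illustration. Each variant accumulates into C, so the caller is assumed to zero C first.

```c
#include <stddef.h>

/* Assumption: n x n matrices, row-major, element (i, j) at index i * n + j.
   Every variant computes C += A * B; only the loop nesting order differs. */

void mmm_ijk(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

void mmm_ikj(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

void mmm_jik(size_t n, const double *A, const double *B, double *C) {
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

void mmm_jki(size_t n, const double *A, const double *B, double *C) {
    for (size_t j = 0; j < n; j++)
        for (size_t k = 0; k < n; k++)
            for (size_t i = 0; i < n; i++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

void mmm_kij(size_t n, const double *A, const double *B, double *C) {
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

void mmm_kji(size_t n, const double *A, const double *B, double *C) {
    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < n; i++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

All six variants perform the same multiply-add operations. They differ only in the order in which A, B and C are traversed in memory, which is what the performance counters are meant to expose.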
Deliverables
Submit (in Canvas) the following two files:
- A .tar.gz file with your code, a README.txt and a
Makefile.
  - The README.txt describes how to run your program and what
  the output will be. A reasonable output will be pairs of
  "name of measured event, value".
  - With the Makefile, your code should compile on the 10
  CS machines by running only "make".
- A report (in .pdf) containing the tables and the answers to
the questions in both parts.
Grading
Code: 40 points
Measurements (plots): 30 points
Explanation: 10 points
Answers to short questions in (2): 20 points
PAPI:
To see which PAPI counters are available on a host, run:
papi_avail
To see which PAPI counters can be collected at the same time,
run:
papi_event_chooser
For more information, including example code, read the PAPI manual:
http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:EventSets
http://icl.cs.utk.edu/papi/docs/index.html
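As a starting point, counting events around a kernel with the PAPI low-level API might look like the sketch below. The helper count_events, the demo kernel, and the USE_PAPI build macro are assumptions made for illustration. The macro only lets the file compile on a machine without PAPI installed. The two preset events are examples, so check with papi_avail and papi_event_chooser that they exist and can be counted together on your host. The helper is meant to be called once per process.

```c
#include <stdio.h>

/* Assumed build line: gcc -O2 -DUSE_PAPI file.c -lpapi
   USE_PAPI is a macro invented for this sketch, not part of PAPI. */
#ifdef USE_PAPI
#include <papi.h>
#endif

static volatile double sink; /* keeps the demo loop from being optimized away */

/* Stand-in for the code under measurement (hypothetical). */
void demo_kernel(void) {
    double s = 0.0;
    for (int i = 0; i < 1000000; i++)
        s += i * 0.5;
    sink = s;
}

/* Runs kernel() once and prints one "event name, value" pair per counter.
   Returns 0 on success and -1 if PAPI is unavailable or a call fails. */
int count_events(void (*kernel)(void)) {
#ifdef USE_PAPI
    int events[2] = { PAPI_TOT_CYC, PAPI_TOT_INS };
    const char *names[2] = { "PAPI_TOT_CYC", "PAPI_TOT_INS" };
    long long values[2] = { 0, 0 };
    int event_set = PAPI_NULL;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return -1;
    if (PAPI_create_eventset(&event_set) != PAPI_OK) return -1;
    for (int i = 0; i < 2; i++)
        if (PAPI_add_event(event_set, events[i]) != PAPI_OK) return -1;

    if (PAPI_start(event_set) != PAPI_OK) return -1;
    kernel();                                   /* measured region */
    if (PAPI_stop(event_set, values) != PAPI_OK) return -1;

    for (int i = 0; i < 2; i++)
        printf("%s, %lld\n", names[i], values[i]);
    return 0;
#else
    (void)kernel; /* built without PAPI: nothing is measured */
    return -1;
#endif
}
```

To measure an MMM variant, wrap it in a function with the same void (*)(void) signature and pass that function to count_events in place of demo_kernel.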
"Warning! num_cntrs is more than num_mpx_cntrs" can be ignored.
ICC:
To use ICC on the indicated CS machines, run:
export PATH=$PATH:/opt/intel/bin
icc [compiler commands]
To check the availability of icc, run:
icc -v