CS 377P: Programming for Performance
Assignment 1: Performance counters
Due date: September 12, 2024, 10:00 PM
Late submission policy: Submissions can be at most 1 day
late. There will be a 10% penalty for late submissions.
Description
1) Write C code for the 6 variants of matrix-matrix multiply
(MMM) that you can generate by permuting the loops in the
standard triply nested loop version of MMM. The matrix elements
should be of type double.
Hint: To check the cache sizes on the machine, run: lscpu
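As a starting point for item 1, here is a minimal sketch of two of the six loop orders. It assumes square n x n matrices stored row-major in flat arrays of doubles; the function names (mmm_ijk, mmm_ikj, mmm_variants_agree) are illustrative choices, not names required by the assignment. The remaining four variants follow the same pattern with the three loop headers permuted.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: all matrices are n x n, row-major, in flat arrays of doubles.
   Each variant accumulates into C, so the caller must zero C first. */

/* i-j-k order: the innermost loop walks a row of A and a column of B. */
void mmm_ijk(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* i-k-j order: same loop body, loops permuted. The innermost loop now
   walks a row of B and a row of C with unit stride. */
void mmm_ikj(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Hypothetical self-check: returns 1 if both variants produce the same
   result (within a small tolerance) on a deterministic input. */
int mmm_variants_agree(int n) {
    size_t count = (size_t)n * (size_t)n;
    double *A = malloc(count * sizeof *A);
    double *B = malloc(count * sizeof *B);
    double *C1 = calloc(count, sizeof *C1); /* calloc zeroes the outputs */
    double *C2 = calloc(count, sizeof *C2);
    assert(A && B && C1 && C2);

    for (size_t idx = 0; idx < count; idx++) {
        A[idx] = (double)(idx % 7) + 0.5;
        B[idx] = (double)(idx % 5) - 1.25;
    }

    mmm_ijk(n, A, B, C1);
    mmm_ikj(n, A, B, C2);

    double max_diff = 0.0;
    for (size_t idx = 0; idx < count; idx++) {
        double d = C1[idx] - C2[idx];
        if (d < 0.0)
            d = -d;
        if (d > max_diff)
            max_diff = d;
    }

    free(A);
    free(B);
    free(C1);
    free(C2);
    return max_diff < 1e-9;
}
```

Comparing every variant against one reference order, as mmm_variants_agree does, is a cheap way to catch an indexing mistake before taking any measurements.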
2) Answer the following questions, using a few sentences for each
one.
- What are data and control dependences? Give simple
examples to illustrate these concepts.
- Explain out-of-order execution and in-order
retirement/commit. Why do high-performance processors
execute instructions out of order but retire them in order?
What hardware structure(s) are used to implement in-order
retirement?
- Consider the invariants for retirement in OOO execution with
renaming shown on slide 28 of the lecture slides. Why do we
need to check the condition "(R3.PR# = ROB[n].PR#)" before
updating R3.v? Explain what would happen if we did not check
this condition before updating R3.v.
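For the first question above, the following is a minimal sketch of the kind of simple example that can illustrate each concept. The function names and statement labels (S1, S2) are illustrative only, and the sketch covers just one kind of data dependence (read-after-write); it is not a complete answer.

```c
/* Data dependence (read-after-write): S2 reads the value of t that S1
   writes, so S2 cannot execute before S1 has produced t. */
int data_dependence_example(int a, int b) {
    int t = a + b; /* S1: writes t */
    int u = t * 2; /* S2: reads t */
    return u;
}

/* Control dependence: whether S2 executes at all is decided by the
   outcome of the branch in S1. */
int control_dependence_example(int x) {
    int y = 0;
    if (x > 0)     /* S1: branch */
        y = x * 2; /* S2: runs only when the branch is taken */
    return y;
}
```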
Deliverables
Submit (in Canvas) the following two files:
- A .tar.gz file with your code, a README.txt and a
Makefile.
  - The README.txt describes how to run your program and what
  the output will be. A reasonable output is a list of pairs of
  the form "name of measured event, value".
  - With the Makefile, it should be possible to compile your
  code on the 5 orcrist-2* CS machines by running "make".
- A report (in .pdf) containing the tables and the answers to
the questions in both parts.
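The "name of measured event, value" output suggested for the README could be produced along the following lines. This is only a sketch of the output format: the struct, the function names and any event names passed in are hypothetical, and the code that actually reads the counters is not shown.

```c
#include <stdio.h>

/* Hypothetical record for one measurement: an event name and its count. */
struct event_result {
    const char *name;
    long long value;
};

/* Formats one "name, value" pair into buf. Returns what snprintf
   returns: the number of characters the full line needs, excluding
   the terminating null byte. */
int format_event(char *buf, size_t size, const struct event_result *r) {
    return snprintf(buf, size, "%s, %lld", r->name, r->value);
}

/* Prints one pair per line to standard output. */
void print_events(const struct event_result *results, size_t count) {
    char line[128];
    for (size_t i = 0; i < count; i++) {
        format_event(line, sizeof line, &results[i]);
        puts(line);
    }
}
```

Printing one pair per line keeps the output easy to paste into the tables in the report.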
Grading
Code: 40 points
Measurements (plots): 30 points
Explanation: 10 points
Answers to short questions in (2): 20 points