Unit 3.6.1 Additional exercises
The following exercises are quite challenging and stretch you to think through the details. You may instead want to move on to Week 4 and return to these if you have energy left at the end.
Homework 3.6.1.1.
You will have noticed that we only time certain problem sizes when we execute our performance experiments. The reason is that the implementations of MyGemm do not handle "edge cases" well, that is, cases where the problem size is not a nice multiple of the blocking sizes that appear in the micro-kernel: \(m_R \text{,}\) \(n_R \text{,}\) and \(k_C \text{.}\)
Since our final, highest-performing implementations pack panels of \(B\) and blocks of \(A\text{,}\) many of the edge cases can be handled by "padding" a micro-panel with zeroes when the remaining micro-panel is not a "full" one. Also, if a micro-tile that is encountered is not a full \(m_R \times n_R \) submatrix, one can compute a full micro-tile in temporary storage and then add only the relevant part of it to the partial micro-tile of \(C\text{.}\)
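As a starting point, the two tricks might be sketched in C as follows. This is only a minimal illustration under several assumptions: matrices are stored in column-major order, \(m_R = n_R = 4\text{,}\) and a plain triple loop stands in for the real micro-kernel. All routine names (`pack_micro_panel_A_padded`, `pack_micro_panel_B_padded`, `reference_kernel`, `add_partial_micro_tile`, `gemm_edge_tile`) are hypothetical and are not the names used in the course code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MR 4 /* assumed micro-tile row size */
#define NR 4 /* assumed micro-tile column size */

/* Pack an mb x k micro-panel of A (column-major, leading dimension ldA).
   Rows mb..MR-1 are filled with zeroes so the packed panel has MR rows. */
void pack_micro_panel_A_padded(int mb, int k, const double *A, int ldA,
                               double *Atilde)
{
  assert(0 <= mb && mb <= MR);
  for (int p = 0; p < k; p++) {
    for (int i = 0; i < mb; i++) *Atilde++ = A[p * ldA + i];
    for (int i = mb; i < MR; i++) *Atilde++ = 0.0;
  }
}

/* Pack a k x nb micro-panel of B (column-major, leading dimension ldB).
   Columns nb..NR-1 are filled with zeroes so the packed panel has NR columns. */
void pack_micro_panel_B_padded(int k, int nb, const double *B, int ldB,
                               double *Btilde)
{
  assert(0 <= nb && nb <= NR);
  for (int p = 0; p < k; p++) {
    for (int j = 0; j < nb; j++) *Btilde++ = B[j * ldB + p];
    for (int j = nb; j < NR; j++) *Btilde++ = 0.0;
  }
}

/* Stand-in for the real micro-kernel (an assumption of this sketch):
   Ctemp = Atilde * Btilde, a full MR x NR tile with leading dimension MR. */
void reference_kernel(int k, const double *Atilde, const double *Btilde,
                      double *Ctemp)
{
  memset(Ctemp, 0, sizeof(double) * MR * NR);
  for (int p = 0; p < k; p++)
    for (int j = 0; j < NR; j++)
      for (int i = 0; i < MR; i++)
        Ctemp[j * MR + i] += Atilde[p * MR + i] * Btilde[p * NR + j];
}

/* Add only the leading mb x nb part of the full tile Ctemp to C. */
void add_partial_micro_tile(int mb, int nb, const double *Ctemp,
                            double *C, int ldC)
{
  assert(mb <= MR && nb <= NR);
  for (int j = 0; j < nb; j++)
    for (int i = 0; i < mb; i++)
      C[j * ldC + i] += Ctemp[j * MR + i];
}

/* C += A * B for one (possibly partial) mb x nb micro-tile of C. */
void gemm_edge_tile(int mb, int nb, int k, const double *A, int ldA,
                    const double *B, int ldB, double *C, int ldC)
{
  double *Atilde = malloc(sizeof(double) * MR * k);
  double *Btilde = malloc(sizeof(double) * NR * k);
  double Ctemp[MR * NR];
  assert(Atilde != NULL && Btilde != NULL);

  pack_micro_panel_A_padded(mb, k, A, ldA, Atilde);
  pack_micro_panel_B_padded(k, nb, B, ldB, Btilde);
  reference_kernel(k, Atilde, Btilde, Ctemp);
  add_partial_micro_tile(mb, nb, Ctemp, C, ldC);

  free(Atilde);
  free(Btilde);
}
```

Because the padded rows and columns are zero, the extra entries of the temporary tile are zero as well, and the full-size kernel never reads or writes outside the part of \(C\) that actually exists. In a real implementation the packing would happen once per block or panel rather than once per micro-tile.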
Reimplement MyGemm using these tricks so that it computes correctly for all sizes \(m\text{,}\) \(n\text{,}\) and \(k\text{.}\)
Homework 3.6.1.2.
Implement the practical Strassen algorithm discussed in Unit 3.5.4.
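For checking results, it may help to have a simple reference to compare against. The sketch below is the classical one-level Strassen update \(C := A B + C\) written with explicit temporary matrices. It is not the practical variant of Unit 3.5.4, and it assumes square, column-major matrices of even size \(n\) with leading dimension \(n\text{.}\) A naive triple loop (`gemm_ref`) stands in for MyGemm, and all routine names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* C += A * B for h x h column-major matrices (naive stand-in for MyGemm). */
void gemm_ref(int h, const double *A, int ldA, const double *B, int ldB,
              double *C, int ldC)
{
  for (int j = 0; j < h; j++)
    for (int p = 0; p < h; p++)
      for (int i = 0; i < h; i++)
        C[j * ldC + i] += A[p * ldA + i] * B[j * ldB + p];
}

/* T = X + s * Y, with T stored contiguously (leading dimension h).
   Passing Y == NULL simply copies X into T. */
void combine(int h, const double *X, const double *Y, double s, int ld,
             double *T)
{
  for (int j = 0; j < h; j++)
    for (int i = 0; i < h; i++)
      T[j * h + i] = X[j * ld + i] + (Y ? s * Y[j * ld + i] : 0.0);
}

/* C += s * M, where M is a contiguous h x h matrix. */
void accumulate(int h, double s, const double *M, double *C, int ldC)
{
  for (int j = 0; j < h; j++)
    for (int i = 0; i < h; i++)
      C[j * ldC + i] += s * M[j * h + i];
}

/* M = (X1 + sx * X2) * (Y1 + sy * Y2), using workspaces S and T. */
void product(int h, int ld, const double *X1, const double *X2, double sx,
             const double *Y1, const double *Y2, double sy,
             double *S, double *T, double *M)
{
  combine(h, X1, X2, sx, ld, S);
  combine(h, Y1, Y2, sy, ld, T);
  memset(M, 0, sizeof(double) * h * h);
  gemm_ref(h, S, h, T, h, M, h);
}

/* C += A * B via one level of Strassen: seven half-size products. */
void strassen_one_level(int n, const double *A, const double *B, double *C)
{
  assert(n > 0 && n % 2 == 0);
  int h = n / 2;
  const double *A00 = A, *A10 = A + h, *A01 = A + h * n, *A11 = A + h * n + h;
  const double *B00 = B, *B10 = B + h, *B01 = B + h * n, *B11 = B + h * n + h;
  double *C00 = C, *C10 = C + h, *C01 = C + h * n, *C11 = C + h * n + h;
  double *S = malloc(3 * sizeof(double) * h * h);
  assert(S != NULL);
  double *T = S + h * h, *M = T + h * h;

  product(h, n, A00, A11, 1, B00, B11, 1, S, T, M);    /* M1 */
  accumulate(h, 1, M, C00, n);  accumulate(h, 1, M, C11, n);
  product(h, n, A10, A11, 1, B00, NULL, 0, S, T, M);   /* M2 */
  accumulate(h, 1, M, C10, n);  accumulate(h, -1, M, C11, n);
  product(h, n, A00, NULL, 0, B01, B11, -1, S, T, M);  /* M3 */
  accumulate(h, 1, M, C01, n);  accumulate(h, 1, M, C11, n);
  product(h, n, A11, NULL, 0, B10, B00, -1, S, T, M);  /* M4 */
  accumulate(h, 1, M, C00, n);  accumulate(h, 1, M, C10, n);
  product(h, n, A00, A01, 1, B11, NULL, 0, S, T, M);   /* M5 */
  accumulate(h, -1, M, C00, n); accumulate(h, 1, M, C01, n);
  product(h, n, A10, A00, -1, B00, B01, 1, S, T, M);   /* M6 */
  accumulate(h, 1, M, C11, n);
  product(h, n, A01, A11, -1, B10, B11, 1, S, T, M);   /* M7 */
  accumulate(h, 1, M, C00, n);

  free(S);
}
```

This version forms each sum of submatrices explicitly and stores each product in a temporary before accumulating it, which makes it easy to verify but costs extra memory traffic. It is meant only as a correctness reference, not as a performance baseline.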