Unit 3.3.6 Micro-kernel with packed data
How to modify the five loops to incorporate packing was discussed in Unit 3.3.5. A micro-kernel to compute with the packed data when \(m_R \times n_R = 4 \times 4 \) is now illustrated in Figure 3.3.8.
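To make the idea of computing with packed data concrete, here is a minimal scalar sketch of such a micro-kernel. It is not the contents of Gemm_4x4Kernel_Packed.c, which may well use vector intrinsics. The function name `gemm_ukernel_packed_ref` and the parameter names are hypothetical. The sketch assumes that the packed micro-panel of \(A\) stores the \(m_R\) entries of each column contiguously, that the packed micro-panel of \(B\) stores the \(n_R\) entries of each row contiguously, and that \(C\) is stored in column-major order with leading dimension `ldC`.

```c
#define MR 4
#define NR 4

/* Hypothetical scalar reference micro-kernel (illustration only).
 *
 * Computes C := Atilde * Btilde + C, where
 *   - C is MR x NR, column-major, with leading dimension ldC;
 *   - Atilde is a packed MR x k micro-panel of A: the MR entries of
 *     column p are assumed to sit at Atilde[p*MR], ..., Atilde[p*MR+MR-1];
 *   - Btilde is a packed k x NR micro-panel of B: the NR entries of
 *     row p are assumed to sit at Btilde[p*NR], ..., Btilde[p*NR+NR-1].
 *
 * Both packed buffers are read with stride one, which is the point of packing.
 */
void gemm_ukernel_packed_ref(int k, const double *Atilde,
                             const double *Btilde, double *C, int ldC)
{
    for (int p = 0; p < k; p++) {
        /* Rank-1 update of C with column p of Atilde and row p of Btilde. */
        for (int j = 0; j < NR; j++) {
            double beta = Btilde[p * NR + j];
            for (int i = 0; i < MR; i++)
                C[j * ldC + i] += Atilde[p * MR + i] * beta;
        }
    }
}
```

Because neither `Atilde` nor `Btilde` needs a leading dimension, the inner loops touch memory contiguously, regardless of the size of the original matrices from which the data was packed.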
Homework 3.3.6.1.
Examine the files Assignments/Week3/C/Gemm_Five_Loops_Packed_MRxNRKernel.c and Assignments/Week3/C/Gemm_4x4Kernel_Packed.c. Collect performance data with
make Five_Loops_Packed_4x4Kernel
and view the resulting performance with Live Script Plot_Five_Loops.mlx.
On Robert's laptop:
Homework 3.3.6.2.
Copy the file Gemm_4x4Kernel_Packed.c into file Gemm_12x4Kernel_Packed.c. Modify that file so that it uses \(m_R \times n_R = 12 \times 4 \text{.}\) Test the result with
make Five_Loops_Packed_12x4Kernel
and view the resulting performance with Live Script Plot_Five_Loops.mlx.
Solution: Assignments/Week3/Answers/Gemm_12x4Kernel_Packed.c
On Robert's laptop:
Now we are getting somewhere!
Homework 3.3.6.3.
In Homework 3.2.3.1, you determined the best block sizes MC and KC. Now that you have added packing to the implementation of the five loops around the micro-kernel, these parameters need to be revisited. You can collect data for a range of choices by executing
make Five_Loops_Packed_?x?Kernel_MCxKC
where ?x? is your favorite choice for register blocking. View the result with data/Plot_Five_loops.mlx.
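Before timing a range of choices, it can help to estimate how much memory the packed buffers occupy for a candidate MC and KC. The sketch below assumes the usual rationale for the five loops: the packed MC x KC block of \(A\) is meant to stay in the L2 cache, and a packed KC x NR micro-panel of \(B\) is meant to stay in the L1 cache. The helper names are hypothetical, and the cache sizes you compare against are whatever your own processor provides. Measured performance remains the final arbiter.

```c
#include <stddef.h>

/* Hypothetical helper: bytes occupied by the packed MC x KC block of A
   when the entries are doubles. */
size_t packed_A_bytes(size_t MC, size_t KC)
{
    return MC * KC * sizeof(double);
}

/* Hypothetical helper: bytes occupied by one packed KC x NR micro-panel of B. */
size_t packed_B_micropanel_bytes(size_t KC, size_t NR)
{
    return KC * NR * sizeof(double);
}

/* Returns 1 if a buffer of the given size fits in a cache of cache_bytes
   bytes, and 0 otherwise. This ignores everything else that competes for
   the cache, so treat it as a rough upper bound only. */
int fits_in_cache(size_t bytes, size_t cache_bytes)
{
    return bytes <= cache_bytes;
}
```

For example, with 8-byte doubles, MC = 96 and KC = 256 give a packed block of \(A\) of 196608 bytes, which fits in a 256 KiB L2 cache, while a 256 x 4 micro-panel of \(B\) takes 8192 bytes.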