Unit 3.3.3 Implementation: packing row panel \(B_{p,j} \)
We briefly discuss the packing of the row panel \(B_{p,j} \) into \(\widetilde B_{p,j} \).
We break the implementation, in Assignments/Week3/C/PackB.c, down into two routines. The first loops over all the panels that need to be packed, as illustrated in Figure 3.3.2. That routine then calls a routine that packs the panel, given in Figure 3.3.3.
Remark 3.3.4.
We emphasize that this is a “quick and dirty” implementation. It is meant to be used for matrices that are sized to be nice multiples of the various blocking parameters. The goal of this course is to study how to implement matrix-matrix multiplication for matrices that are nice multiples of these blocking sizes. Once one fully understands how to optimize that case, one can start from scratch and design an implementation that works for all matrix sizes. Alternatively, upon completing this week, one understands the issues well enough to be able to study a high-quality, open-source implementation like the one in our BLIS framework [3].
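To make the two-routine structure described above concrete, here is a minimal sketch in C. It is not the code in PackB.c: the names pack_panel_B and pack_micropanel_B, the value NR = 4, and the beta macro are assumptions made for illustration, and the sketch assumes column-major storage with leading dimension ldB. In the spirit of the remark, it only handles the case where the number of columns is a multiple of NR, and it checks that with an assertion rather than padding.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed micro-panel width; in practice this is a tuning parameter. */
#define NR 4

/* Assumed column-major indexing: element (p, j) of B, leading dimension ldB. */
#define beta(p, j) B[(size_t)(j) * ldB + (p)]

/* Hypothetical inner routine: pack one k x NR micro-panel of B into Btilde.
   Elements are copied one row at a time, so the NR entries of each row
   end up contiguous in the packed buffer. */
void pack_micropanel_B(int k, const double *B, int ldB, double *Btilde)
{
    for (int p = 0; p < k; p++)
        for (int j = 0; j < NR; j++)
            *Btilde++ = beta(p, j);
}

/* Hypothetical outer routine: loop over the k x n row panel of B in steps
   of NR columns and pack each micro-panel in turn. */
void pack_panel_B(int k, int n, const double *B, int ldB, double *Btilde)
{
    /* "Quick and dirty": only nice multiples of NR are supported here. */
    assert(n % NR == 0);

    for (int j = 0; j < n; j += NR) {
        pack_micropanel_B(k, &beta(0, j), ldB, Btilde);
        Btilde += (size_t)k * NR;   /* advance past the micro-panel just packed */
    }
}
```

For a 2 x 8 panel with NR = 4, for example, the packed buffer first holds rows 0 and 1 of columns 0 through 3, followed by rows 0 and 1 of columns 4 through 7. Supporting panels whose width is not a multiple of NR would require handling the last, narrower micro-panel separately, which this sketch deliberately leaves out.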