Unit 3.3.4 Implementation: packing block \(A_{i,p} \)
We next discuss the packing of the block \(A_{i,p} \) into \(\widetilde A_{i,p} \text{:}\)
We break the implementation, in Assignments/Week3/C/PackA.c, down into two routines. The first loops over all the rows that need to be packed, as illustrated in Figure 3.3.5. That routine then calls a routine that packs the panel, given in Figure 3.3.6.
Remark 3.3.7.
Again, these routines only work when the sizes are "nice". We leave it as a challenge to generalize all implementations so that matrix-matrix multiplication with arbitrary problem sizes works. To manage the complexity of this, we recommend "padding" the matrices with zeroes as they are being packed. This then keeps the micro-kernel simple.
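The padding idea can be sketched in C as follows. This is only an illustrative sketch, not the contents of PackA.c: the routine names `pack_micro_panel_padded` and `pack_block_padded`, the value `MR = 4`, and the assumption that the matrix is stored in column-major order with leading dimension `ldA` are all choices made here for the example. The outer routine walks over the rows of the block `MR` at a time; the inner routine packs one panel column by column and fills any missing rows with zeroes, so every packed panel is exactly `MR` rows tall.

```c
#define MR 4   /* assumed micro-tile row count, chosen only for illustration */

/* Element (i,p) of a column-major matrix A with leading dimension ldA
   (column-major storage is an assumption of this sketch). */
#define alpha(i, p) A[(p) * ldA + (i)]

/* Hypothetical helper: pack an m x k panel (m <= MR) into Atilde, one
   column at a time. Rows m..MR-1 of each column are filled with zeroes,
   so the packed panel always occupies MR * k doubles. */
static void pack_micro_panel_padded(int m, int k, const double *A, int ldA,
                                    double *Atilde)
{
    for (int p = 0; p < k; p++) {
        for (int i = 0; i < m; i++)      /* copy the rows that exist */
            *Atilde++ = alpha(i, p);
        for (int i = m; i < MR; i++)     /* pad the remaining rows   */
            *Atilde++ = 0.0;
    }
}

/* Hypothetical helper: pack an m x k block, MR rows at a time.
   Returns the number of doubles written to Atilde. */
static int pack_block_padded(int m, int k, const double *A, int ldA,
                             double *Atilde)
{
    int written = 0;
    for (int i = 0; i < m; i += MR) {
        int ib = (m - i < MR) ? m - i : MR;   /* rows left in this panel */
        pack_micro_panel_padded(ib, k, &alpha(i, 0), ldA, Atilde + written);
        written += MR * k;                    /* padded panels are full size */
    }
    return written;
}
```

With this layout the caller must size the packed buffer for the padded row count, that is, the number of rows rounded up to a multiple of `MR`, times `k` doubles. The micro-kernel can then always operate on full `MR`-row panels; the padded rows contribute only zeroes, and the results for those rows must not be written back into the real matrix.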