Subsection 5.5.2 Blocked LU factorization
ΒΆRecall from Subsection 3.3.4 that casting computation in terms of matrix-matrix multiplication facilitates high performance. In this unit we very briefly illustrate how the right-looking LU factorization can be reformulated as such a "blocked" algorithm. For details on other blocked LU factorization algorithms and blocked Cholesky factorization algorithms, we once again refer the interested reader to our Massive Open Online Course titled "LAFF-On Programming for Correctness" [29]. We will revisit these kinds of issues in the final week of this course. Consider A = L U and partition these matrices as-
Compute the LU factorization of A_{11} (e.g., using any of the "unblocked' algorithms from Subsection 5.5.1).
\begin{equation*} A_{11} = L_{11} U_{11}, \end{equation*}overwriting A_{11} with the factors.
-
Solve
\begin{equation*} L_{11} U_{12} = A_{12} \end{equation*}for U_{12} \text{,} overwriting A_{12} with the result. This is known as a "triangular solve with multple right-hand sides." This comes from the fact that solving
\begin{equation*} L X = B, \end{equation*}where L is lower triangular, can be reformulated by partitioning X and B by columns,
\begin{equation*} \begin{array}[t]{c} \underbrace{ L \left( \begin{array}{c | c | c} x_0 \amp x_1 \amp \cdots \end{array} \right) } \\ \left( \begin{array}{c | c | c} L x_0 \amp L x_1 \amp \cdots \end{array} \right) \end{array} = \left( \begin{array}{c | c | c} b_0 \amp b_1 \amp \cdots \end{array} \right) , \end{equation*}which exposes that for each pair of columns we must solve the unit lower triangular system L x_j = b_j \text{.}
-
Solve
\begin{equation*} L_{21} U_{11} = A_{21} \end{equation*}for L_{21} \text{,} overwriting A_{21} with the result. This is also a "triangular solve with multple right-hand sides" since we can instead view it as solving the lower triangular system with multiple right-hand sides
\begin{equation*} U_{11}^T L_{21}^T = A_{21}^T. \end{equation*}(In practice, the matrices are not transposed.)
-
Update
\begin{equation*} A_{22} := A_{22} - L_{21} U_{12}. \end{equation*} Proceed by computing the LU factorization of the updated A_{22} \text{.}
Remark 5.5.2.2.
For each (unblocked) algorithm in Subsection 5.5.1, there is a corresponding blocked algorithm.