Subsection 5.5.2 Blocked LU factorization
¶Recall from Subsection 3.3.4 that casting computation in terms of matrix-matrix multiplication facilitates high performance. In this unit we very briefly illustrate how the right-looking LU factorization can be reformulated as such a "blocked" algorithm. For details on other blocked LU factorization algorithms and blocked Cholesky factorization algorithms, we once again refer the interested reader to our Massive Open Online Course titled "LAFF-On Programming for Correctness" [29]. We will revisit these kinds of issues in the final week of this course.
Consider \(A = L U \) and partition these matrices as
where \(A_{11}\text{,}\) \(L_{11} \text{,}\) and \(U_{11} \) are \(b \times b \) submatrices. Then
From this we conclude that
This suggests the following steps:
-
Compute the LU factorization of \(A_{11} \) (e.g., using any of the "unblocked' algorithms from Subsection 5.5.1).
\begin{equation*} A_{11} = L_{11} U_{11}, \end{equation*}overwriting \(A_{11} \) with the factors.
-
Solve
\begin{equation*} L_{11} U_{12} = A_{12} \end{equation*}for \(U_{12} \text{,}\) overwriting \(A_{12} \) with the result. This is known as a "triangular solve with multple right-hand sides." This comes from the fact that solving
\begin{equation*} L X = B, \end{equation*}where \(L \) is lower triangular, can be reformulated by partitioning \(X \) and \(B \) by columns,
\begin{equation*} \begin{array}[t]{c} \underbrace{ L \left( \begin{array}{c | c | c} x_0 \amp x_1 \amp \cdots \end{array} \right) } \\ \left( \begin{array}{c | c | c} L x_0 \amp L x_1 \amp \cdots \end{array} \right) \end{array} = \left( \begin{array}{c | c | c} b_0 \amp b_1 \amp \cdots \end{array} \right) , \end{equation*}which exposes that for each pair of columns we must solve the unit lower triangular system \(L x_j = b_j \text{.}\)
-
Solve
\begin{equation*} L_{21} U_{11} = A_{21} \end{equation*}for \(L_{21} \text{,}\) overwriting \(A_{21} \) with the result. This is also a "triangular solve with multple right-hand sides" since we can instead view it as solving the lower triangular system with multiple right-hand sides
\begin{equation*} U_{11}^T L_{21}^T = A_{21}^T. \end{equation*}(In practice, the matrices are not transposed.)
-
Update
\begin{equation*} A_{22} := A_{22} - L_{21} U_{12}. \end{equation*} Proceed by computing the LU factorization of the updated \(A_{22} \text{.}\)
This motivates the algorithm in Figure 5.5.2.1.
The important observation is that if \(A \) is \(m \times m \) and \(b \) is much smaller than \(m \text{,}\) then most of the computation is in the matrix-matrix multiplication \(A_{22} := A_{22} - A_{21} A_{12} \text{.}\)
Remark 5.5.2.2.
For each (unblocked) algorithm in Subsection 5.5.1, there is a corresponding blocked algorithm.