ALAFF Blocked LU factorization

Subsection 5.5.2 Blocked LU factorization

Recall from Subsection 3.3.4 that casting computation in terms of matrix-matrix multiplication facilitates high performance. In this unit we very briefly illustrate how the right-looking LU factorization can be reformulated as such a "blocked" algorithm. For details on other blocked LU factorization algorithms and blocked Cholesky factorization algorithms, we once again refer the interested reader to our Massive Open Online Course titled "LAFF-On Programming for Correctness" [29]. We will revisit these kinds of issues in the final week of this course.


Consider $A = L U$ and partition these matrices as

$\begin{equation*} A \rightarrow \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} }, L \rightarrow \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} }, U \rightarrow \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} }, \end{equation*}$


where $A_{11}\text{,}$ $L_{11} \text{,}$ and $U_{11}$ are $b \times b$ submatrices. Then

$\begin{equation*} \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} } = \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} } \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} } = \FlaTwoByTwo{ L_{11} U_{11} }{ L_{11} U_{12} }{ L_{21} U_{11} }{ L_{21} U_{12} + L_{22} U_{22} }. \end{equation*}$


From this we conclude that

$\begin{equation*} \begin{array}{ c | c } A_{11} = L_{11} U_{11} \amp A_{12} = L_{11} U_{12} \\ \hline A_{21} = L_{21} U_{11} \amp A_{22} - L_{21} U_{12} = L_{22} U_{22}. \end{array} \end{equation*}$


This suggests the following steps:

Compute the LU factorization of $A_{11}$ (e.g., using any of the "unblocked" algorithms from Subsection 5.5.1),

$\begin{equation*} A_{11} = L_{11} U_{11}, \end{equation*}$

overwriting $A_{11}$ with the factors.
Solve

$\begin{equation*} L_{11} U_{12} = A_{12} \end{equation*}$

for $U_{12} \text{,}$ overwriting $A_{12}$ with the result. This is known as a "triangular solve with multiple right-hand sides." The name comes from the fact that solving

$\begin{equation*} L X = B, \end{equation*}$

where $L$ is lower triangular, can be reformulated by partitioning $X$ and $B$ by columns,

$\begin{equation*} \begin{array}[t]{c} \underbrace{ L \left( \begin{array}{c | c | c} x_0 \amp x_1 \amp \cdots \end{array} \right) } \\ \left( \begin{array}{c | c | c} L x_0 \amp L x_1 \amp \cdots \end{array} \right) \end{array} = \left( \begin{array}{c | c | c} b_0 \amp b_1 \amp \cdots \end{array} \right) , \end{equation*}$

which exposes that for each pair of corresponding columns $x_j$ and $b_j$ we must solve the lower triangular system $L x_j = b_j \text{.}$ (In this step, $L = L_{11}$ is unit lower triangular.)
Solve

$\begin{equation*} L_{21} U_{11} = A_{21} \end{equation*}$

for $L_{21} \text{,}$ overwriting $A_{21}$ with the result. This is also a "triangular solve with multiple right-hand sides" since $U_{11}^T$ is lower triangular, so we can instead view it as solving the lower triangular system with multiple right-hand sides

$\begin{equation*} U_{11}^T L_{21}^T = A_{21}^T. \end{equation*}$

(In practice, the matrices are not transposed.)
Update

$\begin{equation*} A_{22} := A_{22} - L_{21} U_{12}. \end{equation*}$
Proceed by computing the LU factorization of the updated $A_{22} \text{.}$
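
The two triangular solves in the steps above can be sketched in plain Python as follows. This is only an illustration, assuming matrices are stored as lists of rows and that no pivoting is needed; the function names `trsm_unit_lower` and `trsm_upper_right` are hypothetical and are not taken from the course materials. The first routine solves $L X = B$ one column at a time by forward substitution. The second solves $X U = B$ one row at a time, which shows how the transposed formulation can be avoided.

```python
def trsm_unit_lower(L, B):
    """Solve L X = B, where L is n x n unit lower triangular and B is n x k.

    Returns X as a new list of rows; B is left unchanged.
    """
    n = len(L)
    k = len(B[0]) if n else 0
    X = [row[:] for row in B]
    for j in range(k):              # each column b_j gives an independent system
        for i in range(n):          # forward substitution for L x_j = b_j
            s = X[i][j]
            for p in range(i):
                s -= L[i][p] * X[p][j]
            X[i][j] = s             # no division: the diagonal of L is 1
    return X


def trsm_upper_right(U, B):
    """Solve X U = B, where U is n x n upper triangular and B is k x n.

    Works row by row, so neither U nor B is ever transposed.
    """
    n = len(U)
    X = [row[:] for row in B]
    for row in X:                   # each row of B gives an independent system
        for j in range(n):
            s = row[j]
            for p in range(j):
                s -= row[p] * U[p][j]
            row[j] = s / U[j][j]    # assumes a nonzero diagonal entry
    return X
```

For example, `trsm_unit_lower(L11, A12)` would play the role of the solve for $U_{12}$ and `trsm_upper_right(U11, A21)` the role of the solve for $L_{21}$, where `L11`, `U11`, `A12`, and `A21` are hypothetical variables holding the corresponding blocks.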


This motivates the algorithm in Figure 5.5.2.1.


$\begin{equation*} \newcommand{\routinename}{A = \mbox{LU-blk-var5}( A )} \newcommand{\guard}{ n( A_{TL} ) \lt n( A ) } \newcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} } \newcommand{\partitionsizes}{ A_{TL} {\rm ~is~} 0 \times 0 } \newcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{A_{01}}{A_{02}} {A_{10}}{A_{11}}{A_{12}} {A_{20}}{A_{21}}{A_{22}} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{A_{01}}{A_{02}} {A_{10}}{A_{11}}{A_{12}} {A_{20}}{A_{21}}{A_{22}} } \newcommand{\update}{ \begin{array}{ll} A_{11} := \mbox{LU}( A_{11} ) \amp L_{11} \mbox{ and } U_{11} \mbox{ overwrite } A_{11} \\ \mbox{Solve } L_{11} U_{12} = A_{12} \amp \mbox{overwriting } A_{12} \mbox{ with } U_{12} \\ \mbox{Solve } L_{21} U_{11} = A_{21} \amp \mbox{overwriting } A_{21} \mbox{ with } L_{21} \\ A_{22} := A_{22} - A_{21} A_{12} \end{array} } \FlaAlgorithm \end{equation*}$

Figure 5.5.2.1. Blocked Variant 5 (classical Gaussian elimination) LU factorization algorithm.
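
A minimal sketch of the algorithm in Figure 5.5.2.1 in plain Python is given below. It assumes a square matrix stored as a list of rows of floats and assumes that no pivoting is required (no zero pivots are encountered). The names `lu_unblocked` and `lu_blocked` and the index variables `lo` and `hi`, which mark the current diagonal block $A_{11}$, are illustrative choices and not part of the course's FLAME code. The triple loops stand in for what would, in a high-performance setting, be calls to optimized triangular solve and matrix-matrix multiplication routines.

```python
def lu_unblocked(A, lo, hi):
    """Right-looking LU (no pivoting) of the diagonal block A[lo:hi][lo:hi], in place."""
    for k in range(lo, hi):
        for i in range(k + 1, hi):
            A[i][k] /= A[k][k]                  # multiplier l_ik
            for j in range(k + 1, hi):
                A[i][j] -= A[i][k] * A[k][j]    # rank-1 update within the block


def lu_blocked(A, b):
    """Overwrite square A with its LU factors using block size b.

    On return, the strictly lower triangular part holds L (unit diagonal implied)
    and the upper triangular part holds U.
    """
    m = len(A)
    for lo in range(0, m, b):
        hi = min(lo + b, m)                     # last block may be smaller than b

        # A11 := LU(A11)
        lu_unblocked(A, lo, hi)

        # Solve L11 U12 = A12, overwriting A12 with U12 (forward substitution per column)
        for j in range(hi, m):
            for i in range(lo, hi):
                for p in range(lo, i):
                    A[i][j] -= A[i][p] * A[p][j]

        # Solve L21 U11 = A21, overwriting A21 with L21 (row by row, no transposes)
        for i in range(hi, m):
            for j in range(lo, hi):
                for p in range(lo, j):
                    A[i][j] -= A[i][p] * A[p][j]
                A[i][j] /= A[j][j]

        # A22 := A22 - A21 A12 (the matrix-matrix multiplication)
        for i in range(hi, m):
            for j in range(hi, m):
                s = 0.0
                for p in range(lo, hi):
                    s += A[i][p] * A[p][j]
                A[i][j] -= s
    return A
```

Calling `lu_blocked(A, b)` with `b = 1` reduces to the unblocked right-looking algorithm, and with `b` equal to the matrix size it reduces to a single call to `lu_unblocked`.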


The important observation is that if $A$ is $m \times m$ and $b$ is much smaller than $m \text{,}$ then most of the computation is in the matrix-matrix multiplication $A_{22} := A_{22} - A_{21} A_{12} \text{.}$
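
A rough operation count supports this observation. The following is a leading-order sketch rather than a precise analysis: it assumes $m$ is a multiple of $b$ and uses $k$ (a symbol introduced here only for this estimate) for the size of $A_{22}$ in a given iteration, so that $A_{12}$ is $b \times k$ and $A_{21}$ is $k \times b \text{.}$ The four updates in that iteration then cost approximately

$\begin{equation*} \underbrace{\tfrac{2}{3} b^3}_{A_{11} := \mbox{LU}( A_{11} )} + \underbrace{b^2 k}_{\mbox{solve for } U_{12}} + \underbrace{b^2 k}_{\mbox{solve for } L_{21}} + \underbrace{2 b k^2}_{A_{22} := A_{22} - A_{21} A_{12}} \end{equation*}$

floating point operations. Summing over the $m/b$ iterations, in which $k$ takes the values $m - b, m - 2b, \ldots, 0 \text{,}$ gives approximately

$\begin{equation*} \underbrace{\tfrac{2}{3} m b^2}_{\mbox{factoring the } A_{11} \mbox{ blocks}} + \underbrace{b m^2}_{\mbox{triangular solves}} + \underbrace{\tfrac{2}{3} m^3}_{\mbox{matrix-matrix multiplications}} , \end{equation*}$

where each term is stated to leading order only. The first two terms are on the order of $b m^2 \text{,}$ which is a fraction of roughly $b / m$ of the $\tfrac{2}{3} m^3$ term when $b$ is much smaller than $m \text{.}$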


Remark 5.5.2.2.

For each (unblocked) algorithm in Subsection 5.5.1, there is a corresponding blocked algorithm.