Loading [MathJax]/extensions/TeX/boldsymbol.js
Skip to main content

Subsection 5.5.2 Blocked LU factorization

Recall from Subsection 3.3.4 that casting computation in terms of matrix-matrix multiplication facilitates high performance. In this unit we very briefly illustrate how the right-looking LU factorization can be reformulated as such a "blocked" algorithm. For details on other blocked LU factorization algorithms and blocked Cholesky factorization algorithms, we once again refer the interested reader to our Massive Open Online Course titled "LAFF-On Programming for Correctness" [29]. We will revisit these kinds of issues in the final week of this course.

Consider A = L U and partition these matrices as

\begin{equation*} A \rightarrow \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} }, L \rightarrow \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} }, U \rightarrow \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} }, \end{equation*}

where A_{11}\text{,} L_{11} \text{,} and U_{11} are b \times b submatrices. Then

\begin{equation*} \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} } = \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} } \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} } = \FlaTwoByTwo{ L_{11} U_{11} }{ L_{11} A_{12} }{ A_{21} U_{11} }{ A_{22} - L_{21} U_{12} }. \end{equation*}

From this we conclude that

\begin{equation*} \begin{array}{ c | c } A_{11} = L_{11} U_{11} \amp A_{12} = L_{11} U_{12} \\ \hline A_{21} = L_{21} U_{11} \amp A_{22} - L_{21} U_{12} = L_{22} U_{22}. \end{array} \end{equation*}

This suggests the following steps:

  • Compute the LU factorization of A_{11} (e.g., using any of the "unblocked' algorithms from Subsection 5.5.1).

    \begin{equation*} A_{11} = L_{11} U_{11}, \end{equation*}

    overwriting A_{11} with the factors.

  • Solve

    \begin{equation*} L_{11} U_{12} = A_{12} \end{equation*}

    for U_{12} \text{,} overwriting A_{12} with the result. This is known as a "triangular solve with multple right-hand sides." This comes from the fact that solving

    \begin{equation*} L X = B, \end{equation*}

    where L is lower triangular, can be reformulated by partitioning X and B by columns,

    \begin{equation*} \begin{array}[t]{c} \underbrace{ L \left( \begin{array}{c | c | c} x_0 \amp x_1 \amp \cdots \end{array} \right) } \\ \left( \begin{array}{c | c | c} L x_0 \amp L x_1 \amp \cdots \end{array} \right) \end{array} = \left( \begin{array}{c | c | c} b_0 \amp b_1 \amp \cdots \end{array} \right) , \end{equation*}

    which exposes that for each pair of columns we must solve the unit lower triangular system L x_j = b_j \text{.}

  • Solve

    \begin{equation*} L_{21} U_{11} = A_{21} \end{equation*}

    for L_{21} \text{,} overwriting A_{21} with the result. This is also a "triangular solve with multple right-hand sides" since we can instead view it as solving the lower triangular system with multiple right-hand sides

    \begin{equation*} U_{11}^T L_{21}^T = A_{21}^T. \end{equation*}

    (In practice, the matrices are not transposed.)

  • Update

    \begin{equation*} A_{22} := A_{22} - L_{21} U_{12}. \end{equation*}
  • Proceed by computing the LU factorization of the updated A_{22} \text{.}

This motivates the algorithm in Figure 5.5.2.1.

\begin{equation*} \newcommand{\routinename}{A = \mbox{LU-blk-var5}( A )} \newcommand{\guard}{ n( A_{TL} ) \lt n( A ) } \newcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} } \newcommand{\partitionsizes}{ A_{TL} {\rm ~is~} 0 \times 0 } \newcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{A_{01}}{A_{02}} {A_{10}}{A_{11}}{A_{12}} {A_{20}}{A_{21}}{A_{22}} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{A_{01}}{A_{02}} {A_{10}}{A_{11}}{A_{12}} {A_{20}}{A_{21}}{A_{22}} } \newcommand{\update}{ \begin{array}{ll} A_{11} := \mbox{LU}( A_{11} ) \amp L_{11} \mbox{ and } U_{11} \mbox{ overwrite } A_{11} \\ \mbox{Solve } L_{11} U_{12} = A_{12} \amp \mbox{overwriting } A_{12} \mbox{ with } U_{12} \\ \mbox{Solve } L_{21} U_{11} = A_{21} \amp \mbox{overwriting } A_{21} \mbox{ with } L_{21} \\ A_{22} := A_{22} - A_{21} A_{12} \end{array} } \FlaAlgorithm \end{equation*}
Figure 5.5.2.1. Blocked Variant 5 (classical Gaussian elimination) LU factorization algorithm.

The important observation is that if A is m \times m and b is much smaller than m \text{,} then most of the computation is in the matrix-matrix multiplication A_{22} := A_{22} - A_{21} A_{12} \text{.}