So, in future iterations, \(L_{22} z_2 = y_2 \) (with the updated \(y_2 \)) will be solved, computing \(z_2 \text{.}\)
These insights justify the algorithm in Figure 5.3.5.1, which overwrites \(y \) with the solution to \(L z = y \text{.}\)
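That algorithm can be sketched in NumPy as follows. The function name is illustrative (not from the text), and \(L \) is assumed to be unit lower triangular, so the diagonal is never referenced:

```python
import numpy as np

def trsv_unit_lower_var1(L, y):
    """Solve L z = y for unit lower triangular L, overwriting y with z.

    Variant 1 (eager): since the diagonal is unit, psi_1 already equals
    zeta_1; the update y_2 := y_2 - psi_1 * l_21 is an axpy with a
    column of L, so that future iterations solve L_22 z_2 = y_2.
    """
    m = L.shape[0]
    for i in range(m):
        y[i + 1:] -= y[i] * L[i + 1:, i]  # axpy with column l_21
    return y
```

On completion, \(y \) has been overwritten with the solution \(z \text{.}\)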
Homework 5.3.5.1.
Derive a similar algorithm for solving \(U x = z \text{.}\) Update the below skeleton algorithm with the result. (Don't forget to put in the lines that indicate how you "partition and repartition" through the matrix.)
So, \(\chi_1 = \zeta_1 / \upsilon_{11} \) after which \(x_0 \) can be computed by solving \(U_{00} x_0 = z_0 - \chi_1 u_{01} \text{.}\) The resulting algorithm is then given by
\begin{equation*}
\newcommand{\routinename}{\mbox{Solve } U x = z,
\mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 1)}}
\newcommand{\guard}{
n( U_{BR} ) \lt n( U )
}
\newcommand{\partitionings}{
U \rightarrow
\FlaTwoByTwo{U_{TL}}{U_{TR}}
{U_{BL}}{U_{BR}},
z \rightarrow
\FlaTwoByOne{z_T}{z_B}
}
\newcommand{\partitionsizes}{
U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and }
z_B \mbox{ has } 0 \mbox{ elements}
}
\newcommand{\repartitionings}{
\FlaTwoByTwo{U_{TL}}{U_{TR}}
{U_{BL}}{U_{BR}}
\rightarrow
\FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}}
{u_{10}^T}{\upsilon_{11}}{u_{12}^T}
{U_{20}}{u_{21}}{U_{22}},
\FlaTwoByOne{z_T}{z_B}
\rightarrow
\FlaThreeByOneT{z_0}{\zeta_1}{z_2}
}
\newcommand{\moveboundaries}{
\FlaTwoByTwo{U_{TL}}{U_{TR}}
{U_{BL}}{U_{BR}}
\leftarrow
\FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}}
{u_{10}^T}{\upsilon_{11}}{u_{12}^T}
{U_{20}}{u_{21}}{U_{22}},
\FlaTwoByOne{z_T}{z_B}
\leftarrow
\FlaThreeByOneB{z_0}{\zeta_1}{z_2}
}
\newcommand{\update}{
\begin{array}{l}
\zeta_1 := \zeta_1 / \upsilon_{11} \\
z_0 := z_0 - \zeta_1 u_{01}
\end{array}
}
\FlaAlgorithm
\end{equation*}
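A minimal NumPy sketch of this variant (the function name is my own, not from the text; \(U \) is assumed to be upper triangular with nonzero diagonal):

```python
import numpy as np

def trsv_upper_var1(U, z):
    """Solve U x = z for upper triangular U, overwriting z with x.

    Variant 1: compute chi_1 = zeta_1 / upsilon_11, then perform
    z_0 := z_0 - chi_1 * u_01, an axpy with a column of U.
    """
    m = U.shape[0]
    for i in range(m - 1, -1, -1):
        z[i] /= U[i, i]
        z[:i] -= z[i] * U[:i, i]  # axpy with column u_01
    return z
```

On completion, \(z \) has been overwritten with the solution \(x \text{.}\)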
The idea now is as follows: Assume that the elements of \(z_0 \) were computed in previous iterations in the algorithm in Figure 5.3.5.2, overwriting \(y_0 \text{.}\) Then in the current iteration we must compute \(\zeta_1 := \psi_1 - l_{10}^T z_0 \text{,}\) overwriting \(\psi_1 \text{.}\)
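This lazy variant can be sketched as follows (again with an illustrative function name, assuming \(L \) is unit lower triangular):

```python
import numpy as np

def trsv_unit_lower_var2(L, y):
    """Solve L z = y for unit lower triangular L, overwriting y with z.

    Variant 2 (lazy): each iteration computes
    zeta_1 := psi_1 - l_10^T z_0, a dot product with a row of L,
    assuming z_0 was computed in previous iterations.
    """
    m = L.shape[0]
    for i in range(m):
        y[i] -= L[i, :i] @ y[:i]  # dot product with row l_10^T
    return y
```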
Homework 5.3.5.2.
Derive a similar algorithm for solving \(U x = z \text{.}\) Update the below skeleton algorithm with the result. (Don't forget to put in the lines that indicate how you "partition and repartition" through the matrix.)
Assuming the elements of \(z_2 \) were computed in previous iterations, \(\chi_1 = ( \zeta_1 - u_{12}^T z_2 ) / \upsilon_{11} \text{,}\) which can then overwrite \(\zeta_1 \text{.}\) The resulting algorithm is given by
\begin{equation*}
\newcommand{\routinename}{\mbox{Solve } U x = z,
\mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 2)}}
\newcommand{\guard}{
n( U_{BR} ) \lt n( U )
}
\newcommand{\partitionings}{
U \rightarrow
\FlaTwoByTwo{U_{TL}}{U_{TR}}
{U_{BL}}{U_{BR}},
z \rightarrow
\FlaTwoByOne{z_T}{z_B}
}
\newcommand{\partitionsizes}{
U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and }
z_B \mbox{ has } 0 \mbox{ elements}
}
\newcommand{\repartitionings}{
\FlaTwoByTwo{U_{TL}}{U_{TR}}
{U_{BL}}{U_{BR}}
\rightarrow
\FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}}
{u_{10}^T}{\upsilon_{11}}{u_{12}^T}
{U_{20}}{u_{21}}{U_{22}},
\FlaTwoByOne{z_T}{z_B}
\rightarrow
\FlaThreeByOneT{z_0}{\zeta_1}{z_2}
}
\newcommand{\moveboundaries}{
\FlaTwoByTwo{U_{TL}}{U_{TR}}
{U_{BL}}{U_{BR}}
\leftarrow
\FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}}
{u_{10}^T}{\upsilon_{11}}{u_{12}^T}
{U_{20}}{u_{21}}{U_{22}},
\FlaTwoByOne{z_T}{z_B}
\leftarrow
\FlaThreeByOneB{z_0}{\zeta_1}{z_2}
}
\newcommand{\update}{
\begin{array}{l}
\zeta_1 := \zeta_1 - u_{12}^T z_2 \\
\zeta_1 := \zeta_1 / \upsilon_{11}
\end{array}
}
\FlaAlgorithm
\end{equation*}
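This variant can be sketched in NumPy as (hypothetical name, same assumptions on \(U \) as before):

```python
import numpy as np

def trsv_upper_var2(U, z):
    """Solve U x = z for upper triangular U, overwriting z with x.

    Variant 2: zeta_1 := zeta_1 - u_12^T z_2 (a dot product with a
    row of U), followed by zeta_1 := zeta_1 / upsilon_11.
    """
    m = U.shape[0]
    for i in range(m - 1, -1, -1):
        z[i] -= U[i, i + 1:] @ z[i + 1:]  # dot product with row u_12^T
        z[i] /= U[i, i]
    return z
```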
Homework 5.3.5.3.
Let \(L \) be an \(m \times m \) unit lower triangular matrix. If a multiply and add each require one flop, what is the approximate cost of solving \(L x = y \text{?}\)
Let \(L_{00} \) be \(k \times k \) in a typical iteration. Then \(y_2 \) is of size \(m - k - 1 \) and the update \(y_2 := y_2 - \psi_1 l_{21} \) requires \(2( m - k - 1 ) \) flops (a multiply and an add per element). Summing this over all iterations yields
\begin{equation*}
\sum_{k=0}^{m-1} 2( m - k - 1 ) = 2 \sum_{j=0}^{m-1} j = m( m - 1 ) \approx m^2 \mbox{ flops.}
\end{equation*}
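As a quick sanity check on this count (a throwaway script under the stated cost model; the function name is my own):

```python
def forward_solve_flops(m):
    """Total flops of the L z = y algorithm: iteration k performs
    y_2 := y_2 - psi_1 * l_21 on a vector of length m - k - 1,
    costing 2 * (m - k - 1) flops (one multiply, one add per element).
    """
    return sum(2 * (m - k - 1) for k in range(m))
```

For any \(m \) this equals \(m( m - 1 ) \text{,}\) i.e., approximately \(m^2 \) flops.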
Computation tends to be more efficient when matrices are accessed by column, since applications in scientific computing tend to store matrices by columns (in column-major order), a convention that dates back to the days when Fortran ruled supreme. Because accessing memory consecutively improves performance, computing with columns tends to be more efficient than computing with rows.
Variant 1 of each of the algorithms casts computation in terms of columns of the matrix that is involved (axpy operations), while Variant 2 casts computation in terms of rows (dot products).
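The storage issue can be observed directly in NumPy, which supports both layouts (this illustrates column-major storage only; it is not part of the solvers above):

```python
import numpy as np

# Column-major ("Fortran") storage: the elements of each column are
# consecutive in memory, so a column is a contiguous slice while a
# row is strided.
A = np.asfortranarray(np.arange(9.0).reshape(3, 3))
print(A.flags['F_CONTIGUOUS'])        # True
print(A[:, 0].flags['C_CONTIGUOUS'])  # True: a column is contiguous
print(A[0, :].flags['C_CONTIGUOUS'])  # False: a row is strided
```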