Skip to main content

Subsection 5.3.5 Solving with a triangular matrix

We are left to discuss how to solve Lz=yLz=y and Ux=z.Ux=z.

Subsubsection 5.3.5.1 Algorithmic Variant 1

fit width

Consider Lz=yLz=y where LL is unit lower triangular. Partition

L(10l21L22),z(ζ1z2)andy(ψ1y2).

Then

(10l21L22)L(ζ1z2)z=(ψ1y2)y.

Multiplying out the left-hand side yields

(ζ1ζ1l21+L22z2)=(ψ1y2)

and the equalities

ζ1=ψ1ζ1l21+L22z2=y2,

which can be rearranged as

ζ1=ψ1L22z2=y2ζ1l21.

We conclude that in the current iteration

  • ψ1 needs not be updated.

  • y2:=y2ψ1l21

So that in future iterations L22z2=y2 (updated!) will be solved, updating z2.

These insights justify the algorithm in Figure 5.3.5.1, which overwrites y with the solution to Lz=y.

Solve Lz=y, overwriting y with z (Variant 1)L(LTLLTRLBLLBR),y(yTyB)   LTL is 0×0 and yT has 0 elementswhile n(LTL)<n(L)   (LTLLTRLBLLBR)(L00l01L02lT10λ11lT12L20l21L22),(yTyB)(y0ψ1y2)   y2:=y2ψ1l21   (LTLLTRLBLLBR)(L00l01L02lT10λ11lT12L20l21L22),(yTyB)(y0ψ1y2)endwhile
Figure 5.3.5.1. Lower triangular solve (with unit lower triangular matrix), Variant 1
Homework 5.3.5.1.

Derive a similar algorithm for solving Ux=z. Update the below skeleton algorithm with the result. (Don't forget to put in the lines that indicate how you "partition and repartition" through the matrix.)

Solve Ux=z, overwriting z with x (Variant 1)U(UTLUTRUBLUBR),z(zTzB)   UBR is 0×0 and zB has 0 elementswhile n(UBR)<n(U)   (UTLUTRUBLUBR)(U00u01U02uT10υ11uT12U20u21U22),(zTzB)(z0ζ1z2)                                                                                                            (UTLUTRUBLUBR)(U00u01U02uT10υ11uT12U20u21U22),(zTzB)(z0ζ1z2)endwhile

Hint: Partition

(U00u010υ11)(x0χ1)=(z0ζ1).
Solution

Multiplying this out yields

\begin{equation*} \FlaTwoByOneSingleLine{ U_{00} x_0 + u_{01} \chi_1 }{\upsilon_{11} \chi_1 } = \FlaTwoByOneSingleLine{ z_0 }{ \zeta_1 }. \end{equation*}

So, \(\chi_1 = \zeta_1 / \upsilon_{11} \) after which \(x_0 \) can be computed by solving \(U_{00} x_0 = z_0 - \chi_1 u_{01} \text{.}\) The resulting algorithm is then given by

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } U x = z, \mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 1)}} \newcommand{\guard}{ n( U_{BR} ) \lt n( U ) } \newcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}}, z \rightarrow \FlaTwoByOne{z_T}{z_B} } \newcommand{\partitionsizes}{ U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and } z_B \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \rightarrow \FlaThreeByOneT{z_0}{\zeta_1}{z_2} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \leftarrow \FlaThreeByOneB{z_0}{\zeta_1}{z_2} } \newcommand{\update}{ \begin{array}{l} \zeta_1 := \zeta_1 / \upsilon_{11} \\ z_0 := z_0 - \zeta_1 u_{01} \end{array} } \FlaAlgorithm \end{equation*}

Subsubsection 5.3.5.2 Algorithmic Variant 2

fit width

An alternative algorithm can be derived as follows: Partition

L(L000lT101),z(z0ζ1)andy(y0ψ1).

Then

(L000lT101)L(z0ζ1)z=(y0ψ1)y.

Multiplying out the left-hand side yields

(L00z0lT10z0+ζ1)=(y0ψ1)

and the equalities

L00z0=y0lT10z0+ζ1=ψ1.

The idea now is as follows: Assume that the elements of z0 were computed in previous iterations in the algorithm in Figure 5.3.5.2, overwriting y0. Then in the current iteration we must compute ζ1:=ψ1lT10z0, overwriting ψ1.

Solve Lz=y, overwriting y with z (Variant 2)L(LTLLTRLBLLBR),y(yTyB)   LTL is 0×0 and yT has 0 elementswhile n(LTL)<n(L)   (LTLLTRLBLLBR)(L00l01L02lT10λ11lT12L20l21L22),(yTyB)(y0ψ1y2)   ψ1:=ψ1lT10y0   (LTLLTRLBLLBR)(L00l01L02lT10λ11lT12L20l21L22),(yTyB)(y0ψ1y2)endwhile
Figure 5.3.5.2. Lower triangular solve (with unit lower triangular matrix), Variant 2
Homework 5.3.5.2.

Derive a similar algorithm for solving Ux=z. Update the below skeleton algorithm with the result. (Don't forget to put in the lines that indicate how you "partition and repartition" through the matrix.)

Solve Ux=z, overwriting z with x (Variant 2)U(UTLUTRUBLUBR),z(zTzB)   UBR is 0×0 and zB has 0 elementswhile n(UBR)<n(U)   (UTLUTRUBLUBR)(U00u01U02uT10υ11uT12U20u21U22),(zTzB)(z0ζ1z2)                                                                                                            (UTLUTRUBLUBR)(U00u01U02uT10υ11uT12U20u21U22),(zTzB)(z0ζ1z2)endwhile

Hint: Partition

U(υ11uT120U22).
Solution

Partition

\begin{equation*} \FlaTwoByTwoSingleLine{ \upsilon_{11} }{ u_{12}^T }{0}{U_{22}} \FlaTwoByOneSingleLine{ \chi_1 }{ x_2 } = \FlaTwoByOneSingleLine{ \zeta_1 }{ z_2 }. \end{equation*}

Multiplying this out yields

\begin{equation*} \FlaTwoByOneSingleLine{ \upsilon_{11} \chi_1 + u_{12}^T x_2}{ U_{22} x_2 } = \FlaTwoByOneSingleLine{ \zeta_1 }{ z_2 }. \end{equation*}

So, if we assume that \(x_2 \) has already been computed and has overwritten \(z_2 \text{,}\) then \(\chi_1 \) can be computed as

\begin{equation*} \chi_1 = ( \zeta_1 - u_{12}^T x_2 ) / \upsilon_{11} \end{equation*}

which can then overwrite \(\zeta_1 \text{.}\) The resulting algorithm is given by

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } U x = z, \mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 2)}} \newcommand{\guard}{ n( U_{BR} ) \lt n( U ) } \newcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}}, z \rightarrow \FlaTwoByOne{z_T}{z_B} } \newcommand{\partitionsizes}{ U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and } z_B \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \rightarrow \FlaThreeByOneT{z_0}{\zeta_1}{z_2} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \leftarrow \FlaThreeByOneB{z_0}{\zeta_1}{z_2} } \newcommand{\update}{ \begin{array}{l} \zeta_1 := \zeta_1 - u_{12}^T z_2 \\ \zeta_1 := \zeta_1 / \upsilon_{11} \end{array} } \FlaAlgorithm \end{equation*}
Homework 5.3.5.3.

Let L be an m×m unit lower triangular matrix. If a multiply and add each require one flop, what is the approximate cost of solving Lx=y?

Solution

Let us analyze Variant 1.

Let \(L_{00} \) be \(k \times k \) in a typical iteration. Then \(y_2 \) is of size \(m - k - 1 \) and \(y_2 := y_2 - \psi_1 l_{21} \) requires \(2( m - k - 1 ) \) flops. Summing this over all iterations requires

\begin{equation*} \sum_{k=0}^{m-1} [ 2( m-k-1 ) ] \mbox{ flops}. \end{equation*}

The change of variables \(j = m - k - 1 \) yields

\begin{equation*} \sum_{k=0}^{m-1} [ 2( m-k-1 ) ] = 2 \sum_{j=0}^{m-1} j \approx m^2. \end{equation*}

Thus, the cost is approximately \(m^2 \) flops.

Subsubsection 5.3.5.3 Discussion

Computation tends to be more efficient when matrices are accessed by column, since in scientific computing applications tend to store matrices by columns (in column-major order). This dates back to the days when Fortran ruled supreme. Accessing memory consecutively improves performance, so computing with columns tends to be more efficient than computing with rows.

Variant 1 for each of the algorithms casts computation in terms of columns of the matrix that is involved;

y2:=y2ψ1l21

and

z0:=z0ζ1u01.

These are called axpy operations:

y:=αx+y.

"alpha times x plus y." In contrast, Variant 2 casts computation in terms of rows of the matrix that is involved:

ψ1:=ψ1lT10y0

and

ζ1:=ζ1uT12z2

perform dot products.