Subsection 7.3.3 Convergence of splitting methods
The Jacobi and Gauss-Seidel iterations can be generalized as follows. Split matrix \(A = M - N \) where \(M \) is nonsingular. Now,
\begin{equation*}
A x = y
\end{equation*}
is equivalent to
\begin{equation*}
( M - N ) x = y
\end{equation*}
and
\begin{equation*}
M x = N x + y ,
\end{equation*}
so that
\begin{equation*}
x = M^{-1} ( N x + y ) .
\end{equation*}
This is an example of a fixed-point equation: Plug \(x \) into \(M^{-1} ( N x + y ) \) and the result is again \(x \text{.}\) The iteration is then created by viewing the vector on the left as the next approximation to the solution given the current approximation \(x \) on the right:
\begin{equation*}
x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) .
\end{equation*}
Let \(A = ( D - L - U ) \) where \(-L \text{,}\) \(D \text{,}\) and \(-U \) are the strictly lower triangular, diagonal, and strictly upper triangular parts of \(A \text{.}\)
For the Jacobi iteration, \(M = D \) and \(N = ( L + U ) \text{.}\)
For the Gauss-Seidel iteration, \(M = (D - L ) \) and \(N = U \text{.}\)
In practice, \(M \) is not inverted. Instead, the iteration is implemented as
\begin{equation*}
\mbox{Solve } M x^{(k+1)} = N x^{(k)} + y ,
\end{equation*}
with which we emphasize that we solve with \(M \) rather than inverting it.
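To make this concrete, here is a minimal sketch in Python with NumPy. The function name `splitting_iteration` and the fixed iteration count are illustrative choices of ours, not part of the course materials; the point is that the update is carried out by solving with \(M \text{,}\) never by forming \(M^{-1} \text{.}\)
```python
import numpy as np

def splitting_iteration(M, N, y, x0, num_iters=100):
    """Iterate x^(k+1) = M^{-1} (N x^(k) + y), solving with M at each step."""
    x = x0.copy()
    for _ in range(num_iters):
        # Solve M x^(k+1) = N x^(k) + y; M is never explicitly inverted.
        x = np.linalg.solve(M, N @ x + y)
    return x
```
In practice \(M \) is chosen so that this solve is cheap, as the next homework explores.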
Homework 7.3.3.1.
Why are the choices of \(M \) and \(N \) used by the Jacobi iteration and Gauss-Seidel iteration convenient?
Both methods have two advantages, made concrete in the sketch after this list:
The multiplication \(N u^{(k)} \) can exploit sparsity in the original matrix \(A \text{.}\)
Solving with \(M \) is relatively cheap. In the case of the Jacobi iteration (\(M = D \)) it is trivial. In the case of the Gauss-Seidel iteration (\(M = (D - L )\)), the lower triangular system inherits the sparsity pattern of the corresponding part of \(A \text{.}\)
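The following sketch (Python with NumPy/SciPy; the function names are ours, for illustration only) spells out one step of each method for a dense matrix. For a genuinely sparse \(A \text{,}\) one would store \(A \) in a sparse format so that the multiplication with \(N \) and the triangular solve with \(M \) touch only the nonzero entries.
```python
import numpy as np
from scipy.linalg import solve_triangular

def jacobi_step(A, y, x):
    """One Jacobi update: M = D, N = L + U."""
    d = np.diag(A)                # the diagonal entries of A
    N = np.diag(d) - A            # N = L + U, since A = D - L - U
    return (N @ x + y) / d        # solving with M = D is an elementwise division

def gauss_seidel_step(A, y, x):
    """One Gauss-Seidel update: M = D - L, N = U."""
    M = np.tril(A)                # M = D - L: the lower triangular part of A
    N = M - A                     # N = U
    return solve_triangular(M, N @ x + y, lower=True)
```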
Homework 7.3.3.2.
Let \(A = M - N \) be a splitting of matrix \(A \text{.}\) Let \(x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) \text{.}\) Show that
\begin{equation*}
x^{(k+1)} = x^{(k)} + M^{-1} ( y - A x^{(k)} ) .
\end{equation*}
This last exercise provides an important link between iterative refinement, discussed in Subsection 5.3.7, and splitting methods. Let us revisit this, using the notation from this section.
If \(A x = y \) and \(x^{(k)} \) is a (current) approximation to \(x \text{,}\) then
\begin{equation*}
r^{(k)} = y - A x^{(k)}
\end{equation*}
is the (current) residual. If we solve
\begin{equation*}
A \delta x = r^{(k)}
\end{equation*}
or, equivalently, compute
\begin{equation*}
\delta x = A^{-1} r^{(k)} ,
\end{equation*}
then
\begin{equation*}
x^{(k+1)} = x^{(k)} + \delta x
\end{equation*}
is the solution to \(A x = y \text{.}\) Now, if we merely compute an approximation,
\begin{equation*}
\delta x^{(k)} = M^{-1} r^{(k)} ,
\end{equation*}
then
\begin{equation*}
x^{(k+1)} = x^{(k)} + \delta x^{(k)}
\end{equation*}
is merely a (hopefully better) approximation to \(x \text{.}\) If \(M \approx A \) then
\begin{equation*}
\delta x^{(k)} = M^{-1} r^{(k)} \approx A^{-1} r^{(k)} = \delta x
\quad \mbox{and hence} \quad x^{(k+1)} \approx x .
\end{equation*}
So, the better \(M \) approximates \(A \text{,}\) the faster we can expect \(x^{(k)} \) to converge to \(x \text{.}\)
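This residual-correction form fits in a few lines of code. The sketch below (Python with NumPy; `refine_step` is an illustrative name of ours) computes the residual, approximately solves the correction equation with \(M \) in place of \(A \text{,}\) and updates the approximation; the closing comment notes why this is the same update as \(x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) \) when \(A = M - N \text{.}\)
```python
import numpy as np

def refine_step(A, M, y, x):
    """One update of a splitting method, written in iterative refinement form."""
    r = y - A @ x                 # current residual r^(k) = y - A x^(k)
    dx = np.linalg.solve(M, r)    # approximate correction: solve M dx = r^(k)
    return x + dx                 # x^(k+1) = x^(k) + M^{-1} r^(k)

# With A = M - N:  x + M^{-1}(y - A x) = M^{-1}(M x + y - A x) = M^{-1}(N x + y),
# so this is exactly the splitting update, written in residual-correction form.
```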
With this in mind, we notice that if \(A = D - L - U \text{,}\) where \(D \text{,}\) \(-L \text{,}\) and \(-U \) equal its diagonal, strictly lower triangular, and strictly upper triangular parts, and we split \(A = M - N \text{,}\) then \(M = D - L \) is a better approximation to matrix \(A \) than is \(M = D \text{.}\)
Ponder This 7.3.3.3.
Given these insights, why might the symmetric Gauss-Seidel method discussed in Homework 7.3.2.4 have benefits over the regular Gauss-Seidel method?
Loosely speaking, a sequence of numbers, \(\chi^{(k)} \text{,}\) is said to converge to the number \(\chi \) if \(\vert \chi^{(k)} - \chi \vert \) eventually becomes arbitrarily close to zero. This is written as
\begin{equation*}
\lim_{k \rightarrow \infty} \chi^{(k)} = \chi .
\end{equation*}
A sequence of vectors, \(x^{(k)} \text{,}\) converges to the vector \(x \) if for some norm \(\| \cdot \| \)
\begin{equation*}
\lim_{k \rightarrow \infty} \| x^{(k)} - x \| = 0 .
\end{equation*}
Because of the equivalence of norms, if the sequence converges in one norm, it converges in all norms. In particular, it means it converges in the \(\infty\)-norm, which means that \(\max_{i} \vert \chi_{i}^{(k)} - \chi_{i} \vert \) converges to zero, and hence for all entries \(\vert \chi_{i}^{(k)} - \chi_{i}\vert \) eventually becomes arbitrarily small. Finally, a sequence of matrices, \(A^{(k)} \text{,}\) converges to the matrix \(A \) if for some norm \(\| \cdot \| \)
\begin{equation*}
\lim_{k \rightarrow \infty} \| A^{(k)} - A \| = 0 .
\end{equation*}
Again, if it converges for one norm, it converges for all norms and the individual elements of \(A^{(k)} \) converge to the corresponding elements of \(A \text{.}\)
Let's now look at the convergence of splitting methods. If \(x \) solves \(A x = y \) and \(x^{(k)} \) is the sequence of vectors generated starting with \(x^{(0)} \text{,}\) then
\begin{equation*}
M x = N x + y \quad \mbox{and} \quad M x^{(k+1)} = N x^{(k)} + y ,
\end{equation*}
so that
\begin{equation*}
M ( x^{(k+1)} - x ) = N ( x^{(k)} - x )
\end{equation*}
or, equivalently,
\begin{equation*}
x^{(k+1)} - x = \left( M^{-1} N \right) ( x^{(k)} - x ) .
\end{equation*}
This, in turn, means that
\begin{equation*}
x^{(k+1)} - x = \left( M^{-1} N \right)^{k+1} ( x^{(0)} - x ) .
\end{equation*}
If \(\| \cdot \| \) is a vector norm and its induced matrix norm, then
\begin{equation*}
\| x^{(k+1)} - x \| = \left\| \left( M^{-1} N \right)^{k+1} ( x^{(0)} - x ) \right\| \leq \| M^{-1} N \|^{k+1} \| x^{(0)} - x \| .
\end{equation*}
Hence, if \(\| M^{-1} N \| \lt 1 \) in that norm, then \(\lim_{i\rightarrow \infty } \| M^{-1} N \|^{i} = 0 \) and hence \(x^{(k)} \) converges to \(x \text{.}\) We summarize this in the following theorem:
Theorem 7.3.3.1.
Let \(A \in \Rnxn \) be nonsingular and \(x, y \in \Rn \) so that \(A x = y \text{.}\) Let \(A = M - N \) be a splitting of \(A\text{,}\) \(x^{(0)} \) be given (an initial guess), and \(x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) \text{.}\) If \(\| M^{-1} N \| \lt 1 \) for some matrix norm induced by the \(\| \cdot \| \) vector norm, then \(x^{(k)} \) will converge to the solution \(x \text{.}\)
Because of the equivalence of matrix norms, if we can find any matrix norm \(\vert\vert\vert \cdot \vert\vert\vert \) such that \(\vert\vert\vert M^{-1} N \vert\vert\vert \lt 1 \text{,}\) the sequence of vectors converges.
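For a concrete (and purely illustrative) example, the sufficient condition of Theorem 7.3.3.1 can be checked numerically. The sketch below, in Python with NumPy, uses a small, strictly diagonally dominant matrix of our choosing, forms \(B = M^{-1} N \) for the Jacobi splitting, and evaluates the induced \(1\)-norm and \(\infty\)-norm; both equal \(0.5 \lt 1 \text{,}\) so the Jacobi iteration converges for this matrix.
```python
import numpy as np

# Small, strictly diagonally dominant example matrix (chosen for illustration).
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])

M = np.diag(np.diag(A))        # Jacobi splitting: M = D
N = M - A                      # N = L + U
B = np.linalg.solve(M, N)      # B = M^{-1} N (formed here only for the check)

print(np.linalg.norm(B, 1))      # induced 1-norm (max column sum): 0.5
print(np.linalg.norm(B, np.inf)) # induced infinity-norm (max row sum): 0.5
```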
Ponder This 7.3.3.4.
Contemplate the finer points of the last argument about the convergence of \((M^{-1} N)^{i} \) to the zero matrix as \(i \rightarrow \infty \text{.}\)
Understanding the following observation will have to wait until after we cover eigenvalues and eigenvectors, later in the course. For splitting methods, it is the spectral radius of a matrix (the magnitude of the eigenvalue with largest magnitude), \(\rho( B ) \text{,}\) that often gives us insight into whether the method converges. This, once again, requires us to use a result from a future week in this course: It can be shown that for all \(B \in \Rmxm \) and \(\epsilon \gt 0 \) there exists a norm \(\vert \vert \vert \cdot \vert \vert \vert_{B,\epsilon} \) such that \(\vert \vert \vert B \vert \vert \vert_{B,\epsilon} \leq \rho( B ) + \epsilon \text{.}\) What this means is that if we can show that \(\rho( M^{-1} N ) \lt 1 \text{,}\) then the splitting method converges for the given matrix \(A \text{.}\)
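Anticipating that material, here is a small (illustrative) computation of \(\rho( M^{-1} N ) \) for the Jacobi and Gauss-Seidel splittings of the same kind of matrix used above. For this example the Gauss-Seidel spectral radius is smaller, consistent with \(M = D - L \) being a better approximation to \(A \) than \(M = D \text{.}\)
```python
import numpy as np

def spectral_radius(B):
    """rho(B): the magnitude of the eigenvalue of B that is largest in magnitude."""
    return np.max(np.abs(np.linalg.eigvals(B)))

# Small example matrix (chosen for illustration).
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
D = np.diag(np.diag(A))

B_jacobi = np.linalg.solve(D, D - A)                    # M = D,     N = L + U
B_gs     = np.linalg.solve(np.tril(A), np.tril(A) - A)  # M = D - L, N = U

print(spectral_radius(B_jacobi))  # approximately 0.354
print(spectral_radius(B_gs))      # 0.125
```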
Homework 7.3.3.5.
Given nonsingular \(A \in \Rnxn \text{,}\) what splitting \(A = M - N \) will give the fastest convergence to the solution of \(A x = y \text{?}\)
\(M = A \) and \(N = 0 \text{.}\) Then, regardless of the initial vector \(x^{(0)} \text{,}\)
\begin{equation*}
x^{(1)} = M^{-1} ( N x^{(0)} + y ) = A^{-1} y = x .
\end{equation*}
Thus, convergence occurs after a single iteration.