Subsection 7.3.3 Convergence of splitting methods
The Jacobi and Gauss-Seidel iterations can be generalized as follows. Split matrix \(A = M - N \) where \(M \) is nonsingular. Now,
\begin{equation*}
A x = y
\end{equation*}
is equivalent to
\begin{equation*}
( M - N ) x = y
\end{equation*}
and
\begin{equation*}
M x = N x + y ,
\end{equation*}
so that
\begin{equation*}
x = M^{-1} ( N x + y ) .
\end{equation*}
This is an example of a fixed-point equation: Plug \(x \) into \(M^{-1} ( N x + y ) \) and the result is again \(x \text{.}\) The iteration is then created by viewing the vector on the left as the next approximation to the solution given the current approximation \(x \) on the right:
\begin{equation*}
x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) .
\end{equation*}
Let \(A = ( D - L - U ) \) where \(-L \text{,}\) \(D \text{,}\) and \(-U \) are the strictly lower triangular, diagonal, and strictly upper triangular parts of \(A \text{.}\)
For the Jacobi iteration, \(M = D \) and \(N = ( L + U ) \text{.}\)
For the Gauss-Seidel iteration, \(M = (D - L ) \) and \(N = U \text{.}\)
In practice, \(M \) is not inverted. Instead, the iteration is implemented as
\begin{equation*}
\mbox{Solve } M x^{(k+1)} = N x^{(k)} + y ,
\end{equation*}
with which we emphasize that we solve with \(M \) rather than inverting it.
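To make this concrete, here is a minimal sketch in Python with NumPy. The function name `splitting_iteration` and the fixed iteration count are illustrative choices of ours, not part of the course materials; the point is that the update is carried out by solving with \(M \text{,}\) never by forming \(M^{-1} \text{.}\)
```python
import numpy as np

def splitting_iteration(M, N, y, x0, num_iters=100):
    """Iterate x^(k+1) = M^{-1} (N x^(k) + y), solving with M at each step."""
    x = x0.copy()
    for _ in range(num_iters):
        # Solve M x^(k+1) = N x^(k) + y; M is never explicitly inverted.
        x = np.linalg.solve(M, N @ x + y)
    return x
```
In practice \(M \) is chosen so that this solve is cheap, as the next homework explores.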
Homework 7.3.3.1.
Why are the choices of \(M \) and \(N \) used by the Jacobi iteration and Gauss-Seidel iteration convenient?
Both methods have two advantages, made concrete in the sketch after this list:
The multiplication \(N u^{(k)} \) can exploit sparsity in the original matrix \(A \text{.}\)
Solving with \(M \) is relatively cheap. In the case of the Jacobi iteration (\(M = D \)) it is trivial. In the case of the Gauss-Seidel iteration (\(M = (D - L )\)), the lower triangular system inherits the sparsity pattern of the corresponding part of \(A \text{.}\)
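The following sketch (Python with NumPy/SciPy; the function names are ours, for illustration only) spells out one step of each method for a dense matrix. For a genuinely sparse \(A \text{,}\) one would store \(A \) in a sparse format so that the multiplication with \(N \) and the triangular solve with \(M \) touch only the nonzero entries.
```python
import numpy as np
from scipy.linalg import solve_triangular

def jacobi_step(A, y, x):
    """One Jacobi update: M = D, N = L + U."""
    d = np.diag(A)                # the diagonal entries of A
    N = np.diag(d) - A            # N = L + U, since A = D - L - U
    return (N @ x + y) / d        # solving with M = D is an elementwise division

def gauss_seidel_step(A, y, x):
    """One Gauss-Seidel update: M = D - L, N = U."""
    M = np.tril(A)                # M = D - L: the lower triangular part of A
    N = M - A                     # N = U
    return solve_triangular(M, N @ x + y, lower=True)
```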
Homework 7.3.3.2.
Let \(A = M - N \) be a splitting of matrix \(A \text{.}\) Let \(x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) \text{.}\) Show that
\begin{equation*}
x^{(k+1)} = x^{(k)} + M^{-1} ( y - A x^{(k)} ) .
\end{equation*}
This last exercise provides an important link between iterative refinement, discussed in Subsection 5.3.7, and splitting methods. Let us revisit this, using the notation from this section.
If \(A x = y \) and \(x^{(k)} \) is a (current) approximation to \(x \text{,}\) then
\begin{equation*}
r^{(k)} = y - A x^{(k)}
\end{equation*}
is the (current) residual. If we solve
\begin{equation*}
A \delta x = r^{(k)}
\end{equation*}
or, equivalently, compute
\begin{equation*}
\delta x = A^{-1} r^{(k)} ,
\end{equation*}
then
\begin{equation*}
x^{(k+1)} = x^{(k)} + \delta x
\end{equation*}
is the solution to \(A x = y \text{.}\) Now, if we merely compute an approximation,
\begin{equation*}
\delta x^{(k)} = M^{-1} r^{(k)} ,
\end{equation*}
then
\begin{equation*}
x^{(k+1)} = x^{(k)} + \delta x^{(k)}
\end{equation*}
is merely a (hopefully better) approximation to \(x \text{.}\) If \(M \approx A \) then
\begin{equation*}
\delta x^{(k)} = M^{-1} r^{(k)} \approx A^{-1} r^{(k)} = \delta x
\quad \mbox{and hence} \quad x^{(k+1)} \approx x .
\end{equation*}
So, the better \(M \) approximates \(A \text{,}\) the faster we can expect \(x^{(k)} \) to converge to \(x \text{.}\)
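This residual-correction form fits in a few lines of code. The sketch below (Python with NumPy; `refine_step` is an illustrative name of ours) computes the residual, approximately solves the correction equation with \(M \) in place of \(A \text{,}\) and updates the approximation; the closing comment notes why this is the same update as \(x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) \) when \(A = M - N \text{.}\)
```python
import numpy as np

def refine_step(A, M, y, x):
    """One update of a splitting method, written in iterative refinement form."""
    r = y - A @ x                 # current residual r^(k) = y - A x^(k)
    dx = np.linalg.solve(M, r)    # approximate correction: solve M dx = r^(k)
    return x + dx                 # x^(k+1) = x^(k) + M^{-1} r^(k)

# With A = M - N:  x + M^{-1}(y - A x) = M^{-1}(M x + y - A x) = M^{-1}(N x + y),
# so this is exactly the splitting update, written in residual-correction form.
```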
With this in mind, we notice that if \(A = D - L - U \text{,}\) where \(D \text{,}\) \(-L \text{,}\) and \(-U \) equal its diagonal, strictly lower triangular, and strictly upper triangular parts, and we split \(A = M - N \text{,}\) then \(M = D - L \) is a better approximation to matrix \(A \) than is \(M = D \text{.}\)
Ponder This 7.3.3.3.
Given these insights, why might the symmetric Gauss-Seidel method discussed in Homework 7.3.2.4 have benefits over the regular Gauss-Seidel method?
Loosely speaking, a sequence of numbers, \(\chi^{(k)} \text{,}\) is said to converge to the number \(\chi \) if \(\vert \chi^{(k)} - \chi \vert \) eventually becomes arbitrarily close to zero. This is written as
\begin{equation*}
\lim_{k \rightarrow \infty} \chi^{(k)} = \chi .
\end{equation*}
A sequence of vectors, \(x^{(k)} \text{,}\) converges to the vector \(x \) if for some norm \(\| \cdot \| \)
\begin{equation*}
\lim_{k \rightarrow \infty} \| x^{(k)} - x \| = 0 .
\end{equation*}
Because of the equivalence of norms, if the sequence converges in one norm, it converges in all norms. In particular, it means it converges in the \(\infty\)-norm, which means that \(\max_{i} \vert \chi_{i}^{(k)} - \chi_{i} \vert \) converges to zero, and hence for all entries \(\vert \chi_{i}^{(k)} - \chi_{i}\vert \) eventually becomes arbitrarily small. Finally, a sequence of matrices, \(A^{(k)} \text{,}\) converges to the matrix \(A \) if for some norm \(\| \cdot \| \)
\begin{equation*}
\lim_{k \rightarrow \infty} \| A^{(k)} - A \| = 0 .
\end{equation*}
Again, if it converges for one norm, it converges for all norms and the individual elements of \(A^{(k)} \) converge to the corresponding elements of \(A \text{.}\)
Let's now look at the convergence of splitting methods. If \(x \) solves \(A x = y \) and \(x^{(k)} \) is the sequence of vectors generated starting with \(x^{(0)} \text{,}\) then
\begin{equation*}
M x = N x + y \quad \mbox{and} \quad M x^{(k+1)} = N x^{(k)} + y ,
\end{equation*}
so that
\begin{equation*}
M ( x^{(k+1)} - x ) = N ( x^{(k)} - x )
\end{equation*}
or, equivalently,
\begin{equation*}
x^{(k+1)} - x = \left( M^{-1} N \right) ( x^{(k)} - x ) .
\end{equation*}
This, in turn, means that
\begin{equation*}
x^{(k+1)} - x = \left( M^{-1} N \right)^{k+1} ( x^{(0)} - x ) .
\end{equation*}
If \(\| \cdot \| \) is a vector norm and its induced matrix norm, then
\begin{equation*}
\| x^{(k+1)} - x \| = \left\| \left( M^{-1} N \right)^{k+1} ( x^{(0)} - x ) \right\| \leq \| M^{-1} N \|^{k+1} \| x^{(0)} - x \| .
\end{equation*}
Hence, if \(\| M^{-1} N \| \lt 1 \) in that norm, then \(\lim_{i\rightarrow \infty } \| M^{-1} N \|^{i} = 0 \) and hence \(x^{(k)} \) converges to \(x \text{.}\) We summarize this in the following theorem:
Theorem 7.3.3.1.
Let \(A \in \Rnxn \) be nonsingular and \(x, y \in \Rn \) so that \(A x = y \text{.}\) Let \(A = M - N \) be a splitting of \(A\text{,}\) \(x^{(0)} \) be given (an initial guess), and \(x^{(k+1)} = M^{-1} ( N x^{(k)} + y ) \text{.}\) If \(\| M^{-1} N \| \lt 1 \) for some matrix norm induced by the \(\| \cdot \| \) vector norm, then \(x^{(k)} \) will converge to the solution \(x \text{.}\)
Because of the equivalence of matrix norms, if we can find any matrix norm \(\vert\vert\vert \cdot \vert\vert\vert \) such that \(\vert\vert\vert M^{-1} N \vert\vert\vert \lt 1 \text{,}\) the sequence of vectors converges.
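For a concrete (and purely illustrative) example, the sufficient condition of Theorem 7.3.3.1 can be checked numerically. The sketch below, in Python with NumPy, uses a small, strictly diagonally dominant matrix of our choosing, forms \(B = M^{-1} N \) for the Jacobi splitting, and evaluates the induced \(1\)-norm and \(\infty\)-norm; both equal \(0.5 \lt 1 \text{,}\) so the Jacobi iteration converges for this matrix.
```python
import numpy as np

# Small, strictly diagonally dominant example matrix (chosen for illustration).
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])

M = np.diag(np.diag(A))        # Jacobi splitting: M = D
N = M - A                      # N = L + U
B = np.linalg.solve(M, N)      # B = M^{-1} N (formed here only for the check)

print(np.linalg.norm(B, 1))      # induced 1-norm (max column sum): 0.5
print(np.linalg.norm(B, np.inf)) # induced infinity-norm (max row sum): 0.5
```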
Ponder This 7.3.3.4.
Contemplate the finer points of the last argument about the convergence of \((M^{-1} N)^{i} \) to the zero matrix as \(i \rightarrow \infty \text{.}\)
Understanding the following observation will have to wait until after we cover eigenvalues and eigenvectors, later in the course. For splitting methods, it is the spectral radius of a matrix (the magnitude of the eigenvalue with largest magnitude), \(\rho( B ) \text{,}\) that often gives us insight into whether the method converges. This, once again, requires us to use a result from a future week in this course: It can be shown that for all \(B \in \Rmxm \) and \(\epsilon \gt 0 \) there exists a norm \(\vert \vert \vert \cdot \vert \vert \vert_{B,\epsilon} \) such that \(\vert \vert \vert B \vert \vert \vert_{B,\epsilon} \leq \rho( B ) + \epsilon \text{.}\) What this means is that if we can show that \(\rho( M^{-1} N ) \lt 1 \text{,}\) then the splitting method converges for the given matrix \(A \text{.}\)
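Anticipating that material, here is a small (illustrative) computation of \(\rho( M^{-1} N ) \) for the Jacobi and Gauss-Seidel splittings of the same kind of matrix used above. For this example the Gauss-Seidel spectral radius is smaller, consistent with \(M = D - L \) being a better approximation to \(A \) than \(M = D \text{.}\)
```python
import numpy as np

def spectral_radius(B):
    """rho(B): the magnitude of the eigenvalue of B that is largest in magnitude."""
    return np.max(np.abs(np.linalg.eigvals(B)))

# Small example matrix (chosen for illustration).
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
D = np.diag(np.diag(A))

B_jacobi = np.linalg.solve(D, D - A)                    # M = D,     N = L + U
B_gs     = np.linalg.solve(np.tril(A), np.tril(A) - A)  # M = D - L, N = U

print(spectral_radius(B_jacobi))  # approximately 0.354
print(spectral_radius(B_gs))      # 0.125
```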
Homework 7.3.3.5.
Given nonsingular \(A \in \Rnxn \text{,}\) what splitting \(A = M - N \) will give the fastest convergence to the solution of \(A x = y \text{?}\)
\(M = A \) and \(N = 0 \text{.}\) Then, regardless of the initial vector \(x^{(0)} \text{,}\)
\begin{equation*}
x^{(1)} = M^{-1} ( N x^{(0)} + y ) = A^{-1} y = x .
\end{equation*}
Thus, convergence occurs after a single iteration.