Subsection 8.5.2 Summary
Given a function \(f: \mathbb R^n \rightarrow \mathbb R
\text{,}\) its gradient is given by
\begin{equation*}
\nabla f( x ) =
\left( \begin{array}{c}
\frac{\partial f}{\partial \chi_0}( x ) \\
\frac{\partial f}{\partial \chi_1}( x ) \\
\vdots \\
\frac{\partial f}{\partial \chi_{n-1}}( x )
\end{array}
\right).
\end{equation*}
\(\nabla f( x ) \) equals the direction in which the function \(f \) increases most rapidly at the point \(x \) and \(- \nabla f( x ) \) equals the direction of steepest descent (the direction in which the function \(f \) decreases most rapidly at the point \(x \)).
In this summary, we will assume that \(A \in \mathbb R^{n \times n} \) is symmetric positive definite (SPD) and
\begin{equation*}
f( x ) = \frac{1}{2} x^T A x - x^T b.
\end{equation*}
The gradient of this function equals
\begin{equation*}
\nabla f( x ) = A x - b
\end{equation*}
and \(\widehat x \) minimizes the function if and only if
\begin{equation*}
A \widehat x = b .
\end{equation*}
If \(x^{(k)} \) is an approximation to \(\widehat x\) then \(r^{(k)} = b - A x^{(k)} \) equals the corresponding residual. Notice that \(r^{(k)} = -\nabla f( x^{(k)}) \) and hence \(r^{(k)} \) is the direction of steepest descent.
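The relation between the gradient, the residual, and the linear system can be checked numerically. The following NumPy sketch (the matrix, vectors, and tolerances are arbitrary choices made for illustration, not part of the summary) compares the formula \(\nabla f( x ) = A x - b \) against central finite differences and confirms that \(r = b - A x = - \nabla f( x ) \text{:}\)
\begin{verbatim}
import numpy as np

# Arbitrary small SPD test problem.
rng = np.random.default_rng(0)
n = 5
C = rng.standard_normal((n, n))
A = C.T @ C + n * np.eye(n)       # SPD by construction
b = rng.standard_normal(n)
x = rng.standard_normal(n)

def f(y):
    return 0.5 * y @ A @ y - y @ b

grad = A @ x - b                  # analytic gradient
r = b - A @ x                     # residual

# Central finite differences approximate each component of the gradient.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.allclose(grad, fd, atol=1e-5))   # True: gradient formula checks out
print(np.allclose(r, -grad))              # True: residual is minus the gradient
\end{verbatim}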
A prototypical descent method is given by
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b, x^{(0)}
\\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ p^{(k)} := \mbox{ next direction } \\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\mbox{ for some scalar } \alpha_k
\\
~~~ r^{(k+1)} := b - A x^{(k+1)}
\\
~~~ k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
Here \(p^{(k)} \) is the "current" search direction and in each iteration we create the next approximation to \(\widehat x \text{,}\) \(x^{(k+1)} \text{,}\) along the line \(x^{(k)} + \alpha p^{(k)} \text{.}\)
If \(x^{(k+1)} \) minimizes along that line, the method is an exact descent method and
\begin{equation*}
\alpha_k = \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}}
\end{equation*}
so that a prototypical exact descent method is given by
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b, x^{(0)}
\\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ p^{(k)} := \mbox{ next direction } \\
~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := b - A x^{(k+1)}
\\
~~~k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
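The formula for \(\alpha_k \) can be sanity-checked numerically: along the line \(x^{(k)} + \alpha p^{(k)} \text{,}\) the directional derivative of \(f \) vanishes at \(\alpha = \alpha_k \text{.}\) A small NumPy sketch (all data below are arbitrary choices for illustration):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 5
C = rng.standard_normal((n, n))
A = C.T @ C + n * np.eye(n)        # SPD test matrix
b = rng.standard_normal(n)
x = rng.standard_normal(n)         # current iterate x^(k)
p = rng.standard_normal(n)         # some search direction p^(k)

f = lambda y: 0.5 * y @ A @ y - y @ b
r = b - A @ x
alpha = (p @ r) / (p @ (A @ p))    # exact line-search step

# Directional derivative of f along p vanishes at the new iterate ...
print(abs(p @ (A @ (x + alpha * p) - b)) < 1e-10)            # True
# ... and f is smaller there than at nearby step lengths.
print(f(x + alpha * p) <= min(f(x + (alpha - 0.1) * p),
                              f(x + (alpha + 0.1) * p)))      # True
\end{verbatim}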
Once \(\alpha_k \) is determined, the next residual can be computed as
\begin{equation*}
r^{(k+1)} = b - A x^{(k+1)} = b - A ( x^{(k)} + \alpha_k p^{(k)} ) = r^{(k)} - \alpha_k A p^{(k)},
\end{equation*}
which, since \(A p^{(k)} \) is already needed to compute \(\alpha_k \text{,}\) saves a matrix-vector multiplication when incorporated into the prototypical exact descent method:
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b, x^{(0)}
\\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ p^{(k)} := \mbox{ next direction } \\
~~~ q^{(k)} := A p^{(k)} \\
~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)}
\\
~~~k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
The steepest descent algorithm chooses \(p^{(k)} = -
\nabla f( x^{(k)} ) = b - A x^{(k)} = r^{(k)} \text{:}\)
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b, x^{(0)}
\\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ p^{(k)} := r^{(k)} \\
~~~ q^{(k)} := A p^{(k)} \\
~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)}
\\
~~~k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
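In NumPy, the steepest descent algorithm might be coded as follows (the function name, the relative-residual stopping test, and the iteration cap are choices made here for illustration; the summary's loop simply runs while \(r^{(k)} \neq 0 \)):
\begin{verbatim}
import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=10000):
    """Solve A x = b for SPD A by steepest descent (illustrative sketch)."""
    x = x0.astype(float).copy()
    r = b - A @ x                  # r^(0)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                  # treat a tiny residual as "zero"
        p = r                      # p^(k) := r^(k), the steepest-descent direction
        q = A @ p                  # q^(k) := A p^(k), the one matrix-vector product
        alpha = (p @ r) / (p @ q)
        x = x + alpha * p
        r = r - alpha * q          # r^(k+1) := r^(k) - alpha_k q^(k)
    return x
\end{verbatim}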
Convergence can be greatly accelerated by incorporating a preconditioner, \(M \text{,}\) where, ideally, \(M \approx
A \) is SPD and solving \(M z = y \) is easy (cheap).
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b, x^{(0)}, M
\\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ p^{(k)} := M^{-1} r^{(k)} \\
~~~ q^{(k)} := A p^{(k)} \\
~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)}
\\
~~~k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
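A corresponding sketch of the preconditioned iteration, here using the simple Jacobi choice \(M = \mbox{diag}(A) \) purely for illustration (any SPD \(M \) for which \(M z = y \) is cheap to solve could be substituted):
\begin{verbatim}
import numpy as np

def preconditioned_descent(A, b, x0, tol=1e-8, max_iter=10000):
    """Preconditioned descent with M = diag(A) (illustrative sketch)."""
    x = x0.astype(float).copy()
    d = np.diag(A)                 # M = diag(A); applying M^{-1} is a division
    r = b - A @ x
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        p = r / d                  # p^(k) := M^{-1} r^(k)
        q = A @ p
        alpha = (p @ r) / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
    return x
\end{verbatim}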
Definition 8.5.2.1. A-conjugate directions.
Let \(A \) be SPD. A sequence \(p^{(0)}, \ldots, p^{(k-1)} \in \mathbb R^n \) such that \(p^{(j)\,T} A p^{(i)} = 0\) if and only if \(j \neq i \) is said to be A-conjugate.
The columns of \(P \in \mathbb R^{n \times k}\) are A-conjugate if and only if \(P^T A P = D \) where \(D \) is diagonal and has positive values on its diagonal.
A-conjugate vectors are linearly independent.
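A quick numerical illustration of the definition: A-conjugate directions can be manufactured from generic vectors by Gram-Schmidt in the inner product \(\langle u, v \rangle = u^T A v \text{,}\) after which \(P^T A P \) is diagonal with positive diagonal entries and the columns of \(P \) are linearly independent (the data below are arbitrary choices for illustration):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 4
C = rng.standard_normal((n, n))
A = C.T @ C + n * np.eye(n)                  # SPD test matrix

# (Modified) Gram-Schmidt in the A-inner product turns generic vectors
# into A-conjugate directions.
V = rng.standard_normal((n, k))
P = np.zeros((n, k))
for j in range(k):
    p = V[:, j].copy()
    for i in range(j):
        p -= (P[:, i] @ A @ p) / (P[:, i] @ A @ P[:, i]) * P[:, i]
    P[:, j] = p

D = P.T @ A @ P
print(np.allclose(D, np.diag(np.diag(D))))   # True: P^T A P is diagonal
print(np.all(np.diag(D) > 0))                # True: positive diagonal entries
print(np.linalg.matrix_rank(P) == k)         # True: the directions are independent
\end{verbatim}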
A descent method that chooses the search directions to be A-conjugate will find the solution of \(A x = b \text{,}\) where \(A \in \mathbb R^{n \times n} \) is SPD, in at most \(n \) iterations:
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b \\
x^{(0)} := 0
\\
r^{(0)} := b \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ \mbox{Choose } p^{(k)}
\mbox{ such that } p^{(k)\,T} A P^{(k-1)} = 0
\mbox{ and } p^{(k)\,T} r^{(k)} \neq 0
\\
~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T}
A p^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\
~~~ k:= k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
The Conjugate Gradient Method chooses the search direction to equal the vector \(p^{(k)} \) that is A-conjugate to all previous search directions and is closest to the direction of steepest descent:
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b \\
x^{(0)} := 0
\\
r^{(0)} := b \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ {\bf if~} k = 0 \\
~~~ ~~~ p^{(k)} = r^{(0)} \\
~~~ {\bf else} \\
~~~ ~~~ p^{(k)} \mbox{ minimizes }
\min_{p \perp \Span( Ap^{(0)}, \ldots, Ap^{(k-1)}
)} \| r^{(k)} - p \|_2
\\
~~~ {\bf endif} \\
~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T}
A p^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\
~~~ k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
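The minimization in this loop can be carried out literally: \(p^{(k)} \) is the orthogonal projection of \(r^{(k)} \) onto the orthogonal complement of \(\Span( A p^{(0)}, \ldots, A p^{(k-1)} ) \text{.}\) The following deliberately naive NumPy sketch does exactly that, at the cost of storing all previous \(A p^{(j)} \) (the function name and stopping test are illustrative choices):
\begin{verbatim}
import numpy as np

def cg_via_projection(A, b, tol=1e-10, max_iter=None):
    """Conjugate Gradient by explicit projection (naive, illustrative sketch)."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.copy()                           # x^(0) = 0, so r^(0) = b
    AP = []                                # stores A p^(0), ..., A p^(k-1)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        if not AP:
            p = r.copy()
        else:
            # Project r^(k) onto the orthogonal complement of
            # Span(A p^(0), ..., A p^(k-1)); this makes p^(k)
            # A-conjugate to all previous directions.
            Q, _ = np.linalg.qr(np.column_stack(AP))
            p = r - Q @ (Q.T @ r)
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        AP.append(Ap)
    return x
\end{verbatim}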
The various vectors that appear in the Conjugate Gradient Method have the following properties:
- If \(P^{(k-1)} = \left( \begin{array}{c c c}
p^{(0)} & \cdots & p^{(k-1)}
\end{array} \right) \) then
\(P^{(k-1)\,T} r^{(k)} = 0. \)
- \(\displaystyle \Span( p^{(0)}, \ldots, p^{(k-1)} ) =
\Span( r^{(0)}, \ldots, r^{(k-1)} )
= \Span( b, A b, \ldots, A^{k-1} b ). \)
- The residual vectors \(r^{(k)} \) are mutually orthogonal.
- For \(k \geq 1 \)
\begin{equation*}
p^{(k)} = r^{(k)} - \gamma_{k} p^{(k-1)}
\end{equation*}
for some scalar \(\gamma_{k} \text{.}\)
Definition 8.5.2.2. Krylov subspace.
The subspace
\begin{equation*}
{\cal K}_k( A, b ) = \Span( b, A b, \ldots,
A^{k-1} b )
\end{equation*}
is known as the order-k Krylov subspace.
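For reference, the Krylov matrix whose columns span \({\cal K}_k( A, b ) \) is easy to form explicitly (a helper written here for illustration only; in practice the basis is not built this way, since the columns quickly become nearly linearly dependent):
\begin{verbatim}
import numpy as np

def krylov_matrix(A, b, k):
    """Return the n x k matrix ( b  A b  ...  A^{k-1} b ) spanning K_k(A, b)."""
    K = np.empty((b.shape[0], k))
    K[:, 0] = b
    for j in range(1, k):
        K[:, j] = A @ K[:, j - 1]     # next column is A times the previous one
    return K
\end{verbatim}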
Alternative Conjugate Gradient Methods are given by
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b \\
x^{(0)} := 0
\\
r^{(0)} := b \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ {\bf if~} k = 0 \\
~~~ ~~~ p^{(k)} = r^{(0)} \\
~~~ {\bf else} \\
~~~ ~~~ \gamma_k := -(p^{(k-1)\,T} A r^{(k)} ) /(
p^{(k-1)\,T} A
p^{(k-1)} )\\
~~~ ~~~ p^{(k)} := r^{(k)} + \gamma_k p^{(k-1)}
\\
~~~ {\bf endif} \\
~~~ \alpha_k := \frac{r^{(k)\,T} r^{(k)}}{p^{(k)\,T}
A p^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\
~~~ k := k + 1 \\
{\bf endwhile}
\end{array}
\end{equation*}
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b \\
x^{(0)} := 0
\\
r^{(0)} := b \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ {\bf if~} k = 0 \\
~~~ ~~~ p^{(k)} = r^{(0)} \\
~~~ {\bf else} \\
~~~ ~~~ \gamma_k := (r^{(k)\,T} r^{(k)}) /
(r^{(k-1)\,T} r^{(k-1)}) \\
~~~ ~~~ p^{(k)} := r^{(k)} + \gamma_k p^{(k-1)}
\\
~~~ {\bf endif} \\
~~~ \alpha_k := \frac{r^{(k)\,T} r^{(k)}}{p^{(k)\,T}
A p^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\
~~~ k := k + 1 \\
{\bf endwhile}
\end{array}
\end{equation*}
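The last formulation translates directly into NumPy. The sketch below (function name and tolerance are illustrative choices) replaces the exact test \(r^{(k)} \neq 0 \) with the small-residual test discussed next:
\begin{verbatim}
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Conjugate Gradient for SPD A (illustrative sketch of the last formulation)."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.copy()                      # x^(0) = 0, so r^(0) = b
    p = r.copy()
    rho = r @ r                       # r^(k)^T r^(k)
    for k in range(max_iter):
        if np.sqrt(rho) <= tol * np.linalg.norm(b):
            break
        if k > 0:
            gamma = rho / rho_old     # gamma_k = r^(k)T r^(k) / r^(k-1)T r^(k-1)
            p = r + gamma * p
        Ap = A @ p
        alpha = rho / (p @ Ap)        # alpha_k = r^(k)T r^(k) / p^(k)T A p^(k)
        x = x + alpha * p
        r = r - alpha * Ap
        rho_old, rho = rho, r @ r
    return x
\end{verbatim}
For a reasonably conditioned SPD matrix \(A \text{,}\) the result agrees with a direct solve; for example, np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b)) returns True.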
A practical stopping criterion for the Conjugate Gradient Method is to proceed while \(\| r^{(k)} \|_2
> \meps \| b \|_2 \) and some maximum number of iterations has not yet been reached.
The Conjugate Gradient Method can be accelerated by incorporating a preconditioner, \(M \text{,}\) where \(M \approx A \) is SPD.
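The preconditioned algorithm is not spelled out in this summary; the following is a sketch of the standard preconditioned Conjugate Gradient iteration, written here with the Jacobi choice \(M = \mbox{diag}(A) \) solely for illustration:
\begin{verbatim}
import numpy as np

def preconditioned_cg(A, b, tol=1e-10, max_iter=None):
    """Standard preconditioned CG with M = diag(A) (illustrative sketch)."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    d = np.diag(A)                    # M = diag(A); applying M^{-1} is a division
    x = np.zeros(n)
    r = b.copy()
    z = r / d                         # z^(0) = M^{-1} r^(0)
    p = z.copy()
    rho = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = r / d                     # apply the preconditioner
        rho_old, rho = rho, r @ z
        p = z + (rho / rho_old) * p
    return x
\end{verbatim}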