Let's start our generic descent method algorithm with \(x^{(0)} = 0 \text{.}\) Here we do not use the temporary vector \(q^{(k)} = A p^{(k)} \) so that later we can emphasize how to cast the Conjugate Gradient Method in terms of as few matrix-vector multiplication as possible (one to be exact).
and the search directions were linearly independent. Then, the resulting descent method, in exact arithmetic, is guaranteed to complete in at most \(n \) iterations, This is because then
Unfortunately, the Method of Steepest Descent does not have this property. The next approximation to the solution, \(x^{(k+1)} \) minimizes \(f( x ) \) where \(x \) is constrained to be on the line \(x^{(k)} + \alpha p^{(k)} \text{.}\) Because in each step \(f( x^{(k+1)} ) \leq f( x^{(k)} )
\text{,}\) a slightly stronger result holds: It also minimizes \(f( x ) \) where \(x \) is constrained to be on the union of lines \(x^{(j)} + \alpha p^{(j)} \text{,}\) \(j = 0, \ldots, k \text{.}\) However, unless we pick the search directions very carefully, that is not the same as it minimizing over all vectors in \(\Span( p^{(0)},
\ldots , p^{(k)} ) \text{.}\)
Let \(p^{(k)} \) be a new search direction that is linearly independent of the columns of \(P^{(k-1)} \text{,}\) which themselves are linearly independent. Show that
\begin{equation*}
\begin{array}{l}
\min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)})} f( x ) =
\min_y f(
P^{(k)} y ) \\
\\
~~~~~ =
\min_{y
}
\left[
\frac{1}{2}
y_0^T P^{(k-1)\,T}
A
P^{(k-1)} y_0
-
y_0^T P^{(k-1)\,T} b \right.
\\
~~~~~~~~~~~~~
\left.
+
\psi_1
y_0^T P^{(k-1)\,^T}
A
p^{(k)}
+
\frac{1}{2}
\psi_1^2 p^{(k)\,T}
A
p^{(k)}
- \psi_1 p^{(k)\,T} b \right],
\end{array}
\end{equation*}
A sequence of such directions is said to be A-conjugate.
Definition8.3.1.2. A-conjugate directions.
Let \(A \) be SPD. A sequence \(p^{(0)}, \ldots, p^{(k-1)} \in \Rn \) such that \(p^{(j)\,T} A p^{(i)} = 0\) if and only if \(j \neq i \) is said to be A-conjugate.
Homework8.3.1.2.
Let \(A \in \R^{n \times n}\) be SPD.
ALWAYS/SOMETIMES/NEVER: The the columns of \(P \in \R^{n \times k}\) are A-conjugate if and only if \(P^T A P = D \) where \(D \) is diagonal and has positive values on its diagonal.
We employ a proof by contradiction. Suppose the columns of \(P \) are not linearly independent. Then there exists \(y \neq 0 \) such that \(P y = 0 \text{.}\) Let \(D = P^T A P \text{.}\) From the last homework we know that \(D \) is diagonal and has positive diagonal elements. But then
\begin{equation*}
\begin{array}{l}
0 \\
~~~=~~~~ \lt P y = 0 \gt \\
( P y )^T A ( P y ) \\
~~~=~~~~ \lt \mbox{ multiply out } \gt \\
y^T P^T A P y \\
~~~=~~~~ \lt P^T A P = D \gt \\
y^T D y \\
~~~\gt ~~~~ \lt D \mbox{ is SPD} \gt \\
0,
\end{array}
\end{equation*}
which is a contradiction. Hence, the columns of \(P\) are linearly independent.
The above observations leaves us with a descent method that picks the search directions to be A-conjugate, given in Figure 8.3.1.3.
Remark8.3.1.4.
The important observation is that if \(p^{(0)},
\ldots , p^{(k)} \) are chosen to be A-conjugate, then \(x^{(k+1)} \) minimizes not only