In the video, the quadratic polynomial pictured takes on the value \(- \frac{1}{2} \widehat x^T A \widehat x \) at \(\widehat x \) and that minimum is below the x-axis. This does not change the conclusions that are drawn in the video.
The basic idea behind a descent method is that at the \(k\)th iteration one has an approximation to \(x \text{,}\) \(x^{(k)} \text{,}\) and one would like to create a better approximation, \(x^{(k+1)} \text{.}\) To do so, the method picks a search direction, \(p^{(k)} \text{,}\) and chooses the next approximation by taking a step from the current approximate solution in the direction of \(p^{(k)} \text{:}\)
\begin{equation*}
x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} ,
\end{equation*}
where the scalar \(\alpha_k \) determines how far to step in that direction.
In other words, one searches for a minimum along a line defined by the current iterate, \(x^{(k)} \text{,}\) and the search direction, \(p^{(k)} \text{.}\) One then picks \(\alpha_k\) so that, preferably, \(f( x^{(k+1)} ) \leq f(
x^{(k)} ) \text{.}\) This is summarized in Figure 8.2.1.2. Toward this goal, an exact descent method typically picks \(\alpha_k \) to exactly minimize the function along the line from the current approximate solution in the direction of \(p^{(k)} \text{.}\)
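To make the line-search idea concrete, here is a minimal sketch in Python (using NumPy and SciPy). The values of \(A \text{,}\) \(b \text{,}\) \(x^{(k)} \text{,}\) and \(p^{(k)} \) are made up purely for illustration, and a generic one-dimensional minimizer stands in for the exact minimization of \(f \) along the line; this is a sketch of the idea, not the method as presented in the figures.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize_scalar

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # a small symmetric positive definite matrix
b = np.array([1.0, 2.0])

def f(x):
    # f(x) = 1/2 x^T A x - x^T b, the quadratic function being minimized
    return 0.5 * x @ A @ x - x @ b

xk = np.array([0.0, 0.0])            # current iterate x^(k)
pk = np.array([1.0, 1.0])            # some search direction p^(k)

# phi(alpha) = f(x^(k) + alpha p^(k)) is a scalar function of alpha;
# an exact descent method picks alpha_k to minimize it along the line.
phi = lambda alpha: f(xk + alpha * pk)
alpha_k = minimize_scalar(phi).x

x_next = xk + alpha_k * pk           # x^(k+1) = x^(k) + alpha_k p^(k)
print(alpha_k, f(x_next) <= f(xk))   # the objective does not increase
\end{verbatim}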
Now,
\begin{equation*}
\begin{array}{l}
f( x^{(k+1)} )\\
~~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\
f( x^{(k)} + \alpha_k p^{(k)} )\\
~~~~=~~~ \lt \mbox{ evaluate } \gt \\
\frac{1}{2}
\left( x^{(k)} + \alpha_k p^{(k)} \right)^T A
\left( x^{(k)} + \alpha_k p^{(k)} \right) - \left( x^{(k)} + \alpha_k p^{(k)} \right)^T b \\
~~~~=~~~ \lt \mbox{ multiply out } \gt \\
\frac{1}{2}
x^{(k)\,T} A x^{(k)}
+ \alpha_k p^{(k)\,T} A x^{(k)}
+ \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)}
- x^{(k)\,T} b - \alpha_k p^{(k)\,T} b \\
~~~~=~~~\lt \mbox{ rearrange } \gt \\
\frac{1}{2}
x^{(k)\,T} A x^{(k)}
- x^{(k)\,T} b
+ \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)}
+ \alpha_k p^{(k)\,T} A x^{(k)} - \alpha_k p^{(k)\,T} b \\
~~~~=~~~ \lt \mbox{ substitute } f( x^{(k)} )
\mbox{ and factor out common terms } \gt \\
f( x^{(k)} )
+ \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)}
+ \alpha_k p^{(k)\,T} ( A x^{(k)} - b ) \\
~~~~=~~~ \lt \mbox{ substitute } r^{(k)}
\mbox{ and commute to expose polynomial in } \alpha_k \gt \\
\frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2
- p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} )
,
\end{array}
\end{equation*}
where \(r^{(k)} = b - A x^{(k)}\) is the residual. This is a quadratic polynomial in the scalar \(\alpha_k \) (since this is the only free variable).
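As a quick sanity check on this algebra, the following sketch (with made-up numeric values for \(A \text{,}\) \(b \text{,}\) \(x^{(k)} \text{,}\) \(p^{(k)} \text{,}\) and \(\alpha_k \)) compares a direct evaluation of \(f( x^{(k)} + \alpha_k p^{(k)} ) \) with the quadratic polynomial in \(\alpha_k \) derived above.
\begin{verbatim}
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # a small symmetric positive definite matrix
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - x @ b

xk = np.array([2.0, -1.0])            # current iterate x^(k)
pk = np.array([1.0, 1.0])             # some search direction p^(k)
rk = b - A @ xk                       # residual r^(k) = b - A x^(k)

alpha = 0.3                           # an arbitrary step size
direct = f(xk + alpha * pk)
poly = 0.5 * (pk @ A @ pk) * alpha**2 - (pk @ rk) * alpha + f(xk)
print(np.isclose(direct, poly))       # True: the two expressions agree
\end{verbatim}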
Provided \(p^{(k)\,T} A p^{(k)} \gt 0 \) (which holds when \(A \) is symmetric positive definite and \(p^{(k)} \neq 0 \)), this polynomial is minimized where its derivative with respect to \(\alpha_k \) equals zero, which yields
\begin{equation*}
\alpha_k = \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} ,
\end{equation*}
so that \(x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \) provides the next approximation to the solution. This leaves us with the question of how to pick the search directions \(\{ p^{(0)}, p^{(1)}, \ldots \} \text{.}\)
A basic descent method based on these ideas is given in Figure 8.2.1.3.
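The following Python sketch shows what such a loop might look like; it is a plausible reading of the ideas above, not necessarily the algorithm in Figure 8.2.1.3 verbatim. The helper pick_direction is a hypothetical placeholder for whatever rule supplies the next search direction, and using the residual as that direction in the example is only an illustrative choice.
\begin{verbatim}
import numpy as np

def basic_descent(A, b, x0, pick_direction, max_iters=100, tol=1e-10):
    # A sketch of a basic exact descent loop for f(x) = 1/2 x^T A x - x^T b
    # with symmetric positive definite A.  The choice of search direction is
    # left to the caller via pick_direction(A, b, x, r).
    x = x0.copy()
    r = b - A @ x                        # residual r^(k) = b - A x^(k)
    for _ in range(max_iters):
        if np.linalg.norm(r) <= tol:
            break
        p = pick_direction(A, b, x, r)   # search direction p^(k)
        Ap = A @ p                       # matrix-vector multiplication
        alpha = (p @ r) / (p @ Ap)       # alpha_k = p^(k)T r^(k) / p^(k)T A p^(k)
        x = x + alpha * p                # x^(k+1) = x^(k) + alpha_k p^(k)
        r = r - alpha * Ap               # r^(k+1) = b - A x^(k+1)
    return x

# Example: use the residual itself as a placeholder search direction;
# the question of how to pick directions is taken up next.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = basic_descent(A, b, np.zeros(2), lambda A, b, x, r: r)
print(np.allclose(A @ x, b))             # True
\end{verbatim}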
Homework 8.2.1.1.
The cost of an iterative method is a combination of how many iterations it takes to converge and the cost per iteration. For the loop in Figure 8.2.1.3, count the number of matrix-vector multiplications, dot products, and "axpy" operations (not counting the cost of determining the next descent direction).