In the video, the quadratic polynomial pictured takes on the value \(- \widehat x^{\,T} A \widehat x \) at \(\widehat x \) and that minimum lies below the x-axis. This does not change the conclusions that are drawn in the video.
The basic idea behind a descent method is that at the \(k\)th iteration one has an approximation to \(x \text{,}\) \(x^{(k)} \text{,}\) and one would like to create a better approximation, \(x^{(k+1)} \text{.}\) To do so, the method picks a search direction, \(p^{(k)} \text{,}\) and chooses the next approximation by taking a step from the current approximate solution in the direction of \(p^{(k)} \text{:}\)
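\begin{equation*}
x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \mbox{ for some scalar } \alpha_k \text{.}
\end{equation*}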
In other words, one searches for a minimum along the line defined by the current iterate, \(x^{(k)} \text{,}\) and the search direction, \(p^{(k)} \text{.}\) One then picks \(\alpha_k\) so that, preferably, \(f( x^{(k+1)} ) \leq f( x^{(k)} ) \text{.}\) This is summarized in Figure 8.2.1.2.
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b, x^{(0)}
\\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ p^{(k)} := \mbox{ next direction } \\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\mbox{ for some scalar } \alpha_k
\\
~~~ r^{(k+1)} := b - A x^{(k+1)}
\\
~~~ k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
Figure 8.2.1.2. Outline for a descent method. Toward this goal, an exact descent method typically picks \(\alpha_k \) to exactly minimize the function along the line through the current approximate solution in the direction of \(p^{(k)} \text{.}\)
Now,
\begin{equation*}
\begin{array}{l}
f( x^{(k+1)} )\\
~~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\
f( x^{(k)} + \alpha_k p^{(k)} )\\
~~~~=~~~ \lt \mbox{ evaluate } \gt \\
\frac{1}{2}
\left( x^{(k)} + \alpha_k p^{(k)} \right)^T A
\left( x^{(k)} + \alpha_k p^{(k)} \right) - \left( x^{(k)} + \alpha_k p^{(k)} \right)^T b \\
~~~~=~~~ \lt \mbox{ multiply out, using } A = A^T \gt \\
\frac{1}{2}
x^{(k)\,T} A x^{(k)}
+ \alpha_k p^{(k)\,T} A x^{(k)}
+ \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)}
- x^{(k)\,T} b - \alpha_k p^{(k)\,T} b \\
~~~~=~~~\lt \mbox{ rearrange } \gt \\
\frac{1}{2}
x^{(k)\,T} A x^{(k)}
- x^{(k)\,T} b
+ \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)}
+ \alpha_k p^{(k)\,T} A x^{(k)} - \alpha_k p^{(k)\,T} b \\
~~~~=~~~ \lt \mbox{ substitute } f( x^{(k)} )
\mbox{ and factor out common terms } \gt \\
f( x^{(k)} )
+ \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)}
+ \alpha_k p^{(k)\,T} ( A x^{(k)} - b ) \\
~~~~=~~~ \lt \mbox{ substitute } r^{(k)}
\mbox{ and commute to expose polynomial in } \alpha_k \gt \\
\frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2
- p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} )
,
\end{array}
\end{equation*}
where \(r^{(k)} = b - A x^{(k)}\) is the residual. This is a quadratic polynomial in the scalar \(\alpha_k \) (since this is the only free variable).
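Provided \(p^{(k)\,T} A p^{(k)} \gt 0 \) (which holds whenever \(A\) is symmetric positive definite and \(p^{(k)} \neq 0 \)), this polynomial attains its minimum where its derivative with respect to \(\alpha_k\) equals zero:
\begin{equation*}
p^{(k)\,T} A p^{(k)} \alpha_k - p^{(k)\,T} r^{(k)} = 0
\quad \mbox{ so that } \quad
\alpha_k = \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \text{.}
\end{equation*}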
With this choice of \(\alpha_k \text{,}\) \(x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}\) provides the next approximation to the solution. This leaves us with the question of how to pick the search directions \(\{ p^{(0)}, p^{(1)}, \ldots \} \text{.}\)
A basic descent method based on these ideas is given in Figure 8.2.1.3.
\begin{equation*}
\begin{array}{l}
{\bf Given:~} A, b, x^{(0)}
\\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
{\bf while~} r^{(k)} \neq 0 \\
~~~ p^{(k)} := \mbox{ next direction } \\
~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}}
\\
~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
\\
~~~ r^{(k+1)} := b - A x^{(k+1)}
\\
~~~k := k+1 \\
{\bf endwhile}
\end{array}
\end{equation*}
Figure 8.2.1.3. Basic descent method.
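As a concrete illustration, the loop in Figure 8.2.1.3 might be realized with NumPy as in the following sketch. The choice of search direction \(p^{(k)} := r^{(k)}\) and the tolerance-based stopping criterion are assumptions made to obtain a runnable example; the figure itself leaves the choice of direction (and the exact test for \(r^{(k)} \neq 0\)) open.
\begin{verbatim}
import numpy as np

def basic_descent(A, b, x0, tol=1e-10, max_iters=1000):
    # Basic descent method following Figure 8.2.1.3.
    # Assumes A is symmetric positive definite.
    x = x0.copy()
    r = b - A @ x                    # r^(0) := b - A x^(0)
    k = 0
    while np.linalg.norm(r) > tol and k < max_iters:
        p = r                        # next direction (assumed: p^(k) := r^(k))
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)   # alpha_k := p^T r / (p^T A p)
        x = x + alpha * p            # x^(k+1) := x^(k) + alpha_k p^(k)
        r = b - A @ x                # r^(k+1) := b - A x^(k+1)
        k += 1
    return x, k

# Example usage on a small symmetric positive definite system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, iters = basic_descent(A, b, np.zeros(2))
\end{verbatim}
The residual is recomputed as \(b - A x^{(k+1)} \text{,}\) exactly mirroring the figure.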
Homework 8.2.1.1.
The cost of an iterative method is a combination of how many iterations it takes to converge and the cost per iteration. For the loop in Figure 8.2.1.3, count the number of matrix-vector multiplications, dot products, and "axpy" operations (not counting the cost of determining the next descent direction).