
Subsection 8.2.1 Basics of descent methods

Remark 8.2.1.1.

In the video, the quadratic polynomial pictured takes on the value \(-\frac{1}{2} \hat x^T A \hat x\) at \(\hat x\) and that minimum is below the x-axis. This does not change the conclusions that are drawn in the video.
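To see where this value comes from, recall that \(f(x) = \frac{1}{2} x^T A x - x^T b\) attains its minimum at the solution \(\hat x\) of \(A \hat x = b\), so that
\begin{equation*}
f( \hat x ) = \frac{1}{2} \hat x^T A \hat x - \hat x^T A \hat x = - \frac{1}{2} \hat x^T A \hat x \leq 0 ,
\end{equation*}
which is negative whenever \(\hat x \neq 0\), since \(A\) is symmetric positive definite.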

The basic idea behind a descent method is that at the \(k\)th iteration one has an approximation to \(x\), \(x^{(k)}\), and one would like to create a better approximation, \(x^{(k+1)}\). To do so, the method picks a search direction, \(p^{(k)}\), and chooses the next approximation by taking a step from the current approximate solution in the direction of \(p^{(k)}\):

\begin{equation*}
x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} .
\end{equation*}

In other words, one searches for a minimum along a line defined by the current iterate, \(x^{(k)}\), and the search direction, \(p^{(k)}\). One then picks \(\alpha_k\) so that, preferably, \(f( x^{(k+1)} ) < f( x^{(k)} )\). This is summarized in Figure 8.2.1.2.

Given: \(A, b, x^{(0)}\)
\(r^{(0)} := b - A x^{(0)}\)
\(k := 0\)
while \(r^{(k)} \neq 0\)
   \(p^{(k)} :=\) next direction
   \(x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}\) for some scalar \(\alpha_k\)
   \(r^{(k+1)} := b - A x^{(k+1)}\)
   \(k := k + 1\)
endwhile
Figure 8.2.1.2. Outline for a descent method.
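Below is a minimal Python/NumPy sketch of this outline, under the assumption that the caller supplies the two ingredients the outline leaves open: a function that produces the next search direction and one that produces the scalar \(\alpha_k\). The names descent_outline, next_direction, and step_size are illustrative rather than from the text, and the exact test \(r^{(k)} \neq 0\) is replaced by a tolerance on the residual norm.

import numpy as np

def descent_outline(A, b, x0, next_direction, step_size, tol=1e-10, max_iters=1000):
    # Generic descent loop of Figure 8.2.1.2; direction and step size are
    # supplied by the caller, since the outline leaves them unspecified.
    x = x0.copy()
    r = b - A @ x                            # r^(0) := b - A x^(0)
    k = 0
    while np.linalg.norm(r) > tol and k < max_iters:
        p = next_direction(A, b, x, r, k)    # p^(k) := next direction
        alpha = step_size(A, r, p)           # some scalar alpha_k
        x = x + alpha * p                    # x^(k+1) := x^(k) + alpha_k p^(k)
        r = b - A @ x                        # r^(k+1) := b - A x^(k+1)
        k += 1
    return x, k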
Toward this goal, typically, an exact descent method picks \(\alpha_k\) to exactly minimize the function along the line from the current approximate solution in the direction of \(p^{(k)}\).


Now,

\begin{equation*}
\begin{array}{l}
f( x^{(k+1)} ) \\
~~~ = ~~~ < x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} > \\
f( x^{(k)} + \alpha_k p^{(k)} ) \\
~~~ = ~~~ < \mbox{ evaluate } > \\
\frac{1}{2} ( x^{(k)} + \alpha_k p^{(k)} )^T A ( x^{(k)} + \alpha_k p^{(k)} ) - ( x^{(k)} + \alpha_k p^{(k)} )^T b \\
~~~ = ~~~ < \mbox{ multiply out } > \\
\frac{1}{2} x^{(k)\,T} A x^{(k)} + \alpha_k p^{(k)\,T} A x^{(k)} + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} - x^{(k)\,T} b - \alpha_k p^{(k)\,T} b \\
~~~ = ~~~ < \mbox{ rearrange } > \\
\frac{1}{2} x^{(k)\,T} A x^{(k)} - x^{(k)\,T} b + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} + \alpha_k p^{(k)\,T} A x^{(k)} - \alpha_k p^{(k)\,T} b \\
~~~ = ~~~ < \mbox{ substitute } f( x^{(k)} ) \mbox{ and factor out common terms } > \\
f( x^{(k)} ) + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} + \alpha_k p^{(k)\,T} ( A x^{(k)} - b ) \\
~~~ = ~~~ < \mbox{ substitute } r^{(k)} \mbox{ and commute to expose polynomial in } \alpha_k > \\
\frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2 - p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} ),
\end{array}
\end{equation*}

where \(r^{(k)} = b - A x^{(k)}\) is the residual. This is a quadratic polynomial in the scalar \(\alpha_k\) (since this is the only free variable).
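As a quick numerical sanity check of this derivation, the following sketch (using an arbitrarily constructed symmetric positive definite matrix, purely for illustration) evaluates \(f( x^{(k)} + \alpha_k p^{(k)} )\) directly and via the quadratic in \(\alpha_k\) just derived, and confirms that the two agree:

import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # illustrative SPD matrix
b = rng.standard_normal(n)
x = rng.standard_normal(n)         # plays the role of x^(k)
p = rng.standard_normal(n)         # plays the role of p^(k)

f = lambda v: 0.5 * v @ A @ v - v @ b
r = b - A @ x                      # r^(k) = b - A x^(k)

for alpha in (-1.0, 0.3, 2.5):
    direct = f(x + alpha * p)
    quadratic = 0.5 * (p @ A @ p) * alpha**2 - (p @ r) * alpha + f(x)
    assert np.isclose(direct, quadratic)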


Minimizing

\begin{equation*}
f( x^{(k+1)} ) = \frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2 - p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} )
\end{equation*}

exactly requires the derivative with respect to \(\alpha_k\) to be zero:

\begin{equation*}
0 = \frac{d f( x^{(k)} + \alpha_k p^{(k)} )}{d \alpha_k} = p^{(k)\,T} A p^{(k)} \alpha_k - p^{(k)\,T} r^{(k)} .
\end{equation*}

Hence, for a given choice of \(p^{(k)}\),

\begin{equation*}
\alpha_k = \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \quad \mbox{and} \quad x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}
\end{equation*}

provides the next approximation to the solution. This leaves us with the question of how to pick the search directions \(\{ p^{(0)}, p^{(1)}, \ldots \}\).
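The same kind of numerical check confirms that this \(\alpha_k\) really does minimize \(f\) along the line: with an illustrative SPD matrix, stepping by the formula above does at least as well as stepping by any perturbed value of \(\alpha_k\).

import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # illustrative SPD matrix
b = rng.standard_normal(n)
x = rng.standard_normal(n)         # x^(k)
p = rng.standard_normal(n)         # p^(k)

f = lambda v: 0.5 * v @ A @ v - v @ b
r = b - A @ x
alpha = (p @ r) / (p @ (A @ p))    # exact step size derived above

# No other step along p does better than alpha.
assert all(f(x + alpha * p) <= f(x + (alpha + d) * p) for d in (-1.0, -0.1, 0.1, 1.0))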

A basic descent method based on these ideas is given in Figure 8.2.1.3.

Given: \(A, b, x^{(0)}\)
\(r^{(0)} := b - A x^{(0)}\)
\(k := 0\)
while \(r^{(k)} \neq 0\)
   \(p^{(k)} :=\) next direction
   \(\alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}}\)
   \(x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}\)
   \(r^{(k+1)} := b - A x^{(k+1)}\)
   \(k := k + 1\)
endwhile
Figure 8.2.1.3. Basic descent method.
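The following is a minimal, self-contained Python/NumPy sketch of Figure 8.2.1.3. Since the text has not yet said how to pick the search directions, the direction is supplied by the caller; the name basic_descent and its signature are illustrative only, and the test \(r^{(k)} \neq 0\) is again replaced by a tolerance.

import numpy as np

def basic_descent(A, b, x0, next_direction, tol=1e-10, max_iters=1000):
    # Basic descent method of Figure 8.2.1.3 with the exact step size alpha_k.
    x = x0.copy()
    r = b - A @ x                            # r^(0) := b - A x^(0)
    k = 0
    while np.linalg.norm(r) > tol and k < max_iters:
        p = next_direction(A, b, x, r, k)    # p^(k) := next direction
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)           # alpha_k := p^(k)T r^(k) / (p^(k)T A p^(k))
        x = x + alpha * p                    # x^(k+1) := x^(k) + alpha_k p^(k)
        r = b - A @ x                        # r^(k+1) := b - A x^(k+1)
        k += 1
    return x, k

As one purely illustrative choice (not one prescribed by the text at this point), passing next_direction=lambda A, b, x, r, k: r, i.e., searching along the current residual, yields a convergent iteration when \(A\) is symmetric positive definite.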

Homework 8.2.1.1.

The cost of an iterative method is a combination of how many iterations it takes to converge and the cost per iteration. For the loop in Figure 8.2.1.3, count the number of matrix-vector multiplications, dot products, and "axpy" operations (not counting the cost of determining the next descent direction).

Solution
\begin{equation*} \begin{array}{l l} \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} & 1 \mbox{ mvmult}, 2 \mbox{ dot products} \\ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} & 1 \mbox{ axpy} \\ r^{(k+1)} := b - A x^{(k+1)} & 1 \mbox{ mvmult} \end{array} \end{equation*}

Total: 2 matrix-vector multiplies (mvmults), 2 dot products, 1 axpy.
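For reference, here is the loop body of Figure 8.2.1.3 written out in Python/NumPy with each operation labeled as in the count above (the function name is illustrative, and choosing \(p^{(k)}\) is not counted):

def basic_descent_iteration(A, b, x, r, p):
    Ap = A @ p                       # 1 matrix-vector multiply
    alpha = (p @ r) / (p @ Ap)       # 2 dot products
    x_new = x + alpha * p            # 1 axpy
    r_new = b - A @ x_new            # 1 matrix-vector multiply
    return x_new, r_new              # total: 2 mvmults, 2 dot products, 1 axpy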