In the video, the quadratic polynomial pictured takes on the value $-\frac{1}{2} \hat{x}^T A \hat{x}$ at $\hat{x}$, and that minimum is below the x-axis. This does not change the conclusions that are drawn in the video.
The basic idea behind a descent method is that at the $k$th iteration one has an approximation to $x$, $x^{(k)}$, and one would like to create a better approximation, $x^{(k+1)}$. To do so, the method picks a search direction, $p^{(k)}$, and chooses the next approximation by taking a step from the current approximate solution in the direction of $p^{(k)}$:
$$
x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}.
$$
In other words, one searches for a minimum along the line defined by the current iterate, $x^{(k)}$, and the search direction, $p^{(k)}$. One then picks $\alpha_k$ so that, preferably, $f(x^{(k+1)}) \leq f(x^{(k)})$. This is summarized in Figure 8.2.1.2.
$$
\begin{array}{l}
\textbf{Given: } A, b, x^{(0)} \\
r^{(0)} := b - A x^{(0)} \\
k := 0 \\
\textbf{while } r^{(k)} \neq 0 \\
\quad p^{(k)} := \text{ next direction} \\
\quad x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \text{ for some scalar } \alpha_k \\
\quad r^{(k+1)} := b - A x^{(k+1)} \\
\quad k := k + 1 \\
\textbf{endwhile}
\end{array}
$$
Figure 8.2.1.2. Outline for a descent method. To this end, an exact descent method typically picks $\alpha_k$ to exactly minimize the function along the line from the current approximate solution in the direction of $p^{(k)}$.
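The outline in Figure 8.2.1.2 can be sketched in NumPy. This is a minimal illustration, not the text's implementation: the choice of next direction is left as a caller-supplied function, and for concreteness $\alpha_k$ is chosen by exact line search, $\alpha_k = p^{(k)T} r^{(k)} / (p^{(k)T} A p^{(k)})$, the minimizer of $f$ along the search line for symmetric positive definite $A$. The steepest-descent choice $p^{(k)} = r^{(k)}$ used in the example is an assumption for illustration only; the figure leaves the direction unspecified.

```python
import numpy as np

def descent_method(A, b, x0, next_direction, tol=1e-10, max_iters=1000):
    """Generic descent method following the outline of Figure 8.2.1.2.

    next_direction(A, b, x, r) supplies the search direction p^(k).
    alpha_k is chosen by exact line search for
    f(x) = 1/2 x^T A x - x^T b, assuming A is symmetric positive definite.
    """
    x = x0.copy()
    r = b - A @ x                      # r^(0) := b - A x^(0)
    k = 0
    while np.linalg.norm(r) > tol and k < max_iters:
        p = next_direction(A, b, x, r)  # p^(k) := next direction
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)      # exact line search (assumption)
        x = x + alpha * p               # x^(k+1) := x^(k) + alpha_k p^(k)
        r = b - A @ x                   # r^(k+1) := b - A x^(k+1)
        k += 1
    return x, k

# Illustration: steepest descent picks p^(k) = r^(k) (an assumed choice).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])             # symmetric positive definite
b = np.array([1.0, 2.0])
x, iters = descent_method(A, b, np.zeros(2), lambda A, b, x, r: r)
```

Note that the sketch recomputes $r^{(k+1)} := b - A x^{(k+1)}$ with a fresh matrix-vector multiplication each iteration, exactly as the figure states.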
$$
\begin{array}{l}
f(x^{(k+1)}) \\
\quad = \langle\, x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \,\rangle \\
f(x^{(k)} + \alpha_k p^{(k)}) \\
\quad = \langle\, \text{evaluate} \,\rangle \\
\frac{1}{2}\left(x^{(k)} + \alpha_k p^{(k)}\right)^T A \left(x^{(k)} + \alpha_k p^{(k)}\right) - \left(x^{(k)} + \alpha_k p^{(k)}\right)^T b \\
\quad = \langle\, \text{multiply out} \,\rangle \\
\frac{1}{2} x^{(k)T} A x^{(k)} + \alpha_k p^{(k)T} A x^{(k)} + \frac{1}{2}\alpha_k^2\, p^{(k)T} A p^{(k)} - x^{(k)T} b - \alpha_k p^{(k)T} b \\
\quad = \langle\, \text{rearrange} \,\rangle \\
\frac{1}{2} x^{(k)T} A x^{(k)} - x^{(k)T} b + \frac{1}{2}\alpha_k^2\, p^{(k)T} A p^{(k)} + \alpha_k p^{(k)T} A x^{(k)} - \alpha_k p^{(k)T} b \\
\quad = \langle\, \text{substitute } f(x^{(k)}) \text{ and factor out common terms} \,\rangle \\
f(x^{(k)}) + \frac{1}{2}\alpha_k^2\, p^{(k)T} A p^{(k)} + \alpha_k p^{(k)T} \left(A x^{(k)} - b\right) \\
\quad = \langle\, \text{substitute } r^{(k)} \text{ and commute to expose polynomial in } \alpha_k \,\rangle \\
\frac{1}{2} p^{(k)T} A p^{(k)} \alpha_k^2 - p^{(k)T} r^{(k)} \alpha_k + f(x^{(k)}),
\end{array}
$$
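The derivation can be checked numerically: for a random symmetric positive definite $A$, the value $f(x^{(k)} + \alpha p^{(k)})$ should match the quadratic polynomial $\frac{1}{2} p^{(k)T} A p^{(k)} \alpha^2 - p^{(k)T} r^{(k)} \alpha + f(x^{(k)})$ for any $\alpha$. The sketch below does this, and also checks the standard consequence that $\alpha_k = p^{(k)T} r^{(k)} / (p^{(k)T} A p^{(k)})$, the vertex of this upward-opening parabola, minimizes $f$ along the line (the closed-form minimizer is stated here as a known fact, not quoted from this section).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)    # symmetric positive definite by construction
b = rng.standard_normal(n)
x = rng.standard_normal(n)     # current iterate x^(k)
p = rng.standard_normal(n)     # search direction p^(k)
r = b - A @ x                  # residual r^(k)

def f(v):
    # f(x) = 1/2 x^T A x - x^T b
    return 0.5 * (v @ A @ v) - v @ b

# f(x + alpha p) equals the quadratic polynomial in alpha from the derivation.
alpha = 0.37
lhs = f(x + alpha * p)
rhs = 0.5 * (p @ A @ p) * alpha**2 - (p @ r) * alpha + f(x)
assert np.isclose(lhs, rhs)

# Vertex of the parabola: the exact line-search step (assumed closed form).
alpha_star = (p @ r) / (p @ A @ p)
```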
The cost of an iterative method is a combination of how many iterations it takes to converge and the cost per iteration. For the loop in Figure 8.2.1.3, count the number of matrix-vector multiplications, dot products, and "axpy" operations per iteration (not counting the cost of determining the next descent direction).