Subsection 9.3.2 The Power Method: Convergence
Before we make the algorithm practical, let us examine how fast the iteration converges. This requires a few definitions regarding rates of convergence.
Definition 9.3.2.1. Convergence of a sequence of scalars.
Let \(\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R \) be an infinite sequence of scalars. Then \(\alpha_k \) is said to converge to \(\alpha \) if
\begin{equation*} \lim_{k \rightarrow \infty} \vert \alpha_k - \alpha \vert = 0 . \end{equation*}
Definition 9.3.2.2. Convergence of a sequence of vectors.
Let \(x_0, x_1 , x_2, \ldots \in \C^m \) be an infinite sequence of vectors. Then \(x_k \) is said to converge to \(x \) if for any norm \(\| \cdot \| \)
\begin{equation*} \lim_{k \rightarrow \infty} \| x_k - x \| = 0 . \end{equation*}
Because of the equivalence of norms, discussed in Subsection 1.2.6, if a sequence of vectors converges in one norm, then it converges in all norms.
Definition 9.3.2.3. Rate of convergence.
Let \(\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R \) be an infinite sequence of scalars that converges to \(\alpha \text{.}\) Then
- \(\alpha_k \) is said to converge linearly to \(\alpha \) if for sufficiently large \(k \)
\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert \end{equation*}
for some constant \(C \lt 1 \text{.}\) In other words, if
\begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert} = C \lt 1. \end{equation*}
- \(\alpha_k \) is said to converge superlinearly to \(\alpha \) if for sufficiently large \(k \)
\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C_k \vert \alpha_{k} - \alpha \vert \end{equation*}
with \(C_k \rightarrow 0 \text{.}\) In other words, if
\begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert} = 0. \end{equation*}
- \(\alpha_k \) is said to converge quadratically to \(\alpha \) if for sufficiently large \(k \)
\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^2 \end{equation*}
for some constant \(C \text{.}\) In other words, if
\begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert^2} = C . \end{equation*}
- \(\alpha_k \) is said to converge cubically to \(\alpha \) if for large enough \(k \)
\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^3 \end{equation*}
for some constant \(C \text{.}\) In other words, if
\begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert^3} = C . \end{equation*}
Linear convergence can be slow. Let's say that for \(k \geq K \) we observe that
\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert . \end{equation*}
Then, clearly, \(\vert \alpha_{k+n} - \alpha\vert \leq C^n \vert \alpha_{k} - \alpha\vert \text{.}\) If \(C = 0.99 \text{,}\) progress may be very, very slow. If \(\vert \alpha_k - \alpha \vert = 1 \text{,}\) then after \(n \) further iterations we can only guarantee that \(\vert \alpha_{k+n} - \alpha \vert \leq 0.99^n \text{:}\) roughly \(230 \) iterations are needed to gain a single decimal digit of accuracy, since \(0.99^{230} \approx 0.1 \text{.}\)
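A quick back-of-the-envelope check, for instance in Python, makes the slow decay of the bound \(0.99^n \) concrete (the loop below is only an illustration of the arithmetic):

# Bound C^n on the error after n further iterations, with C = 0.99
# and initial error |alpha_k - alpha| = 1.
for n in (10, 100, 230, 1000):
    print(n, 0.99 ** n)     # prints roughly 0.904, 0.366, 0.099, 4.3e-05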
Quadratic convergence is fast. Now
\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert & \leq & C \vert \alpha_{k} - \alpha \vert^2 \\ \vert \alpha_{k+2} - \alpha \vert & \leq & C \vert \alpha_{k+1} - \alpha \vert^2 \leq C ( C \vert \alpha_{k} - \alpha \vert^2 )^2 = C^3 \vert \alpha_{k} - \alpha \vert^4 \\ \vert \alpha_{k+3} - \alpha \vert & \leq & C \vert \alpha_{k+2} - \alpha \vert^2 \leq C ( C^3 \vert \alpha_{k} - \alpha \vert^4 )^2 = C^7 \vert \alpha_{k} - \alpha \vert^8 \\ & \vdots & \\ \vert \alpha_{k+n} - \alpha \vert & \leq & C^{2^n - 1} \vert \alpha_{k} - \alpha \vert^{2^n} . \end{array} \end{equation*}
Even if \(C = 0.99 \) and \(\vert \alpha_k - \alpha \vert = 1 \text{,}\) then
\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert & \leq & 0.99 \\ \vert \alpha_{k+2} - \alpha \vert & \leq & 0.99^{3} \approx 0.97 \\ \vert \alpha_{k+5} - \alpha \vert & \leq & 0.99^{31} \approx 0.73 \\ \vert \alpha_{k+10} - \alpha \vert & \leq & 0.99^{1023} \approx 0.000034 , \end{array} \end{equation*}
so the error all but vanishes after a dozen or so iterations.
If we consider \(\alpha \) the correct result then, eventually, the number of correct digits roughly doubles in each iteration. This can be explained as follows: If \(\vert \alpha_{k} - \alpha \vert \lt 1 \text{,}\) then the number of correct decimal digits is given by
\begin{equation*} - \log_{10} \vert \alpha_k - \alpha \vert . \end{equation*}
Since \(\log_{10} \) is a monotonically increasing function, \(\vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert^2 \) implies that
\begin{equation*} \log_{10} \vert \alpha_{k+1} - \alpha \vert \leq \log_{10} C + 2 \log_{10} \vert \alpha_{k} - \alpha \vert \end{equation*}
and hence
\begin{equation*} - \log_{10} \vert \alpha_{k+1} - \alpha \vert \geq 2 \left( - \log_{10} \vert \alpha_{k} - \alpha \vert \right) - \log_{10} C , \end{equation*}
so that the number of correct digits at least doubles, up to the constant \(- \log_{10} C \text{,}\) from one iteration to the next.
Cubic convergence is dizzyingly fast: Eventually the number of correct digits triples from one iteration to the next.
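The contrast between linear and quadratic convergence is easy to observe numerically. The following small Python sketch (the sequences are chosen only for illustration) tracks a linearly converging sequence with \(C = 0.99 \) next to Newton's iteration for \(\sqrt{2} \text{,}\) which converges quadratically; the number of correct digits of the latter roughly doubles each step.

import math

alpha = math.sqrt(2)     # the limit both sequences converge to
lin = 1 + alpha          # linearly converging iterate, initial error 1
newt = 1 + alpha         # Newton iterate for the root of x^2 - 2, same initial error

for k in range(1, 7):
    lin = alpha + 0.99 * (lin - alpha)   # error shrinks by the factor 0.99
    newt = newt / 2 + 1 / newt           # Newton update: x <- (x + 2/x)/2
    print(f"k={k}: linear error = {abs(lin - alpha):.2e}, "
          f"Newton error = {abs(newt - alpha):.2e}")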
For our analysis of the convergence of the Power Method, we define a convenient norm.
Homework 9.3.2.1.
Let \(X \in \C^{m \times m} \) be nonsingular. Define \(\| \cdot \|_{X^{-1}} : \C^m \rightarrow \R \) by \(\| y \|_{X^{-1}} = \| X^{-1} y \| \) for some given norm \(\| \cdot \|: \C^m \rightarrow \R \text{.}\)
ALWAYS/SOMETIMES/NEVER: \(\| \cdot \|_{X^{-1}} \) is a norm.
ALWAYS. To see this, we need to show that
- If \(y \neq 0 \) then \(\| y \|_{X^{-1}} \gt 0 \text{:}\)
Let \(y \neq 0 \) and \(z = X^{-1} y \text{.}\) Then \(z \neq 0 \) since \(X \) is nonsingular. Hence
\begin{equation*} \| y \|_{X^{-1}} = \| X^{-1} y \| = \| z \| \gt 0 . \end{equation*}
- If \(\alpha \in \C \) and \(y \in \C^m \) then \(\| \alpha y \|_{X^{-1}} = \vert \alpha \vert \| y \|_{X^{-1}} \text{:}\)
\begin{equation*} \| \alpha y \|_{X^{-1}} = \| {X^{-1}} ( \alpha y ) \| = \| \alpha {X^{-1}} y \| = \vert \alpha \vert \| {X^{-1}} y \| = \vert \alpha \vert \| y \|_{X^{-1}} . \end{equation*}
- If \(x, y \in \C^m \) then \(\| x + y \|_{X^{-1}} \leq \| x \|_{X^{-1}} + \| y \|_{X^{-1}} \text{:}\)
\begin{equation*} \begin{array}{l} \| x + y \|_{X^{-1}}\\ ~~~=~~~~\\ \| {X^{-1}} ( x + y ) \| \\ ~~~=~~~~\\ \| {X^{-1}} x + X^{-1} y \| \\ ~~~\leq~~~~\\ \| {X^{-1}} x \| + \| {X^{-1}} y \| \\ ~~~=~~~~\\ \| x \|_{X^{-1}} + \| y \|_{X^{-1}} . \end{array} \end{equation*}
What do we learn from this exercise? Recall that a vector \(z \) can alternatively be written as \(X ( X^{-1} z ) \text{,}\) so that the vector \(\hat z = X^{-1} z \) tells you how to represent \(z \) in the basis given by the columns of \(X \text{.}\) What the exercise tells us is that measuring a vector by first expressing it in this new basis and then applying a known norm again yields a norm.
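This norm is also easy to evaluate in practice. The following Python sketch (with an arbitrary, randomly chosen nonsingular \(X \text{,}\) and a helper name norm_Xinv made up for illustration) computes \(\| y \|_{X^{-1}} \) via a linear solve and spot-checks homogeneity and the triangle inequality:

import numpy as np

rng = np.random.default_rng(0)
m = 4
X = rng.standard_normal((m, m))   # nonsingular with probability 1

def norm_Xinv(y, X, p=2):
    # || y ||_{X^{-1}} = || X^{-1} y ||_p, computed via a linear solve
    return np.linalg.norm(np.linalg.solve(X, y), p)

x = rng.standard_normal(m)
y = rng.standard_normal(m)
alpha = -3.7

assert np.isclose(norm_Xinv(alpha * y, X), abs(alpha) * norm_Xinv(y, X))
assert norm_Xinv(x + y, X) <= norm_Xinv(x, X) + norm_Xinv(y, X) + 1e-12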
With this insight, we can perform our convergence analysis. Assume that \(A \in \C^{m \times m} \) is diagonalizable, \(A = X \Lambda X^{-1} \text{,}\) where the columns \(x_0, \ldots, x_{m-1} \) of \(X \) equal eigenvectors of \(A \) and the eigenvalues on the diagonal of \(\Lambda \) are ordered so that \(\vert \lambda_0 \vert \gt \vert \lambda_1 \vert \geq \cdots \geq \vert \lambda_{m-1} \vert \text{.}\) Write
\begin{equation*} v^{(0)} = \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} , \quad \mbox{with } \psi_0 \neq 0 , \end{equation*}
so that the (not yet practical) iteration \(v^{(k+1)} = A v^{(k)} / \lambda_0 \) yields
\begin{equation*} v^{(k)} = \psi_0 x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right)^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} . \end{equation*}
Hence
\begin{equation*} v^{(k)} - \psi_0 x_0 = \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right)^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} \end{equation*}
and
\begin{equation*} X^{-1} \left( v^{(k)} - \psi_0 x_0 \right) = \left( \begin{array}{c} 0 \\ \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right)^k \\ \vdots \\ \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) . \end{equation*}
Now, let \(\| \cdot \| \) be a p-norm and \(\| \cdot \|_{X^{-1}} \) as defined in Homework 9.3.2.1. Since each entry of \(X^{-1} ( v^{(k+1)} - \psi_0 x_0 ) \) is, in magnitude, at most \(\vert \lambda_1 / \lambda_0 \vert \) times the corresponding entry of \(X^{-1} ( v^{(k)} - \psi_0 x_0 ) \text{,}\) we conclude that
\begin{equation*} \| v^{(k+1)} - \psi_0 x_0 \|_{X^{-1}} \leq \left\vert \frac{\lambda_1}{\lambda_0} \right\vert \| v^{(k)} - \psi_0 x_0 \|_{X^{-1}} . \end{equation*}
This shows that, in this norm, the convergence of \(v^{(k)} \) to \(\psi_0 x_0 \) is linear: the difference between the current approximation, \(v^{(k)} \text{,}\) and the eventual vector in the direction of the desired eigenvector, \(\psi_0 x_0 \text{,}\) is reduced by at least the constant factor \(\vert \lambda_1 / \lambda_0 \vert \lt 1 \) in each iteration.
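The predicted rate is easy to observe in practice. Here is a small Python sketch (with an arbitrarily chosen diagonalizable matrix, constructed only for illustration) that runs this still-impractical iteration and prints the ratio of successive errors measured in the \(\| \cdot \|_{X^{-1}} \) norm; the ratio settles near \(\vert \lambda_1 / \lambda_0 \vert = 0.5 \text{:}\)

import numpy as np

rng = np.random.default_rng(1)
m = 4
Lambda = np.diag([2.0, 1.0, 0.5, 0.1])   # lambda_0 = 2, lambda_1 = 1
X = rng.standard_normal((m, m))          # eigenvector basis, nonsingular with probability 1
A = X @ Lambda @ np.linalg.inv(X)

v = rng.standard_normal(m)               # v^(0); psi_0 != 0 with probability 1
psi = np.linalg.solve(X, v)              # coefficients of v^(0) in the eigenvector basis
target = psi[0] * X[:, 0]                # psi_0 x_0

err = np.linalg.norm(np.linalg.solve(X, v - target))
for k in range(10):
    v = A @ v / Lambda[0, 0]             # v^(k+1) = A v^(k) / lambda_0
    new_err = np.linalg.norm(np.linalg.solve(X, v - target))
    print(f"k={k+1}: error = {new_err:.3e}, ratio = {new_err / err:.3f}")
    err = new_err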
Now, what if
\begin{equation*} \vert \lambda_0 \vert = \cdots = \vert \lambda_{n-1} \vert \gt \vert \lambda_n \vert \geq \cdots \geq \vert \lambda_{m-1} \vert ? \end{equation*}
By extending the above analysis one can easily show that \(v^{(k)} \) will converge to a vector in the subspace spanned by the eigenvectors associated with \(\lambda_0, \ldots , \lambda_{n-1} \text{.}\)
An important special case is when \(n = 2 \text{:}\) if \(A \) is real-valued, then \(\lambda_0 \) may be complex-valued, in which case its conjugate, \(\bar \lambda_0 \text{,}\) is also an eigenvalue and hence has the same magnitude as \(\lambda_0 \text{.}\) We deduce that \(v^{(k)} \) will eventually lie (approximately) in the space spanned by the eigenvectors corresponding to \(\lambda_0 \) and \(\bar \lambda_0 \text{,}\) rather than converge to a single eigenvector.
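To see this behavior, consider the following Python sketch with a hand-picked real matrix whose dominant eigenvalues are the complex conjugate pair \(1 \pm i \) (the matrix and starting vector are made up for illustration); the normalized iterates keep changing direction within a two-dimensional subspace instead of settling down:

import numpy as np

# Real matrix whose leading 2x2 block has eigenvalues 1 +/- i
# (magnitude sqrt(2)), dominating the remaining eigenvalue 0.5.
A = np.array([[1.0, -1.0, 0.0],
              [1.0,  1.0, 0.0],
              [0.0,  0.0, 0.5]])

v = np.array([1.0, 2.0, 3.0])
for k in range(8):
    v = A @ v
    v = v / np.linalg.norm(v)     # normalize so the iterates stay bounded
    print(f"k={k+1}: v = {np.round(v, 3)}")
# The third component dies out, but the first two keep rotating:
# the iterates approach the plane spanned by the eigenvectors of 1 +/- i
# without converging to a single direction.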