The Power Method is a simple iterative method that, under mild conditions, yields an eigenvector corresponding to the eigenvalue that is largest in magnitude.
Throughout this section we will assume that a given matrix \(A \in \C^{m \times m} \) is diagonalizable. Thus, there exists a nonsingular matrix \(X \) and diagonal matrix \(\Lambda \) such that \(A = X \Lambda X^{-1} \text{.}\) From the last section, we know that the columns of \(X \) equal eigenvectors of \(A \) and the elements on the diagonal of \(\Lambda \) equal the eigenvalues:
\begin{equation*}
X = \left( \begin{array}{ c | c | c | c } x_0 \amp x_1 \amp \cdots \amp x_{m-1} \end{array} \right)
\quad \mbox{and} \quad
\Lambda = \left( \begin{array}{ c | c | c | c }
\lambda_0 \amp 0 \amp \cdots \amp 0 \\ \hline
0 \amp \lambda_1 \amp \cdots \amp 0 \\ \hline
\vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline
0 \amp 0 \amp \cdots \amp \lambda_{m-1}
\end{array} \right).
\end{equation*}
We will further assume that the eigenvalues are ordered so that \(\vert \lambda_0 \vert \gt \vert \lambda_1 \vert \geq \cdots \geq \vert \lambda_{m-1} \vert \text{,}\) which makes \(\lambda_0 \) the unique eigenvalue largest in magnitude.
Now consider the iteration that, starting with an initial guess \(v^{(0)} \text{,}\) repeatedly computes \(v^{(k+1)} = A v^{(k)} \text{.}\) If we expand \(v^{(0)} \) in the basis of eigenvectors,
\begin{equation*}
v^{(0)} = \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1},
\end{equation*}
then
\begin{equation*}
v^{(k)} = A^k v^{(0)} = \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1}.
\end{equation*}
Since \(\lambda_0 \) is strictly largest in magnitude, the first term eventually dominates. This means that \(v^{(k)} \) will start pointing in the direction of \(x_0 \text{.}\) In other words, it will start pointing in the direction of an eigenvector corresponding to \(\lambda_0 \text{.}\) The problem is that it will become infinitely long if \(\vert \lambda_0 \vert \gt 1 \) or infinitesimally short if \(\vert \lambda_0 \vert \lt 1 \text{.}\) All is good if \(\vert \lambda_0 \vert = 1 \text{.}\)
An alternative way of looking at this is to exploit the fact that the eigenvectors, \(x_i \text{,}\) equal the columns of \(X \text{.}\) Writing \(v^{(0)} = X y \text{,}\) where \(y = X^{-1} v^{(0)} \) is the vector whose elements are the coefficients \(\psi_0, \psi_1, \ldots, \psi_{m-1} \) from before, we find that
\begin{equation*}
\begin{array}{rcl}
v^{(0)} = A^0 v^{(0)} \amp = \amp X y \\
v^{(1)} = A v^{(0)} \amp=\amp A X y = X \Lambda y \\
v^{(2)} = A v^{(1)} \amp=\amp
A X \Lambda y = X \Lambda^2 y \\
\amp \vdots \amp \\
v^{(k)} = A v^{(k-1)} \amp=\amp
A X \Lambda^{k-1} y = X \Lambda^{k} y \\
\end{array}
\end{equation*}
Since \(\Lambda^k \) is diagonal with diagonal elements \(\lambda_j^k \text{,}\) this is exactly the expansion \(v^{(k)} = \psi_0 \lambda_0^k x_0 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1} \) from before. Clearly \(\lim_{k \rightarrow \infty} v^{(k)} / \lambda_0^k = \psi_0 x_0 \text{,}\) as long as \(\psi_0 \neq 0 \text{,}\) since \(\left\vert \frac{ \lambda_j }{ \lambda_0 } \right\vert \lt 1 \) for \(j \gt 0 \text{.}\)
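To make this concrete, here is a small numerical experiment (a sketch in Python with numpy; the matrix, its eigenvalues \(2, 0.5, 0.1 \text{,}\) and the starting vector are made up for illustration) confirming that repeated multiplication by \(A \) reproduces \(X \Lambda^k y \) and that \(\| v^{(k)} \| \) grows like \(\vert \lambda_0 \vert^k \text{:}\)
\begin{verbatim}
import numpy as np

# Made-up diagonalizable A = X Lambda X^{-1} with eigenvalues 2, 0.5, 0.1.
Lambda = np.diag([2.0, 0.5, 0.1])
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = X @ Lambda @ np.linalg.inv(X)

v = np.array([1.0, 1.0, 1.0])     # v^(0)
y = np.linalg.solve(X, v)         # y = X^{-1} v^(0)

for k in range(1, 9):
    v = A @ v                     # v^(k) = A v^(k-1)
    # v^(k) equals X Lambda^k y (up to roundoff):
    assert np.allclose(v, X @ np.diag(np.diag(Lambda) ** k) @ y)
    # the length roughly doubles each step once the first term dominates:
    print(k, np.linalg.norm(v))
\end{verbatim}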
Another way of arriving at the same conclusion is to notice that
\begin{equation*}
A^k =
\begin{array}[t]{c}
\underbrace{( A A \cdots A )}
\\
k \mbox{ times}
\end{array}
=
\begin{array}[t]{c}
\underbrace{( X \Lambda X^{-1}) ( X \Lambda X^{-1}) \cdots ( X \Lambda X^{-1} )}
\\
k \mbox{ times}
\end{array}
= X \Lambda^k X^{-1},
\end{equation*}
so that, if we instead scale each iterate as we go, computing \(v^{(k+1)} = A v^{(k)} / \lambda_0 \text{,}\) then
\begin{equation*}
\begin{array}{rcl}
v^{(k)} \amp=\amp A^k v^{(0)} / \lambda_0^k \\
\amp=\amp A^k X y / \lambda_0^k
\\
\amp=\amp X \Lambda^k X^{-1} X y / \lambda_0^k
\\
\amp=\amp X \Lambda^k y / \lambda_0^k
\\
\amp=\amp X \left( \Lambda^k / \lambda_0^k \right) y
=
X \left( \begin{array}{ c | c | c | c }
1 \amp 0 \amp \cdots \amp 0 \\ \hline
0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline
\vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline
0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k
\end{array} \right) y.
\end{array}
\end{equation*}
Now, since \(\left\vert \frac{\lambda_{j}}{\lambda_0} \right\vert \lt 1 \) for \(j \geq 1 \text{,}\) we can argue that
\begin{equation*}
\begin{array}{l}
\lim_{k \rightarrow \infty} v^{(k)} \\
~~~= ~~~~\\
\lim_{k \rightarrow \infty}
X
\left( \begin{array}{ c | c | c | c }
1 \amp 0 \amp \cdots \amp 0 \\ \hline
0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline
\vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline
0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k
\end{array} \right) y \\
~~~= ~~~~\\
X \left( \begin{array}{ c | c | c | c }
1 \amp 0 \amp \cdots \amp 0 \\ \hline
0 \amp 0 \amp \cdots \amp 0 \\ \hline
\vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline
0 \amp 0 \amp \cdots \amp 0
\end{array} \right) y \\
~~~=~~~~ \\
X \psi_0 e_0 \\
~~~= ~~~~\\
\psi_0 X e_0 = \psi_0 x_0.
\end{array}
\end{equation*}
Here \(e_0 \) denotes the first column of the identity matrix, so that \(X e_0 = x_0 \text{.}\) Thus, as long as \(\psi_0 \neq 0 \) (which means \(v^{(0)} \) must have a component in the direction of \(x_0 \)), this method will eventually yield a vector in the direction of \(x_0 \text{.}\) However, the problem now is that we don't know \(\lambda_0 \) when we start.
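If we pretend for a moment that \(\lambda_0 \) is known, this can be observed numerically (a sketch continuing the made-up example from above; along the way it also checks the identity \(A^k = X \Lambda^k X^{-1} \)):
\begin{verbatim}
import numpy as np
from numpy.linalg import inv, matrix_power, norm, solve

# Same made-up example: A = X Lambda X^{-1}, eigenvalues 2, 0.5, 0.1.
Lambda = np.diag([2.0, 0.5, 0.1])
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = X @ Lambda @ inv(X)

# Check A^k = X Lambda^k X^{-1}:
assert np.allclose(matrix_power(A, 7), X @ matrix_power(Lambda, 7) @ inv(X))

lambda0, x0 = 2.0, X[:, 0]        # pretend lambda_0 (and x_0) are known
v = np.array([1.0, 1.0, 1.0])     # v^(0)
psi0 = solve(X, v)[0]             # first element of y = X^{-1} v^(0)

for k in range(100):
    v = A @ v / lambda0           # v^(k+1) = A v^(k) / lambda_0
print(norm(v - psi0 * x0))        # -> essentially zero
\end{verbatim}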
The following algorithm, known as the Power Method, avoids the problem of \(v^{(k)} \) growing or shrinking in length without requiring \(\lambda_0 \) to be known, by scaling it to be of unit length at each step:
\begin{equation*}
\begin{array}{l}
\mbox{Pick } v^{(0)} \mbox{ of unit length} \\
{\bf for~} k = 0, \ldots \\
~~~v^{(k+1)} = A v^{(k)} \\
~~~ v^{(k+1)} = v^{(k+1)} / \| v^{(k+1)} \| \\
{\bf endfor}
\end{array}
\end{equation*}
The idea is that we are only interested in the direction of the eigenvector, and hence it is convenient to rescale the vector to have unit length at each step.
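In Python with numpy, a direct translation might look as follows (a minimal sketch: the fixed iteration count and the choice of the 2-norm are arbitrary here, and a practical implementation would add a convergence test):
\begin{verbatim}
import numpy as np

def power_method(A, num_iters=100, seed=0):
    """Return a unit vector approximating the direction of the
    eigenvector associated with the eigenvalue largest in magnitude."""
    # Pick v^(0) of unit length (a random start makes psi_0 = 0 unlikely).
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v                  # v^(k+1) = A v^(k)
        v /= np.linalg.norm(v)     # rescale to unit length
    return v
\end{verbatim}
With this normalization it is the direction of \(v^{(k)} \text{,}\) not the vector itself, that converges: if \(\lambda_0 \) is negative (or complex-valued), the iterates flip sign (or rotate by a unit scalar) from step to step.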
How do we extract an approximation to \(\lambda_0 \) from the iterates? Let \(x \) be an eigenvector of \(A \) and \(\lambda \) the associated eigenvalue. Then \(A x = \lambda x \text{.}\) Multiplying on the left by \(x^H \) yields \(x^H A x = \lambda x^H x \) which, since \(x \neq 0 \text{,}\) means that \(\lambda = x^H A x / ( x^H x ) \text{.}\) This last expression is known as the Rayleigh quotient of \(x \) (with respect to \(A \)).
If \(x \) is an approximation to the eigenvector \(x_0 \) associated with \(\lambda_0 \text{,}\) then its Rayleigh quotient is an approximation to \(\lambda_0 \text{.}\)
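Thus an eigenvalue estimate can be produced alongside each Power Method iterate (a sketch; note that numpy's vdot conjugates its first argument, which yields exactly \(x^H A x \)):
\begin{verbatim}
import numpy as np

def rayleigh_quotient(A, x):
    # lambda ~= x^H A x / (x^H x); for unit-length x the denominator is 1.
    return np.vdot(x, A @ x) / np.vdot(x, x)

# Combined with the power_method sketch above:
#   lambda_est = rayleigh_quotient(A, power_method(A))
\end{verbatim}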