Subsection 11.2.2 A strategy for computing the SVD
Remark 11.2.2.1.
In this section, we discuss both the QR factorization and the QR algorithm. The QR factorization, discussed in Week 3, is given by \(A = QR \text{.}\) The QR algorithm, which we discussed in Week 10, instead computes the Spectral Decomposition of a Hermitian matrix. It can be modified to compute the Schur Decomposition instead, which we don't discuss in this course. It can also be modified to compute the SVD of a matrix, which we discuss in this, and subsequent, units.
The first observation that leads to a practical algorithm is that matrices for which we wish to compute the SVD are often tall and skinny, by which we mean that they have many more rows than columns, and that it is the Reduced SVD of such a matrix that is desired. The methods we will develop for computing the SVD are based on the implicitly shifted QR algorithm that was discussed in Subsection 10.3.5, which requires \(O( n^3 ) \) computation when applied to an \(n \times n \) matrix. Importantly, the constant that multiplies the leading \(n^3 \) term is very large relative to, say, the corresponding constant for the QR factorization of that same matrix.
Rather than modifying the QR algorithm to work with a tall and skinny matrix, we start by computing its QR factorization, \(A = Q R \text{.}\) After this, the SVD of the smaller, \(n \times n \) sized, matrix \(R \) is computed. The following homework shows how the Reduced SVD of \(A \) can be extracted from \(Q \) and the SVD of \(R \text{.}\)
Homework 11.2.2.1.
Let \(A \in \mathbb C^{m \times n} \text{,}\) with \(m \geq n \text{,}\) and let \(A = Q R\) be its QR factorization where, for simplicity, we assume that the \(n \times n \) upper triangular matrix \(R \) is nonsingular. If \(R = \widehat U \widehat \Sigma \widehat V^H \) is the SVD of \(R \text{,}\) give the Reduced SVD of \(A \text{.}\)
Solution. Substituting the SVD of \(R \) into the QR factorization gives
\begin{equation*} A = Q R = Q ( \widehat U \widehat \Sigma \widehat V^H ) = ( Q \widehat U ) \widehat \Sigma \widehat V^H, \end{equation*}
which tells us the Reduced SVD \(A = U_L \Sigma_{TL} V_L^H\) where \(U_L = Q \widehat U \text{,}\) \(\Sigma_{TL} = \widehat \Sigma \text{,}\) and \(V_L = \widehat V \text{.}\) (Notice that \(Q \widehat U \) has orthonormal columns, since \(Q \) has orthonormal columns and \(\widehat U \) is unitary.)
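To make this concrete, here is a minimal NumPy sketch of the strategy (the sizes and variable names are ours, chosen for illustration): compute the reduced QR factorization of a tall and skinny \(A \text{,}\) compute the SVD of the small factor \(R \text{,}\) and recombine.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 50                    # tall and skinny: m >> n
A = rng.standard_normal((m, n))

Q, R = np.linalg.qr(A)             # reduced QR: Q is m x n, R is n x n
Uhat, s, VhatH = np.linalg.svd(R)  # SVD of the small n x n factor R
UL = Q @ Uhat                      # U_L = Q Uhat

# A = U_L Sigma_{TL} V_L^H, and the singular values match those of A
assert np.allclose((UL * s) @ VhatH, A)
assert np.allclose(s, np.linalg.svd(A, compute_uv=False))
```

The sketch uses a dense SVD solver on \(R \) as a stand-in for the bidiagonal algorithm developed in later units; the point is only that the expensive iterative computation operates on the small \(n \times n \) factor, while the recombination \(U_L = Q \widehat U \) is a single matrix-matrix multiplication.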
While it would be nice if the upper triangular structure of \(R \) were helpful in computing its SVD, what is actually significant is that the matrix is square and small (when \(n \ll m\)). For this reason, we now assume that we are interested in finding the SVD of a square matrix \(A \text{,}\) and ignore the fact that this matrix may be triangular.
Here are some more observations, details of which will become clear in the next units:
- In Subsection 10.3.1, we saw that an \(m \times m \) Hermitian matrix can be reduced to tridiagonal form via a sequence of Householder transformations applied to the matrix from the left and the right. This greatly reduced the cost of the QR algorithm used to compute the Spectral Decomposition.
- In the next unit, we will see that one can similarly reduce a matrix to bidiagonal form: a matrix whose only nonzero entries lie on its diagonal and superdiagonal. Because the transformations applied from the left and from the right differ, this is a unitary equivalence rather than a similarity transformation: there exist unitary matrices \(Q_L \) and \(Q_R \) such that
\begin{equation*} A = Q_L B Q_R^H, \end{equation*}
where \(B \) is bidiagonal. Conveniently, \(B \) can also be forced to be real-valued. (A sketch of this reduction appears at the end of this section.)
- The observation now is that \(B^T B \) is a real-valued tridiagonal matrix. Thus, if we were to explicitly form \(T = B^T B \text{,}\) we could employ the implicitly shifted QR algorithm (or any other tridiagonal eigensolver) to compute its Spectral Decomposition and from that construct the SVD of \(B\text{,}\) the SVD of the square matrix \(A \text{,}\) and the Reduced SVD of whatever original \(m \times n \) matrix we started with. (The sketch after this list illustrates the construction.)
- We don't want to explicitly form \(B^T B \text{,}\) however. The condition number of \(B \) equals the condition number of the original problem (since they are related via unitary transformations), and forming \(B^T B \) squares that condition number, so the small singular values would be computed to correspondingly less accuracy.
- In the next units, we will find that we can again employ the Implicit Q Theorem to compute the SVD of \(B \) directly, inspired by the implicitly shifted QR algorithm. The algorithm we develop again casts all updates to \(B\) in terms of unitary transformations, yielding a highly accurate algorithm.
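The following NumPy sketch (sizes and names are ours, for illustration) checks three of the observations above for a small real bidiagonal \(B \text{:}\) that \(B^T B \) is tridiagonal, that its Spectral Decomposition yields the SVD of \(B \text{,}\) and that forming \(B^T B \) squares the condition number.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# a random real upper bidiagonal matrix B
B = np.diag(rng.standard_normal(n)) + np.diag(rng.standard_normal(n - 1), 1)

T = B.T @ B                             # symmetric and, as claimed, tridiagonal
assert np.allclose(np.triu(T, 2), 0.0)  # nothing beyond the superdiagonal

# Spectral Decomposition T = V diag(lam) V^T yields the SVD of B:
# sigma_i = sqrt(lam_i), V holds right singular vectors, and U = B V Sigma^{-1}
lam, V = np.linalg.eigh(T)
idx = np.argsort(lam)[::-1]             # order from largest to smallest
lam, V = lam[idx], V[:, idx]
sigma = np.sqrt(lam)
U = (B @ V) / sigma                     # safe here: this random B is nonsingular
assert np.allclose(U @ np.diag(sigma) @ V.T, B)

# The catch: forming T squares the condition number
print(np.linalg.cond(B) ** 2, np.linalg.cond(T))  # essentially equal
```

The printed values agree because the singular values of \(T \) are the squares of those of \(B \text{;}\) this is exactly why a practical algorithm works with \(B \) itself.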
Putting these observations together yields a practical methodology for computing the Reduced SVD of a matrix.
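Anticipating the next unit, here is a sketch of the reduction to bidiagonal form via Householder transformations applied from the left and the right. It is our own unblocked illustration, not the algorithm as it will be developed there: each Householder matrix is formed explicitly for readability, whereas a practical implementation applies the transformations without forming them.

```python
import numpy as np

def house(x):
    """Householder vector v and scalar beta with (I - beta v v^T) x = alpha e_0."""
    v = np.asarray(x, dtype=float).copy()
    normx = np.linalg.norm(v)
    if normx == 0.0:
        return v, 0.0
    alpha = -normx if v[0] >= 0 else normx  # sign choice avoids cancellation
    v[0] -= alpha
    return v, 2.0 / np.dot(v, v)

def bidiagonalize(A):
    """Reduce real A (m >= n) to upper bidiagonal B with A = QL @ B @ QR.T."""
    B = np.asarray(A, dtype=float).copy()
    m, n = B.shape
    QL, QR = np.eye(m), np.eye(n)
    for k in range(n):
        # annihilate below the diagonal in column k (transformation from the left)
        v, beta = house(B[k:, k])
        H = np.eye(m - k) - beta * np.outer(v, v)
        B[k:, :] = H @ B[k:, :]
        QL[:, k:] = QL[:, k:] @ H
        if k < n - 2:
            # annihilate right of the superdiagonal in row k (from the right)
            v, beta = house(B[k, k + 1:])
            H = np.eye(n - k - 1) - beta * np.outer(v, v)
            B[:, k + 1:] = B[:, k + 1:] @ H
            QR[:, k + 1:] = QR[:, k + 1:] @ H
    return QL, B, QR

A = np.random.default_rng(0).standard_normal((8, 5))
QL, B, QR = bidiagonalize(A)
assert np.allclose(QL @ B @ QR.T, A)
# B is (numerically) zero except on its diagonal and superdiagonal
assert np.allclose(B - np.triu(np.tril(B, 1)), 0.0)
```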