Subsection 2.3.1 The Singular Value Decomposition Theorem
¶The following is probably the most important result in linear algebra:
Theorem 2.3.1.1. Singular Value Decomposition Theorem.
Given \(A \in \C^{m \times n} \) there exist unitary \(U \in \C^{ m \times m} \text{,}\) unitary \(V \in \C^{n \times n} \text{,}\) and \(\Sigma \in \Rmxn \) such that \(A = U \Sigma V^H \text{.}\) Here
and \(\sigma_0 \geq \sigma_1 \geq \cdots \geq \sigma_{r-1} \gt 0\text{.}\) The values \(\sigma_0, \ldots, \sigma_{r-1} \) are called the singular values of matrix \(A \text{.}\) The columns of \(U \) and \(V \) are called the left and right singular vectors, respectively.
Recall that in our notation a \(0 \) indicates a matrix of vector "of appropriate size" and that in this setting the zero matrices in (2.3.1) may be \(0 \times 0 \text{,}\) \((m-r) \times 0 \text{,}\) and/or \(0 \times (n-r) \text{.}\)
Before proving this theorem, we are going to put some intermediate results in place.
Remark 2.3.1.2.
As the course progresses, we will notice that there is a conflict between the notation that explicitly exposes indices, e.g.,
and the notation we use to hide such explicit indexing, which we call the FLAME notation, e.g.,
The two linked by
In algorithms that use explicit indexing, \(k \) often is the loop index that identifies where in the matrix or vector the algorithm currently has reached. In the FLAME notation, the index \(1 \) identifies that place. This creates a conflict for the two distinct items that are both indexed with \(1 \text{,}\) e.g., \(u_1 \) in our example here. It is our experience that learners quickly adapt to this and hence have not tried to introduce even more notation that avoids this conflict. In other words: you will almost always be able to tell from context what is meant. The following lemma and its proof illustrate this further.
Lemma 2.3.1.3.
Given \(A \in \C^{m \times n} \text{,}\) with \(1 \leq n \leq m \) and \(A \neq 0 \) (the zero matrix), there exist unitary matrices \(\widetilde U \in \C^{ m \times m} \) and \(\widetilde V \in \C^{n \times n} \) such that
Proof.
In the below proof, it is really imporant to keep track of when a line is part of the partitioning of a matrix or vector, and when it denotes scalar division.
Choose \(\sigma_1 \) and \(\widetilde v_1 \in \C^n \) such that
\(\| \widetilde v_1 \|_2 = 1 \text{;}\) and
\(\sigma_1 = \| A \widetilde v_1 \|_2 = \| A \|_2 \text{.}\)
In other words, \(\widetilde v_1 \) is the vector that maximizes \(\max_{\| x \|_2 = 1} \| A x \|_2 \text{.}\)
Let \(\widetilde u_1 = A \widetilde v_1 / \sigma_1 \text{.}\) Then
Choose \(\widetilde U_2 \in \C^{m \times (m-1)} \) and \(\widetilde V_2 \in \C^{n \times (n-1)} \) so that
are unitary. Then
where \(w = \widetilde V_2^H A^H \widetilde u_1 \) and \(B = \widetilde U_2^H A \widetilde V_2 \text{.}\)
We will now argue that \(w = 0 \text{,}\) the zero vector of appropriate size:
Thus \(\sigma_1^2 \geq \sigma_1^2 + w^H w \) which means that \(w = 0 \) (the zero vector) and \(\widetilde U^H A \widetilde V = \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right)\) so that \(A = \widetilde U \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right) \widetilde V^H \text{.}\)
Hopefully you can see where this is going: If one can recursively find that \(B = U_B \Sigma_B V_B^H \text{,}\) then
The next exercise provides the insight that the values on the diagonal of \(\Sigma \) will be ordered from largest to smallest.
Homework 2.3.1.1.
Let \(A \in \C^{m \times n} \) with \(A = \left( \begin{array}{ c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right) \) and assume that \(\| A \|_2 = \sigma_1 \text{.}\)
ALWAYS/SOMETIMES/NEVER: \(\| B \|_2 \leq \sigma_1 \text{.}\)
We will employ a proof by contradiction. Assume that \(\| B \|_2 > \sigma_1 \text{.}\) Then there exists a vector \(z \) with \(\| z \|_2 = 1 \) such that \(\| B \|_2 = \| B z \|_2 = \max_{\| x \|_2 = 1} \| B x \|_2 \text{.}\) But then
which is a contradiction.
Hence \(\| B \|_2 \leq \sigma_1 \text{.}\)
We are now ready to prove the Singular Value Decomposition Theorem.
Proof of Singular Value Decomposition Theorem for \(n \leq m \).
We will prove this for \(m \geq n \text{,}\) leaving the case where \(m \leq n \) as an exercise.
Proof by induction: Since \(m \geq n \text{,}\) we select \(m \) to be arbritary and induct on \(n \text{.}\)
-
Base case: \(n = 1 \text{.}\)
In this case \(A = \left( \begin{array}{c} a_1 \end{array} \right) \) where \(a_1 \in \Cm \) is its only column.
Case 1: \(a_1 = 0 \) (the zero vector).
Then
\begin{equation*} A = \left( \begin{array}{c} 0 \end{array} \right) = \begin{array}[t]{c} \underbrace{ I_{m \times m} } \\ U \end{array} \left( \begin{array}{ c | c } ~ \amp ~ \\ \hline ~ \amp 0 \end{array} \right) \begin{array}[t]{c} \underbrace{ I_{1 \times 1} } \\ V^H \end{array} \end{equation*}so that \(U = I_{m \times m} \text{,}\) \(V = I_{1 \times 1} \text{,}\) and \(\Sigma_{TL} \) is an empty matrix.
Case 2: \(a_1 \neq 0 \text{.}\)
Then
\begin{equation*} A = \left( \begin{array}{c} a_1 \end{array} \right) = \left( \begin{array}{c} u_1 \end{array} \right) \left( \| a_1 \|_2 \right) \end{equation*}where \(u_1 = a_1 / \| a_1 \|_2 \text{.}\) Choose \(U_2 \in \C^{m \times (m-1)} \) so that \(U = \left( \begin{array}{c | c} u_1 \amp U_2 \end{array} \right)\) is unitary. Then
\begin{equation*} \begin{array}{rcl} A \amp = \amp \left( \begin{array}{c} a_1 \end{array} \right) \\ \amp = \amp \left( \begin{array}{c} u_1 \end{array} \right) ( \| a_1 \|_2 ) \\ \amp = \amp \left( \begin{array}{c| c} u_1 \amp U_2 \end{array} \right) \left( \begin{array}{c | c } \| a_1 \|_2 \amp ~\\ \hline 0 \amp ~ \end{array} \right) \left( \begin{array}{c} 1 \end{array} \right)^H \\ \amp = \amp U \Sigma V^H, \end{array} \end{equation*}where
\(U = \left( \begin{array}{c| c} u_0 \amp U_1 \end{array} \right) \text{,}\)
\(\Sigma = \left( \begin{array}{ c | c} \Sigma_{TL} \amp ~ \\ \hline 0 \amp ~ \end{array} \right) \) with \(\Sigma_{TL} = \left( \begin{array}{c} \sigma_1 \end{array} \right) \) and \(\sigma_1 = \| a_1 \|_2 = \| A \|_2 \)
\(V = \left( \begin{array}{c} 1 \end{array} \right) \text{.}\)
-
Inductive step:
Assume the result is true for matrices with \(1 \leq k \) columns. Show that it is true for matrices with \(k+1 \) columns.
Let \(A \in \C^{m \times (k+1)} \) with \(1 \leq k \lt n \text{.}\)
Case 1: \(A = 0 \) (the zero matrix)
Then
\begin{equation*} A = I_{m \times m} \left( \begin{array}{c |c } ~ \amp ~ \\ \hline 0_{m \times (k+1)} \amp ~ \end{array} \right) I_{(k+1) \times (k+1)} \end{equation*}so that \(U = I_{m \times m} \text{,}\) \(V = I_{(k+1) \times (k+1)} \text{,}\) and \(\Sigma_{TL} \) is an empty matrix.
Case 2: \(A \neq 0 \text{.}\)
Then \(\| A \|_2 \neq 0 \text{.}\) By Lemma 2.3.1.3, we know that there exist unitary \(\widetilde U \in \C^{m \times m} \) and \(\widetilde V \in \C^{(k+1) \times (k+1)} \) such that \(A = \widetilde U \left( \begin{array}{c|c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right)\widetilde V \) with \(\sigma_1 = \| A \|_2 \text{.}\)
By the inductive hypothesis, there exist unitary \(\check U_B \in \C^{ (m-1) \times (m-1)} \text{,}\) unitary \(\check V_B \in \C^{k \times k} \text{,}\) and \(\check \Sigma_B \in \R^{(m-1) \times k}\) such that \(B = \check U_B \check \Sigma_B \check V_B^H \) where \(\check \Sigma_B = \FlaTwoByTwo{ \check \Sigma_{TL} }{ 0 }{ 0 }{ 0 } \text{,}\) \(\check \Sigma_{TL} = \diag{ \sigma_2, \cdots , \sigma_{r-1} } \text{,}\) and \(\sigma_2 \geq \cdots \geq \sigma_{r-1} \gt 0 \text{.}\)
Now, let
\begin{equation*} U = \widetilde U \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp \check U_B \end{array} \right), V = \widetilde V \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp \check V_B \end{array} \right), \mbox{ and } \Sigma = \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp \check \Sigma_B \end{array} \right). \end{equation*}(There are some really tough to see "checks" in the definition of \(U \text{,}\) \(V \text{,}\) and \(\Sigma \text{!!}\)) Then \(A = U \Sigma V^H \) where \(U \text{,}\) \(V \text{,}\) and \(\Sigma\) have the desired properties. Key here is that \(\sigma_1 = \| A \|_2 \geq \| B \|_2 \) which means that \(\sigma_1 \geq \sigma_2 \text{.}\)
By the Principle of Mathematical Induction the result holds for all matrices \(A \in \C^{m \times n} \) with \(m \geq n \text{.}\)
Homework 2.3.1.2.
Let \(\Sigma = \diag{\sigma_0, \ldots, \sigma_{n-1}} \text{.}\) ALWAYS/SOMETIMES/NEVER: \(\| \Sigma \|_2 = \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\)
ALWAYS
Now prove it.
Yes, you have seen this before, in Homework 1.3.5.1. We repeat it here because of its importance to this topic.
so that \(\| \Sigma \|_2 \leq \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\)
Also, choose \(j \) so that \(\vert \sigma_j \vert = \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\) Then
so that \(\max_{i=0}^{n-1} \vert \sigma_i \vert \leq \| \Sigma \|_2 \leq \max_{i=0}^{n-1} \vert \sigma_i \vert \text{,}\) which implies that \(\| \Sigma \|_2 = \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\)
Homework 2.3.1.3.
Assume that \(U \in \C^{m \times m} \) and \(V \in \C^{n \times n} \) are unitary matrices. Let \(A, B \in \C^{m \times n} \) with \(B = U A V^H \text{.}\) Show that the singular values of \(A \) equal the singular values of \(B \text{.}\)
Let \(A = U_A \Sigma_A V_A^H\) be the SVD of \(A \text{.}\) Then \(B = U U_A \Sigma_A V_A^H V^H = ( U U_A ) \Sigma_A (V V_A)^H \) where both \(U U_A \) and \(V V_A \) are unitary. This gives us the SVD for \(B \) and it shows that the singular values of \(B \) equal the singular values of \(A \text{.}\)
Homework 2.3.1.4.
Let \(A \in \C^{m \times n} \) with \(n \leq m \) and \(A = U \Sigma V^H \) be its SVD.
ALWAYS/SOMETIMES/NEVER: \(A^H = V \Sigma^T U^H \text{.}\)
Homework 2.3.1.5.
Prove the Singular Value Decomposition Theorem for \(m \leq n \text{.}\)
Consider the SVD of \(B = A^H \)
Let \(B = A^H \text{.}\) Since it is \(n \times m \) with \(n \geq m \) its SVD exists: \(B = U_B \Sigma_B V_B^H \text{.}\) Then \(A = B^H = V_B \Sigma_B^T U_B^H \) and hence \(A = U \Sigma V^H \) with \(U = V_B \text{,}\) \(\Sigma = \Sigma_B^T \text{,}\) and \(V = U_B \text{.}\)
I believe the following video has material that is better presented in second video of 2.3.2.