Subsection 4.2.4 Conditioning of the linear least squares problem
Given \(A \in \Cmxn \) with linearly independent columns and \(b \in \Cm \text{,}\) consider the linear least squares (LLS) problem
\begin{equation*}
\| b - A \widehat x \|_2 = \min_x \| b - A x \|_2
\end{equation*}
and the perturbed problem
\begin{equation*}
\| ( b + \delta\! b ) - A ( \widehat x + \delta\! \widehat x ) \|_2 = \min_x \| ( b + \delta\! b ) - A x \|_2 \text{.}
\end{equation*}
The question we want to examine is by how much the relative error in \(b \) is amplified into a relative error in \(\widehat x \text{.}\) We will restrict our discussion to the case where \(A \) has linearly independent columns.
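To make the question concrete, the following NumPy sketch solves an LLS problem and a perturbed version of it, and compares the relative change in \(b \) with the resulting relative change in \(\widehat x \text{.}\) The random \(A \text{,}\) the perturbation size, and the use of numpy.linalg.lstsq are arbitrary choices made only for illustration.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))       # has linearly independent columns with probability 1
b = rng.standard_normal(m)
db = 1e-8 * rng.standard_normal(m)    # small perturbation of the right-hand side

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)         # solves min_x || b - A x ||_2
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)  # perturbed problem

print("relative change in b   :", np.linalg.norm(db) / np.linalg.norm(b))
print("relative change in xhat:", np.linalg.norm(xhat_p - xhat) / np.linalg.norm(xhat))
\end{verbatim}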
Now, we discovered that \(\widehat b \text{,}\) the projection of \(b \) onto the column space of \(A \text{,}\) satisfies
\begin{equation}
A \widehat x = \widehat b \tag{4.2.3}
\end{equation}
and the projection of \(b + \delta\! b \) satisfies
\begin{equation}
A ( \widehat x + \delta\! \widehat x ) = \widehat b + \delta\! \widehat b , \tag{4.2.4}
\end{equation}
where \(\delta\! \widehat b \) equals the projection of \(\delta\!b \) onto the column space of \(A \text{.}\)
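The following sketch (again with an arbitrary random \(A \text{,}\) \(b \text{,}\) and \(\delta\! b \)) checks (4.2.3) and (4.2.4) numerically, computing the projections with the orthogonal projector \(A ( A^H A )^{-1} A^H \text{:}\)
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
db = 1e-6 * rng.standard_normal(m)

P = A @ np.linalg.solve(A.conj().T @ A, A.conj().T)  # orthogonal projector onto C(A)
bhat, dbhat = P @ b, P @ db                          # projections of b and db

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)

print(np.allclose(A @ xhat, bhat))              # A xhat = bhat, cf. (4.2.3)
print(np.allclose(A @ xhat_p, bhat + dbhat))    # A (xhat + dxhat) = bhat + dbhat, cf. (4.2.4)
\end{verbatim}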
Let \(\theta \) equal the angle between the vector \(b \) and its projection \(\widehat b \) (which equals the angle between \(b \) and the column space of \(A \)). Then
\begin{equation*}
\cos( \theta ) = \frac{\| \widehat b \|_2}{\| b \|_2}
\end{equation*}
and hence
\begin{equation*}
\cos( \theta ) \| b \|_2 = \| \widehat b \|_2 = \| A \widehat x \|_2 \leq \| A \|_2 \| \widehat x \|_2 = \sigma_0 \| \widehat x \|_2 ,
\end{equation*}
which (as long as \(\widehat x \neq 0 \)) can be rewritten as
\begin{equation}
\frac{1}{\| \widehat x \|_2} \leq \frac{\sigma_0}{\cos( \theta ) \| b \|_2} , \tag{4.2.5}
\end{equation}
where \(\sigma_0 \) equals the largest singular value of \(A \text{.}\)
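A quick numerical check of (4.2.5), on an arbitrarily chosen random instance:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
bhat = A @ xhat                                    # projection of b onto C(A)
cos_theta = np.linalg.norm(bhat) / np.linalg.norm(b)
sigma0 = np.linalg.norm(A, 2)                      # largest singular value of A

print(1.0 / np.linalg.norm(xhat) <= sigma0 / (cos_theta * np.linalg.norm(b)))  # (4.2.5)
\end{verbatim}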
Subtracting (4.2.3) from (4.2.4) yields
\begin{equation*}
A \, \delta\! \widehat x = \delta\! \widehat b ,
\end{equation*}
or, equivalently,
\begin{equation*}
A^H A \, \delta\! \widehat x = A^H \delta\! \widehat b ,
\end{equation*}
which is solved by
\begin{equation*}
\delta\! \widehat x = ( A^H A )^{-1} A^H \delta\! \widehat b = A^\dagger \, \delta\! \widehat b ,
\end{equation*}
where \(A^\dagger = ( A^H A )^{-1} A^H \) is the pseudo inverse of \(A \) and we recall that \(\delta\! \widehat b = A ( A^H A )^{-1} A^H \delta\! b \text{.}\) Hence
\begin{equation}
\| \delta\! \widehat x \|_2 = \| ( A^H A )^{-1} A^H A ( A^H A )^{-1} A^H \delta\! b \|_2 = \| ( A^H A )^{-1} A^H \delta\! b \|_2 \leq \| ( A^H A )^{-1} A^H \|_2 \| \delta\! b \|_2 . \tag{4.2.6}
\end{equation}
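The formula \(\delta\! \widehat x = ( A^H A )^{-1} A^H \delta\! b \) and the bound (4.2.6) can be verified numerically; in this sketch the random instance and the perturbation size \(10^{-6} \) are arbitrary choices:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)
db = 1e-6 * rng.standard_normal(50)

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)
dxhat = xhat_p - xhat

Adag = np.linalg.solve(A.conj().T @ A, A.conj().T)   # A^dagger = (A^H A)^{-1} A^H
print(np.allclose(dxhat, Adag @ db))                 # dxhat = A^dagger db
print(np.linalg.norm(dxhat)
      <= np.linalg.norm(Adag, 2) * np.linalg.norm(db))   # the bound (4.2.6)
\end{verbatim}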
Homework 4.2.4.1.
Let \(A \in \Cmxn \) have linearly independent columns. Show that
\begin{equation*}
\| ( A^H A )^{-1} A^H \|_2 = \frac{1}{\sigma_{n-1}} ,
\end{equation*}
where \(\sigma_{n-1} \) equals the smallest singular value of \(A \text{.}\)
Hint. Use the reduced SVD of \(A \text{.}\)
Solution. Let \(A = U_L \Sigma_{TL} V^H\) be the reduced SVD of \(A \text{,}\) where \(V \) is square because \(A \) has linearly independent columns. Then
\begin{equation*}
( A^H A )^{-1} A^H = \left( V \Sigma_{TL} U_L^H U_L \Sigma_{TL} V^H \right)^{-1} V \Sigma_{TL} U_L^H = V \Sigma_{TL}^{-2} V^H V \Sigma_{TL} U_L^H = V \Sigma_{TL}^{-1} U_L^H ,
\end{equation*}
so that
\begin{equation*}
\| ( A^H A )^{-1} A^H \|_2 = \| V \Sigma_{TL}^{-1} U_L^H \|_2 = \| \Sigma_{TL}^{-1} U_L^H \|_2 = \frac{1}{\sigma_{n-1}} .
\end{equation*}
This last step needs some more explanation: Clearly \(\| \Sigma_{TL}^{-1} U_L^H \|_2 \leq \| \Sigma_{TL}^{-1} \|_2 \| U_L^H \|_2 = \frac{1}{\sigma_{n-1}} \| U_L^H \|_2 \leq \frac{1}{\sigma_{n-1}} \text{.}\) We need to show that there exists a vector \(x \) with \(\| x \|_2 = 1 \) such that \(\| \Sigma_{TL}^{-1} U_L^H x \|_2 = 1/\sigma_{n-1} \text{.}\) If we pick \(x = u_{n-1} \) (the last column of \(U_L\)), then \(\| \Sigma_{TL}^{-1} U_L^H x \|_2 = \| \Sigma_{TL}^{-1} U_L^H u_{n-1} \|_2 = \| \Sigma_{TL}^{-1} e_{n-1} \|_2 = \| e_{n-1} / \sigma_{n-1} \|_2 = 1/\sigma_{n-1} \text{.}\)
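Numerically, the result can be sanity-checked by comparing \(\| ( A^H A )^{-1} A^H \|_2 \) with the reciprocal of the smallest singular value returned by numpy.linalg.svd (the random \(A \) is an arbitrary choice):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 4))

Adag = np.linalg.solve(A.conj().T @ A, A.conj().T)  # (A^H A)^{-1} A^H
sigma = np.linalg.svd(A, compute_uv=False)          # singular values, in descending order

print(np.linalg.norm(Adag, 2))   # || (A^H A)^{-1} A^H ||_2 ...
print(1.0 / sigma[-1])           # ... equals 1 / (smallest singular value of A)
\end{verbatim}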
Combining (4.2.5), (4.2.6), and the result in this last homework yields
\begin{equation}
\frac{\| \delta\! \widehat x \|_2}{\| \widehat x \|_2} \leq \frac{1}{\cos( \theta )} \frac{\sigma_0}{\sigma_{n-1}} \frac{\| \delta\! b \|_2}{\| b \|_2} . \tag{4.2.7}
\end{equation}
Notice the effect of the \(\cos( \theta ) \) in (4.2.7). If \(b\) is almost perpendicular to \(\Col( A ) \text{,}\) then its projection \(\widehat b \) is small and \(\cos( \theta ) \) is small. Hence a small relative change in \(b \) can be greatly amplified. This makes sense: if \(b \) is almost perpendicular to \(\Col( A ) \text{,}\) then \(\widehat x \approx 0 \text{,}\) and any small \(\delta\!b \in \Col(A)\) can yield a relatively large change \(\delta\! \widehat x \text{.}\)
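The following experiment illustrates this amplification: \(b \) is constructed so that only a tiny fraction of it lies in \(\Col( A ) \) (so \(\cos( \theta ) \approx 10^{-6} \)), and a very small perturbation of \(b \) then produces a much larger relative change in \(\widehat x \text{.}\) The specific sizes are arbitrary choices:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
m, n = 100, 5
A = rng.standard_normal((m, n))
Q, _ = np.linalg.qr(A)                        # orthonormal basis for C(A)

# b = tiny component in C(A) + unit-size component orthogonal to C(A),
# so that b is almost perpendicular to the column space.
b_in = Q @ rng.standard_normal(n)
b_out = rng.standard_normal(m)
b_out -= Q @ (Q.T @ b_out)
b = 1e-6 * b_in / np.linalg.norm(b_in) + b_out / np.linalg.norm(b_out)

db = 1e-10 * rng.standard_normal(m)           # tiny perturbation of b
xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)

print("relative change in b   :", np.linalg.norm(db) / np.linalg.norm(b))
print("relative change in xhat:", np.linalg.norm(xhat_p - xhat) / np.linalg.norm(xhat))
\end{verbatim}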
Definition 4.2.4.1. Condition number of matrix with linearly independent columns.
Let \(A \in \Cmxn \) have linearly independent columns (and hence \(n \leq m \)). Then its condition number (with respect to the 2-norm) is defined by
\begin{equation*}
\kappa_2( A ) = \| A \|_2 \| A^\dagger \|_2 = \frac{\sigma_0}{\sigma_{n-1}} .
\end{equation*}
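In NumPy, \(\kappa_2( A ) \) can be computed directly from the singular values; numpy.linalg.cond returns the same ratio for the 2-norm (random \(A \) chosen arbitrarily):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 4))

sigma = np.linalg.svd(A, compute_uv=False)
kappa = sigma[0] / sigma[-1]        # sigma_0 / sigma_{n-1}

print(kappa)
print(np.linalg.cond(A))            # 2-norm condition number computed by NumPy
\end{verbatim}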
It is informative to explicitly expose \(\cos( \theta ) = \| \widehat b \|_2/ \| b \|_2 \) in (4.2.7):
\begin{equation*}
\frac{\| \delta\! \widehat x \|_2}{\| \widehat x \|_2} \leq \frac{\sigma_0}{\sigma_{n-1}} \frac{\| \delta\! b \|_2}{\| \widehat b \|_2} .
\end{equation*}
Notice that the ratio
\begin{equation*}
\frac{\| \delta\! b \|_2}{\| b \|_2}
\end{equation*}
can be made smaller by adding a component, \(b_r \text{,}\) to \(b \) that is orthogonal to \(\Col( A ) \) (and hence does not change the projection onto the column space, \(\widehat b \)):
\begin{equation*}
\frac{\| \delta\! b \|_2}{\| b + b_r \|_2} .
\end{equation*}
The factor \(1/\cos( \theta ) \) ensures that this does not magically reduce the relative error in \(\widehat x \text{:}\) since \(\cos( \theta ) = \| \widehat b \|_2 / \| b + b_r \|_2 \) for the new right-hand side,
\begin{equation*}
\frac{1}{\cos( \theta )} \frac{\| \delta\! b \|_2}{\| b + b_r \|_2} = \frac{\| \delta\! b \|_2}{\| \widehat b \|_2} ,
\end{equation*}
which is unaffected by \(b_r \text{.}\)
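The following sketch (again an arbitrary random instance) makes the point: replacing \(b \) by \(b + b_r \) with \(b_r \) orthogonal to \(\Col( A ) \) shrinks the relative change in the right-hand side, but leaves the relative change in \(\widehat x \) untouched:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(7)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
db = 1e-8 * rng.standard_normal(m)

Q, _ = np.linalg.qr(A)                  # orthonormal basis for C(A)
br = rng.standard_normal(m)
br = 100.0 * (br - Q @ (Q.T @ br))      # large component orthogonal to C(A)

for rhs in (b, b + br):                 # same projection bhat, different cos(theta)
    xhat, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    xhat_p, *_ = np.linalg.lstsq(A, rhs + db, rcond=None)
    print("relative change in rhs :", np.linalg.norm(db) / np.linalg.norm(rhs))
    print("relative change in xhat:", np.linalg.norm(xhat_p - xhat) / np.linalg.norm(xhat))
\end{verbatim}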