Subsection 4.2.4 Conditioning of the linear least squares problem
Given \(A \in \Cmxn \) with linearly independent columns and \(b \in \Cm \text{,}\) consider the linear least squares (LLS) problem
\begin{equation*}
\| b - A \widehat x \|_2 = \min_x \| b - A x \|_2
\end{equation*}
and the perturbed problem
\begin{equation*}
\| ( b + \delta\! b ) - A ( \widehat x + \delta\! \widehat x ) \|_2 = \min_x \| ( b + \delta\! b ) - A x \|_2 \text{.}
\end{equation*}
The question we want to examine is by how much the relative error in \(b \) is amplified into a relative error in \(\widehat x \text{.}\) We will restrict our discussion to the case where \(A \) has linearly independent columns.
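To make the question concrete, the following NumPy sketch solves an LLS problem and a perturbed version of it, and compares the relative change in \(b \) with the resulting relative change in \(\widehat x \text{.}\) The random \(A \text{,}\) the perturbation size, and the use of numpy.linalg.lstsq are arbitrary choices made only for illustration.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))       # has linearly independent columns with probability 1
b = rng.standard_normal(m)
db = 1e-8 * rng.standard_normal(m)    # small perturbation of the right-hand side

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)         # solves min_x || b - A x ||_2
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)  # perturbed problem

print("relative change in b   :", np.linalg.norm(db) / np.linalg.norm(b))
print("relative change in xhat:", np.linalg.norm(xhat_p - xhat) / np.linalg.norm(xhat))
\end{verbatim}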
Now, we discovered that \(\widehat b \text{,}\) the projection of \(b \) onto the column space of \(A \text{,}\) satisfies
\begin{equation}
A \widehat x = \widehat b \tag{4.2.3}
\end{equation}
and the projection of \(b + \delta\! b \) satisfies
\begin{equation}
A ( \widehat x + \delta\! \widehat x ) = \widehat b + \delta\! \widehat b , \tag{4.2.4}
\end{equation}
where \(\delta\! \widehat b \) equals the projection of \(\delta\!b \) onto the column space of \(A \text{.}\)
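The following sketch (again with an arbitrary random \(A \text{,}\) \(b \text{,}\) and \(\delta\! b \)) checks (4.2.3) and (4.2.4) numerically, computing the projections with the orthogonal projector \(A ( A^H A )^{-1} A^H \text{:}\)
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
db = 1e-6 * rng.standard_normal(m)

P = A @ np.linalg.solve(A.conj().T @ A, A.conj().T)  # orthogonal projector onto C(A)
bhat, dbhat = P @ b, P @ db                          # projections of b and db

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)

print(np.allclose(A @ xhat, bhat))              # A xhat = bhat, cf. (4.2.3)
print(np.allclose(A @ xhat_p, bhat + dbhat))    # A (xhat + dxhat) = bhat + dbhat, cf. (4.2.4)
\end{verbatim}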
Let \(\theta \) equal the angle between the vector \(b \) and its projection \(\widehat b \) (which equals the angle between \(b \) and the column space of \(A \)). Then
\begin{equation*}
\cos( \theta ) = \frac{\| \widehat b \|_2}{\| b \|_2}
\end{equation*}
and hence
\begin{equation*}
\cos( \theta ) \| b \|_2 = \| \widehat b \|_2 = \| A \widehat x \|_2 \leq \| A \|_2 \| \widehat x \|_2 = \sigma_0 \| \widehat x \|_2 ,
\end{equation*}
which (as long as \(\widehat x \neq 0 \)) can be rewritten as
\begin{equation}
\frac{1}{\| \widehat x \|_2} \leq \frac{\sigma_0}{\cos( \theta ) \| b \|_2} , \tag{4.2.5}
\end{equation}
where \(\sigma_0 \) equals the largest singular value of \(A \text{.}\)
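A quick numerical check of (4.2.5), on an arbitrarily chosen random instance:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
bhat = A @ xhat                                    # projection of b onto C(A)
cos_theta = np.linalg.norm(bhat) / np.linalg.norm(b)
sigma0 = np.linalg.norm(A, 2)                      # largest singular value of A

print(1.0 / np.linalg.norm(xhat) <= sigma0 / (cos_theta * np.linalg.norm(b)))  # (4.2.5)
\end{verbatim}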
Subtracting (4.2.3) from (4.2.4) yields
\begin{equation*}
A \, \delta\! \widehat x = \delta\! \widehat b ,
\end{equation*}
or, equivalently,
\begin{equation*}
A^H A \, \delta\! \widehat x = A^H \delta\! \widehat b ,
\end{equation*}
which is solved by
\begin{equation*}
\delta\! \widehat x = ( A^H A )^{-1} A^H \delta\! \widehat b = A^\dagger \, \delta\! \widehat b ,
\end{equation*}
where \(A^\dagger = ( A^H A )^{-1} A^H \) is the pseudo inverse of \(A \) and we recall that \(\delta\! \widehat b = A ( A^H A )^{-1} A^H \delta\! b \text{.}\) Hence
\begin{equation}
\| \delta\! \widehat x \|_2 = \| ( A^H A )^{-1} A^H A ( A^H A )^{-1} A^H \delta\! b \|_2 = \| ( A^H A )^{-1} A^H \delta\! b \|_2 \leq \| ( A^H A )^{-1} A^H \|_2 \| \delta\! b \|_2 . \tag{4.2.6}
\end{equation}
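The formula \(\delta\! \widehat x = ( A^H A )^{-1} A^H \delta\! b \) and the bound (4.2.6) can be verified numerically; in this sketch the random instance and the perturbation size \(10^{-6} \) are arbitrary choices:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)
db = 1e-6 * rng.standard_normal(50)

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)
dxhat = xhat_p - xhat

Adag = np.linalg.solve(A.conj().T @ A, A.conj().T)   # A^dagger = (A^H A)^{-1} A^H
print(np.allclose(dxhat, Adag @ db))                 # dxhat = A^dagger db
print(np.linalg.norm(dxhat)
      <= np.linalg.norm(Adag, 2) * np.linalg.norm(db))   # the bound (4.2.6)
\end{verbatim}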
Homework 4.2.4.1.
Let \(A \in \Cmxn \) have linearly independent columns. Show that
\begin{equation*}
\| ( A^H A )^{-1} A^H \|_2 = \frac{1}{\sigma_{n-1}} ,
\end{equation*}
where \(\sigma_{n-1} \) equals the smallest singular value of \(A \text{.}\)
Hint. Use the reduced SVD of \(A \text{.}\)
Solution. Let \(A = U_L \Sigma_{TL} V^H\) be the reduced SVD of \(A \text{,}\) where \(V \) is square because \(A \) has linearly independent columns. Then
\begin{equation*}
( A^H A )^{-1} A^H = \left( V \Sigma_{TL} U_L^H U_L \Sigma_{TL} V^H \right)^{-1} V \Sigma_{TL} U_L^H = V \Sigma_{TL}^{-2} V^H V \Sigma_{TL} U_L^H = V \Sigma_{TL}^{-1} U_L^H ,
\end{equation*}
so that
\begin{equation*}
\| ( A^H A )^{-1} A^H \|_2 = \| V \Sigma_{TL}^{-1} U_L^H \|_2 = \| \Sigma_{TL}^{-1} U_L^H \|_2 = \frac{1}{\sigma_{n-1}} .
\end{equation*}
This last step needs some more explanation: Clearly \(\| \Sigma_{TL}^{-1} U_L^H \|_2 \leq \| \Sigma_{TL}^{-1} \|_2 \| U_L^H \|_2 = \frac{1}{\sigma_{n-1}} \| U_L^H \|_2 \leq \frac{1}{\sigma_{n-1}} \text{.}\) We need to show that there exists a vector \(x \) with \(\| x \|_2 = 1 \) such that \(\| \Sigma_{TL}^{-1} U_L^H x \|_2 = 1/\sigma_{n-1} \text{.}\) If we pick \(x = u_{n-1} \) (the last column of \(U_L\)), then \(\| \Sigma_{TL}^{-1} U_L^H x \|_2 = \| \Sigma_{TL}^{-1} U_L^H u_{n-1} \|_2 = \| \Sigma_{TL}^{-1} e_{n-1} \|_2 = \| e_{n-1} / \sigma_{n-1} \|_2 = 1/\sigma_{n-1} \text{.}\)
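Numerically, the result can be sanity-checked by comparing \(\| ( A^H A )^{-1} A^H \|_2 \) with the reciprocal of the smallest singular value returned by numpy.linalg.svd (the random \(A \) is an arbitrary choice):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 4))

Adag = np.linalg.solve(A.conj().T @ A, A.conj().T)  # (A^H A)^{-1} A^H
sigma = np.linalg.svd(A, compute_uv=False)          # singular values, in descending order

print(np.linalg.norm(Adag, 2))   # || (A^H A)^{-1} A^H ||_2 ...
print(1.0 / sigma[-1])           # ... equals 1 / (smallest singular value of A)
\end{verbatim}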
Combining (4.2.5), (4.2.6), and the result in this last homework yields
\begin{equation}
\frac{\| \delta\! \widehat x \|_2}{\| \widehat x \|_2} \leq \frac{1}{\cos( \theta )} \frac{\sigma_0}{\sigma_{n-1}} \frac{\| \delta\! b \|_2}{\| b \|_2} . \tag{4.2.7}
\end{equation}
Notice the effect of the \(\cos( \theta ) \) in (4.2.7). If \(b\) is almost perpendicular to \(\Col( A ) \text{,}\) then its projection \(\widehat b \) is small and \(\cos( \theta ) \) is small. Hence a small relative change in \(b \) can be greatly amplified. This makes sense: if \(b \) is almost perpendicular to \(\Col( A ) \text{,}\) then \(\widehat x \approx 0 \text{,}\) and any small \(\delta\!b \in \Col(A)\) can yield a relatively large change \(\delta\! \widehat x \text{.}\)
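The following experiment illustrates this amplification: \(b \) is constructed so that only a tiny fraction of it lies in \(\Col( A ) \) (so \(\cos( \theta ) \approx 10^{-6} \)), and a very small perturbation of \(b \) then produces a much larger relative change in \(\widehat x \text{.}\) The specific sizes are arbitrary choices:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
m, n = 100, 5
A = rng.standard_normal((m, n))
Q, _ = np.linalg.qr(A)                        # orthonormal basis for C(A)

# b = tiny component in C(A) + unit-size component orthogonal to C(A),
# so that b is almost perpendicular to the column space.
b_in = Q @ rng.standard_normal(n)
b_out = rng.standard_normal(m)
b_out -= Q @ (Q.T @ b_out)
b = 1e-6 * b_in / np.linalg.norm(b_in) + b_out / np.linalg.norm(b_out)

db = 1e-10 * rng.standard_normal(m)           # tiny perturbation of b
xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
xhat_p, *_ = np.linalg.lstsq(A, b + db, rcond=None)

print("relative change in b   :", np.linalg.norm(db) / np.linalg.norm(b))
print("relative change in xhat:", np.linalg.norm(xhat_p - xhat) / np.linalg.norm(xhat))
\end{verbatim}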
Definition 4.2.4.1. Condition number of matrix with linearly independent columns.
Let \(A \in \Cmxn \) have linearly independent columns (and hence \(n \leq m \)). Then its condition number (with respect to the 2-norm) is defined by
\begin{equation*}
\kappa_2( A ) = \| A \|_2 \| A^\dagger \|_2 = \frac{\sigma_0}{\sigma_{n-1}} .
\end{equation*}
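In NumPy, \(\kappa_2( A ) \) can be computed directly from the singular values; numpy.linalg.cond returns the same ratio for the 2-norm (random \(A \) chosen arbitrarily):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 4))

sigma = np.linalg.svd(A, compute_uv=False)
kappa = sigma[0] / sigma[-1]        # sigma_0 / sigma_{n-1}

print(kappa)
print(np.linalg.cond(A))            # 2-norm condition number computed by NumPy
\end{verbatim}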
It is informative to explicitly expose \(\cos( \theta ) = \| \widehat b \|_2/ \| b \|_2 \) in (4.2.7):
\begin{equation*}
\frac{\| \delta\! \widehat x \|_2}{\| \widehat x \|_2} \leq \frac{\sigma_0}{\sigma_{n-1}} \frac{\| \delta\! b \|_2}{\| \widehat b \|_2} .
\end{equation*}
Notice that the ratio
\begin{equation*}
\frac{\| \delta\! b \|_2}{\| b \|_2}
\end{equation*}
can be made smaller by adding a component, \(b_r \text{,}\) to \(b \) that is orthogonal to \(\Col( A ) \) (and hence does not change the projection onto the column space, \(\widehat b \)):
\begin{equation*}
\frac{\| \delta\! b \|_2}{\| b + b_r \|_2} .
\end{equation*}
The factor \(1/\cos( \theta ) \) ensures that this does not magically reduce the relative error in \(\widehat x \text{:}\) since \(\cos( \theta ) = \| \widehat b \|_2 / \| b + b_r \|_2 \) for the new right-hand side,
\begin{equation*}
\frac{1}{\cos( \theta )} \frac{\| \delta\! b \|_2}{\| b + b_r \|_2} = \frac{\| \delta\! b \|_2}{\| \widehat b \|_2} ,
\end{equation*}
which is unaffected by \(b_r \text{.}\)
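The following sketch (again an arbitrary random instance) makes the point: replacing \(b \) by \(b + b_r \) with \(b_r \) orthogonal to \(\Col( A ) \) shrinks the relative change in the right-hand side, but leaves the relative change in \(\widehat x \) untouched:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(7)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
db = 1e-8 * rng.standard_normal(m)

Q, _ = np.linalg.qr(A)                  # orthonormal basis for C(A)
br = rng.standard_normal(m)
br = 100.0 * (br - Q @ (Q.T @ br))      # large component orthogonal to C(A)

for rhs in (b, b + br):                 # same projection bhat, different cos(theta)
    xhat, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    xhat_p, *_ = np.linalg.lstsq(A, rhs + db, rcond=None)
    print("relative change in rhs :", np.linalg.norm(db) / np.linalg.norm(rhs))
    print("relative change in xhat:", np.linalg.norm(xhat_p - xhat) / np.linalg.norm(xhat))
\end{verbatim}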