One can think of the Frobenius norm as taking the columns of the matrix, stacking them on top of each other to create a vector with \(m n \) entries, and then taking the vector 2-norm of the result.
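This stacking view is easy to check numerically. Below is a minimal NumPy sketch; the matrix dimensions and random entries are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # any m x n matrix works

# Frobenius norm of A ...
fro = np.linalg.norm(A, 'fro')

# ... equals the vector 2-norm of the columns stacked on top of each other.
stacked = A.flatten(order='F')   # column-major ("stack the columns") order
print(np.isclose(fro, np.linalg.norm(stacked, 2)))  # True
```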
Homework 1.3.3.1.
Partition \(m \times n \) matrix \(A \) by columns:
\begin{equation*}
A = \left( \begin{array}{c | c | c}
a_0 \amp \cdots \amp a_{n-1}
\end{array} \right).
\end{equation*}
Show that
\begin{equation*}
\| A \|_F^2 = \sum_{j=0}^{n-1}
\| a_j \|_2^2.
\end{equation*}
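While the homework asks for a proof, a quick numerical sanity check can build confidence in the identity. A sketch, assuming NumPy and an arbitrary complex test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

lhs = np.linalg.norm(A, 'fro') ** 2
# Sum of the squared 2-norms of the columns a_0, ..., a_{n-1}.
rhs = sum(np.linalg.norm(A[:, j]) ** 2 for j in range(A.shape[1]))
print(np.isclose(lhs, rhs))  # True
```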
The Frobenius norm is indeed a norm: establishing that this function is positive definite and homogeneous is straightforward. To show that the triangle inequality holds, it helps to realize that if \(A = \left( \begin{array}{c | c | c | c} a_0 \amp a_1 \amp \cdots \amp a_{n-1} \end{array} \right) \text{,}\) then
\begin{equation*}
\| A \|_F = \sqrt{ \| a_0 \|_2^2 + \| a_1 \|_2^2 + \cdots + \| a_{n-1} \|_2^2 }
= \left\| \left( \begin{array}{c} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{array} \right) \right\|_2.
\end{equation*}
In other words, it equals the vector 2-norm of the vector that is created by stacking the columns of \(A \) on top of each other. One can then exploit the fact that the vector 2-norm obeys the triangle inequality.
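The triangle inequality itself is equally easy to observe numerically. A small sketch with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# || A + B ||_F <= || A ||_F + || B ||_F, inherited from the vector 2-norm.
print(np.linalg.norm(A + B, 'fro')
      <= np.linalg.norm(A, 'fro') + np.linalg.norm(B, 'fro'))  # True
```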
Let us review the definition of the transpose of a matrix (which we have already used when defining the dot product of two real-valued vectors and when identifying a row in a matrix):
Definition 1.3.3.2. Transpose.
If \(A \in \mathbb C^{m \times n} \) and
\begin{equation*}
A = \left( \begin{array}{c c c}
\alpha_{0,0} \amp \cdots \amp \alpha_{0,n-1} \\
\vdots \amp \amp \vdots \\
\alpha_{m-1,0} \amp \cdots \amp \alpha_{m-1,n-1}
\end{array} \right),
\end{equation*}
then its transpose is defined by
\begin{equation*}
A^T = \left( \begin{array}{c c c}
\alpha_{0,0} \amp \cdots \amp \alpha_{m-1,0} \\
\vdots \amp \amp \vdots \\
\alpha_{0,n-1} \amp \cdots \amp \alpha_{m-1,n-1}
\end{array} \right).
\end{equation*}
For complex-valued matrices, it is important to also define the Hermitian transpose:
Definition 1.3.3.3. Hermitian transpose.
If \(A \in \mathbb C^{m \times n} \text{,}\) then its Hermitian transpose is defined by \(A^H = \overline A^T \text{,}\) where \(\overline A\) denotes the conjugate of a matrix, in which each element of the matrix is conjugated.
We note that:
\(\overline A^T = \overline{ A^T } \text{,}\) so conjugating before or after transposing yields the same matrix.
If \(A \in \mathbb R^{m \times n} \text{,}\) then \(A^H = A^T \text{.}\)
If \(x \in \mathbb C^m \text{,}\) then \(x^H \) is defined consistent with how we have used it before.
If \(\alpha \in \mathbb C \text{,}\) then \(\alpha^H = \overline \alpha \text{.}\) (If you view the scalar as a \(1 \times 1 \) matrix and then Hermitian transpose it, you get the matrix whose only element is \(\overline \alpha \text{.}\))
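In NumPy, the Hermitian transpose is computed by conjugating and transposing; the sketch below, with an arbitrary complex test matrix, illustrates the observations above:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

# conj(A)^T == conj(A^T): conjugation and transposition commute.
print(np.allclose(A.conj().T, A.T.conj()))  # True

# For a real-valued matrix, A^H reduces to A^T.
R = rng.standard_normal((3, 2))
print(np.allclose(R.conj().T, R.T))  # True

# Viewing a scalar as a 1 x 1 matrix and Hermitian-transposing it
# yields the matrix whose only element is the conjugate of the scalar.
alpha = 1.0 + 2.0j
print(np.array([[alpha]]).conj().T[0, 0] == np.conj(alpha))  # True
```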
Don't Panic! While working with complex-valued scalars, vectors, and matrices may appear a bit scary at first, you will soon notice that it is not really much more complicated than working with their real-valued counterparts.
Homework 1.3.3.4.
Let \(A \in \mathbb C^{m \times k} \) and \(B \in \mathbb C^{k \times n} \text{.}\) Using what you once learned about matrix transposition and matrix-matrix multiplication, reason that \((A B)^H = B^H A^H \text{.}\)
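Again, a numerical spot check is easy. A sketch with arbitrary complex matrices of compatible sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))

# (A B)^H = B^H A^H: note the reversal of the order of the factors.
print(np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T))  # True
```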
Similarly, other matrix norms can be created from vector norms by viewing the matrix as a vector. It turns out that, other than the Frobenius norm, these aren't particularly interesting in practice. An example can be found in Homework 1.6.1.6.
Remark 1.3.3.5.
The Frobenius norm of an \(m \times n \) matrix is easy to compute (requiring \(O( m n ) \) computations). The functions \(f( A ) = \| A \|_F \) and \(f( A ) = \| A \|_F^2 \) are also differentiable. However, you'd be hard-pressed to find a meaningful way of linking the definition of the Frobenius norm to a measure of an underlying linear transformation (other than by first transforming that linear transformation into a matrix).
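To make the \(O( m n ) \) cost concrete, here is a straightforward, hypothetical implementation directly from the definition; robust library implementations typically also scale intermediate values to avoid overflow and underflow:

```python
import numpy as np

def frobenius_norm(A):
    """Frobenius norm via the definition: one pass over all m * n entries,
    so the cost is O(m n) operations. Illustrative only; no scaling for
    overflow/underflow protection."""
    m, n = A.shape
    total = 0.0
    for i in range(m):
        for j in range(n):
            total += abs(A[i, j]) ** 2
    return np.sqrt(total)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
print(np.isclose(frobenius_norm(A), np.linalg.norm(A, 'fro')))  # True
```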