One can think of the Frobenius norm as taking the columns of the matrix, stacking them on top of each other to create a vector with \(m n \) entries, and then taking the vector 2-norm of the result.
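This stacking view is easy to check numerically. Below is a minimal NumPy sketch; the matrix dimensions and random entries are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # any m x n matrix works

# Frobenius norm of A ...
fro = np.linalg.norm(A, 'fro')

# ... equals the vector 2-norm of the columns stacked on top of each other.
stacked = A.flatten(order='F')   # column-major ("stack the columns") order
print(np.isclose(fro, np.linalg.norm(stacked, 2)))  # True
```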
Homework 1.3.3.1.
Partition \(m \times n \) matrix \(A \) by columns:
\begin{equation*}
A = \left( \begin{array}{c | c | c}
a_0 \amp \cdots \amp a_{n-1}
\end{array} \right).
\end{equation*}
Show that
\begin{equation*}
\| A \|_F^2 = \sum_{j=0}^{n-1}
\| a_j \|_2^2.
\end{equation*}
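While the homework asks for a proof, a quick numerical sanity check can build confidence in the identity. A sketch, assuming NumPy and an arbitrary complex test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

lhs = np.linalg.norm(A, 'fro') ** 2
# Sum of the squared 2-norms of the columns a_0, ..., a_{n-1}.
rhs = sum(np.linalg.norm(A[:, j]) ** 2 for j in range(A.shape[1]))
print(np.isclose(lhs, rhs))  # True
```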
The Frobenius norm is indeed a norm: establishing that this function is positive definite and homogeneous is straightforward. To show that the triangle inequality holds, it helps to realize that if \(A = \left( \begin{array}{c | c | c | c} a_0 \amp a_1 \amp \cdots \amp a_{n-1} \end{array} \right) \text{,}\) then
\begin{equation*}
\| A \|_F = \sqrt{ \| a_0 \|_2^2 + \| a_1 \|_2^2 + \cdots + \| a_{n-1} \|_2^2 }
= \left\| \left( \begin{array}{c} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{array} \right) \right\|_2.
\end{equation*}
In other words, it equals the vector 2-norm of the vector that is created by stacking the columns of \(A \) on top of each other. One can then exploit the fact that the vector 2-norm obeys the triangle inequality.
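The triangle inequality itself is equally easy to observe numerically. A small sketch with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# || A + B ||_F <= || A ||_F + || B ||_F, inherited from the vector 2-norm.
print(np.linalg.norm(A + B, 'fro')
      <= np.linalg.norm(A, 'fro') + np.linalg.norm(B, 'fro'))  # True
```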
Let us review the definition of the transpose of a matrix (which we have already used when defining the dot product of two real-valued vectors and when identifying a row in a matrix):
Definition 1.3.3.2. Transpose.
If \(A \in \mathbb C^{m \times n} \) and
\begin{equation*}
A = \left( \begin{array}{c c c}
\alpha_{0,0} \amp \cdots \amp \alpha_{0,n-1} \\
\vdots \amp \amp \vdots \\
\alpha_{m-1,0} \amp \cdots \amp \alpha_{m-1,n-1}
\end{array} \right),
\end{equation*}
then its transpose is defined by
\begin{equation*}
A^T = \left( \begin{array}{c c c}
\alpha_{0,0} \amp \cdots \amp \alpha_{m-1,0} \\
\vdots \amp \amp \vdots \\
\alpha_{0,n-1} \amp \cdots \amp \alpha_{m-1,n-1}
\end{array} \right).
\end{equation*}
For complex-valued matrices, it is important to also define the Hermitian transpose:
Definition 1.3.3.3. Hermitian transpose.
If \(A \in \mathbb C^{m \times n} \text{,}\) then its Hermitian transpose is defined by \(A^H = \overline A^T \text{,}\) where \(\overline A\) denotes the conjugate of a matrix, in which each element of the matrix is conjugated.
We note that:
\(\overline A^T = \overline{ A^T } \text{,}\) so conjugating before or after transposing yields the same matrix.
If \(A \in \mathbb R^{m \times n} \text{,}\) then \(A^H = A^T \text{.}\)
If \(x \in \mathbb C^m \text{,}\) then \(x^H \) is defined consistent with how we have used it before.
If \(\alpha \in \mathbb C \text{,}\) then \(\alpha^H = \overline \alpha \text{.}\) (If you view the scalar as a \(1 \times 1 \) matrix and then Hermitian transpose it, you get the matrix whose only element is \(\overline \alpha \text{.}\))
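In NumPy, the Hermitian transpose is computed by conjugating and transposing; the sketch below, with an arbitrary complex test matrix, illustrates the observations above:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

# conj(A)^T == conj(A^T): conjugation and transposition commute.
print(np.allclose(A.conj().T, A.T.conj()))  # True

# For a real-valued matrix, A^H reduces to A^T.
R = rng.standard_normal((3, 2))
print(np.allclose(R.conj().T, R.T))  # True

# Viewing a scalar as a 1 x 1 matrix and Hermitian-transposing it
# yields the matrix whose only element is the conjugate of the scalar.
alpha = 1.0 + 2.0j
print(np.array([[alpha]]).conj().T[0, 0] == np.conj(alpha))  # True
```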
Don't Panic! While working with complex-valued scalars, vectors, and matrices may appear a bit scary at first, you will soon notice that it is not really much more complicated than working with their real-valued counterparts.
Homework 1.3.3.4.
Let \(A \in \mathbb C^{m \times k} \) and \(B \in \mathbb C^{k \times n} \text{.}\) Using what you once learned about matrix transposition and matrix-matrix multiplication, reason that \((A B)^H = B^H A^H \text{.}\)
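Again, a numerical spot check is easy. A sketch with arbitrary complex matrices of compatible sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))

# (A B)^H = B^H A^H: note the reversal of the order of the factors.
print(np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T))  # True
```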
Similarly, other matrix norms can be created from vector norms by viewing the matrix as a vector. It turns out that, other than the Frobenius norm, these aren't particularly interesting in practice. An example can be found in Homework 1.6.1.6.
Remark 1.3.3.5.
The Frobenius norm of an \(m \times n \) matrix is easy to compute (requiring \(O( m n ) \) computations). The functions \(f( A ) = \| A \|_F \) and \(f( A ) = \| A \|_F^2 \) are also differentiable. However, you'd be hard-pressed to find a meaningful way of linking the definition of the Frobenius norm to a measure of an underlying linear transformation (other than by first transforming that linear transformation into a matrix).
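To make the \(O( m n ) \) cost concrete, here is a straightforward, hypothetical implementation directly from the definition; robust library implementations typically also scale intermediate values to avoid overflow and underflow:

```python
import numpy as np

def frobenius_norm(A):
    """Frobenius norm via the definition: one pass over all m * n entries,
    so the cost is O(m n) operations. Illustrative only; no scaling for
    overflow/underflow protection."""
    m, n = A.shape
    total = 0.0
    for i in range(m):
        for j in range(n):
            total += abs(A[i, j]) ** 2
    return np.sqrt(total)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
print(np.isclose(frobenius_norm(A), np.linalg.norm(A, 'fro')))  # True
```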