
Subsection 1.3.1 Of linear transformations and matrices


We briefly review the relationship between linear transformations and matrices, which is key to understanding why linear algebra is all about matrices and vectors.

Definition 1.3.1.1. Linear transformations and matrices.

Let \(L: \mathbb{C}^n \rightarrow \mathbb{C}^m\). Then \(L\) is said to be a linear transformation if for all \(\alpha \in \mathbb{C}\) and \(x, y \in \mathbb{C}^n\)

  • \(L(\alpha x) = \alpha L(x)\). That is, scaling first and then transforming yields the same result as transforming first and then scaling.

  • \(L(x + y) = L(x) + L(y)\). That is, adding first and then transforming yields the same result as transforming first and then adding.

The importance of linear transformations comes in part from the fact that many problems in science boil down to the following: given a function \(F: \mathbb{C}^n \rightarrow \mathbb{C}^m\) and a vector \(y \in \mathbb{C}^m\), find \(x\) such that \(F(x) = y\). This is known as an inverse problem. Under mild conditions, \(F\) can be locally approximated with a linear transformation \(L\), and then, as part of a solution method, one would want to solve \(Lx = y\).
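To make this concrete, here is a minimal NumPy sketch of one step of such a solution method. The particular function F, its Jacobian, the starting guess x, and the right-hand side y are all hypothetical choices, not taken from the text:

```python
import numpy as np

def F(x):
    # A hypothetical nonlinear function F: R^2 -> R^2.
    return np.array([x[0]**2 + x[1], np.sin(x[0]) + x[1]**3])

def jacobian(x):
    # Jacobian of F at x: the matrix of the linear transformation
    # that locally approximates F near x.
    return np.array([[2 * x[0], 1.0],
                     [np.cos(x[0]), 3 * x[1]**2]])

y = np.array([1.0, 2.0])   # given right-hand side
x = np.array([1.0, 1.0])   # current guess

# One Newton-type step: solve the local linear problem L dx = y - F(x).
dx = np.linalg.solve(jacobian(x), y - F(x))
x = x + dx
```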

The following theorem provides the link between linear transformations and matrices:

Theorem 1.3.1.2.

Let \(L: \mathbb{C}^n \rightarrow \mathbb{C}^m\) be a linear transformation, \(v_0, v_1, \ldots, v_{k-1} \in \mathbb{C}^n\), and \(\alpha_0, \alpha_1, \ldots, \alpha_{k-1} \in \mathbb{C}\). Then

\[
L(\alpha_0 v_0 + \alpha_1 v_1 + \cdots + \alpha_{k-1} v_{k-1}) = \alpha_0 L(v_0) + \alpha_1 L(v_1) + \cdots + \alpha_{k-1} L(v_{k-1}).
\]

A simple inductive proof yields the result. For details, see Week 2 of Linear Algebra: Foundations to Frontiers (LAFF) [26].
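As a quick numerical illustration (a sketch, not part of the text), the transformation below, which reverses the entries of a vector and doubles them, satisfies both properties of Definition 1.3.1.1 as well as the linear-combination property:

```python
import numpy as np

def L(x):
    # A hypothetical linear transformation: reverse the entries and double them.
    return 2 * x[::-1]

rng = np.random.default_rng(0)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
alpha, beta = 0.5 - 2.0j, 1.0 + 1.0j

# The two properties from Definition 1.3.1.1 ...
assert np.allclose(L(alpha * x), alpha * L(x))
assert np.allclose(L(x + y), L(x) + L(y))
# ... extend to arbitrary linear combinations (Theorem 1.3.1.2).
assert np.allclose(L(alpha * x + beta * y), alpha * L(x) + beta * L(y))
```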

The following set of vectors ends up playing a crucial role throughout this course:

Definition 1.3.1.3. Standard basis vector.

In this course, we will use \(e_j \in \mathbb{C}^m\) to denote the standard basis vector with a "1" in the position indexed with \(j\). So,

\[
e_j = \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{array} \right)
\quad \text{with the 1 in the position indexed with } j \text{ (preceded by } j \text{ zeroes).}
\]

Key is the fact that any vector \(x \in \mathbb{C}^n\) can be written as a linear combination of the standard basis vectors of \(\mathbb{C}^n\):

\[
x = \left( \begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{array} \right)
= \chi_0 \left( \begin{array}{c} 1 \\ 0 \\ \vdots \\ 0 \end{array} \right)
+ \chi_1 \left( \begin{array}{c} 0 \\ 1 \\ \vdots \\ 0 \end{array} \right)
+ \cdots
+ \chi_{n-1} \left( \begin{array}{c} 0 \\ 0 \\ \vdots \\ 1 \end{array} \right)
= \chi_0 e_0 + \chi_1 e_1 + \cdots + \chi_{n-1} e_{n-1}.
\]
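A quick NumPy check of this decomposition (a sketch; the vector x below is an arbitrary example):

```python
import numpy as np

x = np.array([3.0, -1.0, 4.0, 1.5])
n = x.size
I = np.eye(n)   # column j of I is the standard basis vector e_j

# x = chi_0 e_0 + chi_1 e_1 + ... + chi_{n-1} e_{n-1}
reconstructed = sum(x[j] * I[:, j] for j in range(n))
assert np.allclose(reconstructed, x)
```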

Hence, if L is a linear transformation,

\[
L(x) = L(\chi_0 e_0 + \chi_1 e_1 + \cdots + \chi_{n-1} e_{n-1})
= \chi_0 \underbrace{L(e_0)}_{a_0} + \chi_1 \underbrace{L(e_1)}_{a_1} + \cdots + \chi_{n-1} \underbrace{L(e_{n-1})}_{a_{n-1}}.
\]

If we now let \(a_j = L(e_j)\) (the vector \(a_j\) is the transformation of the standard basis vector \(e_j\)) and collect these vectors into a two-dimensional array of numbers:

\[
A = \left( \begin{array}{cccc} a_0 & a_1 & \cdots & a_{n-1} \end{array} \right) \tag{1.3.1}
\]

then we notice that the information needed for evaluating L(x) can be found in this array, since L(x) can then alternatively be computed by

\[
L(x) = \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}.
\]

The array \(A\) in (1.3.1) we call a matrix and the operation \(Ax = \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}\) we call matrix-vector multiplication. Clearly

\[
Ax = L(x).
\]
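The following sketch illustrates this construction for the same hypothetical transformation used earlier: each column of A is obtained by applying L to a standard basis vector, and the resulting matrix-vector product reproduces L:

```python
import numpy as np

def L(x):
    # The same hypothetical linear transformation as before.
    return 2 * x[::-1]

n = 4
I = np.eye(n)
# Column j of A is a_j = L(e_j).
A = np.column_stack([L(I[:, j]) for j in range(n)])

x = np.array([3.0, -1.0, 4.0, 1.5])
# Matrix-vector multiplication A x = chi_0 a_0 + ... + chi_{n-1} a_{n-1}
# evaluates the linear transformation: A x = L(x).
assert np.allclose(A @ x, L(x))
```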
Remark 1.3.1.4. Notation.

In these notes, as a rule,

  • Roman upper case letters are used to denote matrices.

  • Roman lower case letters are used to denote vectors.

  • Greek lower case letters are used to denote scalars.

Corresponding letters from these three sets are used to refer to a matrix, the rows or columns of that matrix, and the elements of that matrix. If \(A \in \mathbb{C}^{m \times n}\) then

\[
\begin{array}{rcl}
A & = & \langle \text{ partition } A \text{ by columns and rows } \rangle \\
& & \left( \begin{array}{cccc} a_0 & a_1 & \cdots & a_{n-1} \end{array} \right)
= \left( \begin{array}{c} \widetilde a_0^T \\ \widetilde a_1^T \\ \vdots \\ \widetilde a_{m-1}^T \end{array} \right) \\
& = & \langle \text{ expose the elements of } A \rangle \\
& & \left( \begin{array}{cccc}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{array} \right)
\end{array}
\]

We now notice that the standard basis vector \(e_j \in \mathbb{C}^m\) equals the column of the \(m \times m\) identity matrix indexed with \(j\):

\[
I = \left( \begin{array}{cccc}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{array} \right)
= \left( \begin{array}{cccc} e_0 & e_1 & \cdots & e_{m-1} \end{array} \right)
= \left( \begin{array}{c} \widetilde e_0^T \\ \widetilde e_1^T \\ \vdots \\ \widetilde e_{m-1}^T \end{array} \right).
\]
Remark 1.3.1.5.

The important thing to note is that a matrix is a convenient representation of a linear transformation, and matrix-vector multiplication gives an alternative way of evaluating that linear transformation.


Let's investigate matrix-matrix multiplication and its relationship to linear transformations. Consider two linear transformations

\[
\begin{array}{ll}
L_A: \mathbb{C}^k \rightarrow \mathbb{C}^m & \text{represented by matrix } A, \\
L_B: \mathbb{C}^n \rightarrow \mathbb{C}^k & \text{represented by matrix } B,
\end{array}
\]

and define

\[
L_C(x) = L_A(L_B(x)),
\]

as the composition of \(L_A\) and \(L_B\). Then it can be easily shown that \(L_C\) is also a linear transformation. Let the \(m \times n\) matrix \(C\) represent \(L_C\). How are \(A\), \(B\), and \(C\) related? If we let \(c_j\) equal the column of \(C\) indexed with \(j\), then because of the link between matrices, linear transformations, and standard basis vectors

\[
c_j = L_C(e_j) = L_A(L_B(e_j)) = L_A(b_j) = A b_j,
\]

where \(b_j\) equals the column of \(B\) indexed with \(j\). Now, we say that \(C = AB\) is the product of \(A\) and \(B\) defined by

\[
\left( \begin{array}{cccc} c_0 & c_1 & \cdots & c_{n-1} \end{array} \right)
= A \left( \begin{array}{cccc} b_0 & b_1 & \cdots & b_{n-1} \end{array} \right)
= \left( \begin{array}{cccc} A b_0 & A b_1 & \cdots & A b_{n-1} \end{array} \right)
\]

and define matrix-matrix multiplication as the operation that computes

\[
C := AB,
\]

which you will want to pronounce "C becomes A times B" to distinguish assignment from equality. If you think carefully about how individual elements of C are computed, you will realize that they equal the usual "dot product of rows of A with columns of B."
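This correspondence between composition and the matrix product is easy to check numerically. In the sketch below the sizes and the random matrices A and B are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 3, 4, 5
A = rng.standard_normal((m, k))    # represents L_A : C^k -> C^m
B = rng.standard_normal((k, n))    # represents L_B : C^n -> C^k

C = A @ B                          # represents the composition L_C(x) = L_A(L_B(x))

# Column j of C equals A times column j of B.
for j in range(n):
    assert np.allclose(C[:, j], A @ B[:, j])

# Evaluating L_C directly or as the composition gives the same result.
x = rng.standard_normal(n)
assert np.allclose(C @ x, A @ (B @ x))
```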


As already mentioned, throughout this course, it will be important that you can think about matrices in terms of their columns and rows, and matrix-matrix multiplication (and other operations with matrices and vectors) in terms of columns and rows. It is also important to be able to think about matrix-matrix multiplication in three different ways. If we partition each matrix by rows and by columns:

\[
C = \left( \begin{array}{ccc} c_0 & \cdots & c_{n-1} \end{array} \right)
= \left( \begin{array}{c} \widetilde c_0^T \\ \vdots \\ \widetilde c_{m-1}^T \end{array} \right),
\quad
A = \left( \begin{array}{ccc} a_0 & \cdots & a_{k-1} \end{array} \right)
= \left( \begin{array}{c} \widetilde a_0^T \\ \vdots \\ \widetilde a_{m-1}^T \end{array} \right),
\]

and

\[
B = \left( \begin{array}{ccc} b_0 & \cdots & b_{n-1} \end{array} \right)
= \left( \begin{array}{c} \widetilde b_0^T \\ \vdots \\ \widetilde b_{k-1}^T \end{array} \right),
\]

then \(C := AB\) can be computed in the following ways (a short numerical sketch follows the list):

  1. By columns:

    \[
    \left( \begin{array}{ccc} c_0 & \cdots & c_{n-1} \end{array} \right)
    := A \left( \begin{array}{ccc} b_0 & \cdots & b_{n-1} \end{array} \right)
    = \left( \begin{array}{ccc} A b_0 & \cdots & A b_{n-1} \end{array} \right).
    \]

    In other words, \(c_j := A b_j\) for all columns of \(C\).

  2. By rows:

    \[
    \left( \begin{array}{c} \widetilde c_0^T \\ \vdots \\ \widetilde c_{m-1}^T \end{array} \right)
    := \left( \begin{array}{c} \widetilde a_0^T \\ \vdots \\ \widetilde a_{m-1}^T \end{array} \right) B
    = \left( \begin{array}{c} \widetilde a_0^T B \\ \vdots \\ \widetilde a_{m-1}^T B \end{array} \right).
    \]

    In other words, \(\widetilde c_i^T := \widetilde a_i^T B\) for all rows of \(C\).

  3. One you may not have thought about much before:

    \[
    C := \left( \begin{array}{ccc} a_0 & \cdots & a_{k-1} \end{array} \right)
    \left( \begin{array}{c} \widetilde b_0^T \\ \vdots \\ \widetilde b_{k-1}^T \end{array} \right)
    = a_0 \widetilde b_0^T + \cdots + a_{k-1} \widetilde b_{k-1}^T,
    \]

    which should be thought of as a sequence of rank-1 updates, since each term is an outer product and an outer product has rank of at most one.
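The three formulations can be compared against a reference product in a few lines of NumPy; the sizes and random matrices below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 3, 4, 5
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
C_ref = A @ B

# 1. By columns: c_j := A b_j.
C = np.column_stack([A @ B[:, j] for j in range(n)])
assert np.allclose(C, C_ref)

# 2. By rows: c_i^T := a_i^T B.
C = np.vstack([A[i, :] @ B for i in range(m)])
assert np.allclose(C, C_ref)

# 3. As a sum of rank-1 updates: C := a_0 b_0^T + ... + a_{k-1} b_{k-1}^T.
C = np.zeros((m, n))
for p in range(k):
    C += np.outer(A[:, p], B[p, :])
assert np.allclose(C, C_ref)
```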

These are all special cases of the more general observation that, if we can partition C, A, and B by blocks (submatrices),

\[
C = \left( \begin{array}{ccc} C_{0,0} & \cdots & C_{0,N-1} \\ \vdots & & \vdots \\ C_{M-1,0} & \cdots & C_{M-1,N-1} \end{array} \right),
\quad
A = \left( \begin{array}{ccc} A_{0,0} & \cdots & A_{0,K-1} \\ \vdots & & \vdots \\ A_{M-1,0} & \cdots & A_{M-1,K-1} \end{array} \right),
\]

and

\[
B = \left( \begin{array}{ccc} B_{0,0} & \cdots & B_{0,N-1} \\ \vdots & & \vdots \\ B_{K-1,0} & \cdots & B_{K-1,N-1} \end{array} \right),
\]

where the partitionings are "conformal", then

\[
C_{i,j} = \sum_{p=0}^{K-1} A_{i,p} B_{p,j}.
\]
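A minimal sketch of this blocked formulation, with arbitrarily chosen (and evenly dividing) block sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
m, k, n = 6, 8, 10
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))

mb, kb, nb = 2, 4, 5               # block sizes (chosen here to divide m, k, n)
C = np.zeros((m, n))

# C_{i,j} = sum_p A_{i,p} B_{p,j}, where the indices refer to blocks.
for i in range(0, m, mb):
    for j in range(0, n, nb):
        for p in range(0, k, kb):
            C[i:i+mb, j:j+nb] += A[i:i+mb, p:p+kb] @ B[p:p+kb, j:j+nb]

assert np.allclose(C, A @ B)
```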
Remark 1.3.1.6.

If the above review of linear transformations, matrices, matrix-vector multiplication, and matrix-matrix multiplication makes you exclaim "That is all a bit too fast for me!" then it is time for you to take a break and review Weeks 2-5 of our introductory linear algebra course "Linear Algebra: Foundations to Frontiers." Information, including notes [26] (optionally downloadable for free) and a link to the course on edX [27] (which can be audited for free), can be found at http://ulaff.net.