
Section 3.2 The Matrix of a Linear Transformation

As we saw in the previous section, linear transformations can be defined using matrices and they can also be defined with no matrices in sight. In this section we will see that, for a certain class of linear transformations, there is always a matrix in sight.

Subsection 3.2.1 Constructing the Matrix

Our claim might seem fanciful at first. Can every linear transformation be realized using a matrix? The surprising answer is yes, for a specific kind of linear transformation.
We first make an observation related to the definition of the matrix-vector product in Example 3.1.5.

Note 3.2.1.

If \(A\) is an \(m\times n\) matrix with columns \(\mathbf{a}_1, \ldots, \mathbf{a}_n\text{,}\) and if we recall the definition of \(\mathbf{e}_j\) from Note 3.1.9, then
\begin{equation*} A(\mathbf{e}_j) = \mathbf{a}_j\text{.} \end{equation*}
This equality follows by viewing \(A(\mathbf{e}_j)\text{,}\) in the way expressed in (3.2), as a linear combination of the columns of \(A\) with weights given by the entries of \(\mathbf{e}_j\text{.}\)
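For readers who like to see such facts in action, here is a minimal computational sketch in Python using the NumPy library. (The choice of language and library is ours; nothing in the text depends on it.)

```python
import numpy as np

# A sample 3x2 matrix whose columns are a_1 and a_2.
A = np.array([[1, 4],
              [2, 5],
              [3, 6]])

e1 = np.array([1, 0])  # the standard basis vector e_1 of F^2
e2 = np.array([0, 1])  # the standard basis vector e_2 of F^2

# Multiplying A by e_j picks out column j of A.
print(A @ e1)  # [1 2 3], the first column of A
print(A @ e2)  # [4 5 6], the second column of A
```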
We now suppose that \(\ff\) is a field and that \(T:\ff^n \to \ff^m\) is a linear transformation. We claim that there is a unique \(m\times n\) matrix \(A\) such that for every \(\bfv \in \ff^n\text{,}\) \(T(\bfv) = A\bfv\text{.}\) In other words, we claim that the work of the linear transformation \(T\) can be carried out through multiplication by \(A\text{.}\)
We will define the matrix \(A\) which does the job. For each \(j = 1, \ldots, n\text{,}\) define the vector \(\mathbf{a}_j\) by \(\mathbf{a}_j = T(\mathbf{e}_j)\text{.}\) We then define \(A\) as the matrix with columns \(\mathbf{a}_1, \ldots, \mathbf{a}_n\text{.}\)
Since any vector \(\bfv \in \ff^n\text{,}\) written as \(\bfv = [v_i]\text{,}\) has the property that
\begin{equation*} \bfv = \sum_{j=1}^n v_j\mathbf{e}_j\text{,} \end{equation*}
we can verify that the action of \(T\) is the same as the action of multiplication by \(A\text{:}\)
\begin{equation*} T(\bfv) = T \left( \sum_{j=1}^n v_j \mathbf{e}_j \right) = \sum_{j=1}^n v_j T(\mathbf{e}_j) = \sum_{j=1}^n v_j \mathbf{a}_j = A\bfv\text{.} \end{equation*}
Note that we used the fact that \(T\) is a linear transformation in this last string of equalities.
We have just proved the following theorem.

Theorem 3.2.2.

Let \(\ff\) be a field and let \(T:\ff^n \to \ff^m\) be a linear transformation. Then there exists a unique \(m\times n\) matrix \(A\) over \(\ff\) such that \(T(\bfv) = A\bfv\) for every \(\bfv \in \ff^n\text{.}\)
A scrupulous reader may protest our use of the word “unique” in the statement of this theorem. Here is the argument concerning uniqueness. If a matrix \(A\) with this property exists, then it must satisfy \(A\mathbf{e}_j = T(\mathbf{e}_j)\) for all \(j\text{,}\) and by Note 3.2.1 this forces column \(j\) of \(A\) to equal \(T(\mathbf{e}_j)\text{.}\) Every column of \(A\) is therefore completely determined by \(T\text{,}\) so the matrix we constructed above is the only matrix that can do the job.
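The proof is constructive, and the construction is easy to carry out mechanically. The following sketch (again Python with NumPy, and with a transformation of our own choosing standing in for \(T\)) builds the matrix column by column from the vectors \(T(\mathbf{e}_j)\text{.}\)

```python
import numpy as np

def T(v):
    """A sample linear transformation R^2 -> R^2:
    reflection across the x-axis."""
    return np.array([v[0], -v[1]])

n = 2
# Column j of the matrix is T(e_j), exactly as in the proof.
A = np.column_stack([T(np.eye(n)[:, j]) for j in range(n)])
print(A)            # [[ 1.  0.]
                    #  [ 0. -1.]]

# Sanity check: T and multiplication by A agree on a sample vector.
v = np.array([3.0, 7.0])
print(T(v), A @ v)  # both give [ 3. -7.]
```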
This theorem is quite powerful. We will demonstrate that power through two examples that find their origin in Section 3.1.

Example 3.2.3.

We take our notation from Example 3.1.4. Let \(T:\rr^2 \to \rr^2\) be the linear transformation which reflects a vector in the Cartesian plane across the \(x\)-axis, and let \(S:\rr^2\to\rr^2\) be the linear transformation which rotates a vector counter-clockwise around the origin by \(\frac{\pi}{2}\) radians. In this example we will find the \(2\times 2\) matrices \(A\) and \(B\) such that \(T(\bfv)=A\bfv\) and \(S(\bfv)=B\bfv\) for all \(\bfv \in \rr^2\text{.}\)
In the proof of Theorem 3.2.2, we saw that the way to form the matrix of a linear transformation is to calculate the images of the vectors \(\mathbf{e}_1, \ldots, \mathbf{e}_n\text{.}\) In this context, we need to calculate the images of \(\mathbf{e}_1\) and \(\mathbf{e}_2\) under both \(T\) and \(S\text{.}\)
The calculations we seek are below:
\begin{equation*} T(\mathbf{e}_1) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \hspace{6pt} T(\mathbf{e}_2) = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \hspace{6pt} S(\mathbf{e}_1) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \hspace{6pt} S(\mathbf{e}_2) = \begin{bmatrix} -1 \\ 0 \end{bmatrix}\text{.} \end{equation*}
This tells us that the matrices \(A\) and \(B\) are as follows:
\begin{equation*} A = \begin{bmatrix} 1 \amp 0 \\ 0 \amp -1 \end{bmatrix}, \hspace{12pt} B = \begin{bmatrix} 0 \amp -1 \\ 1 \amp 0 \end{bmatrix}\text{.} \end{equation*}
Any curious reader can check that these matrices are correct by choosing a vector in \(\rr^2\) and multiplying by \(A\) and by \(B\) separately. The results should align with the actions of \(T\) and \(S\text{,}\) respectively.
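Here is one such check, sketched in Python with NumPy; the test vector is an arbitrary choice of ours.

```python
import numpy as np

A = np.array([[1, 0], [0, -1]])  # reflection across the x-axis
B = np.array([[0, -1], [1, 0]])  # counter-clockwise rotation by pi/2

v = np.array([2, 3])
print(A @ v)  # [ 2 -3]: (2, 3) reflected across the x-axis
print(B @ v)  # [-3  2]: (2, 3) rotated a quarter-turn counter-clockwise
```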

Subsection 3.2.2 Composition and Matrix Multiplication

Since linear transformations are functions, we can compose them with other linear transformations. In order for this to make sense, we need to have the codomains and domains match up correctly. (The reader should recall a brief introduction to this idea in Definition 3.1.10.)
If \(T:U \to V\) and \(S:V \to W\) are linear transformations between vector spaces, then the linear transformation \(S \circ T: U \to W\) is defined. If \(U = \ff^n\text{,}\) \(V = \ff^m\text{,}\) and \(W = \ff^p\text{,}\) then the linear transformation \(S \circ T\) is defined from \(\ff^n\) to \(\ff^p\text{,}\) and Theorem 3.2.2 says that there is a unique matrix over \(\ff\) which carries out this linear transformation. What is that matrix?
Theorem 3.2.2 tells us that there are matrices \(A\) and \(B\) such that the transformations \(T\) and \(S\) are multiplication by \(B\) and \(A\text{,}\) respectively. The matrix \(B\) is \(m\times n\) and \(A\) is \(p\times m\text{.}\) We will define the product of \(A\) and \(B\) so that the matrix of \(S\circ T\) is the matrix product \(AB\text{.}\)

Definition 3.2.4.

Let \(A\) be a \(p\times m\) matrix over a field \(\ff\) and let \(B\) be an \(m\times n\) matrix over \(\ff\text{.}\) Then the matrix product \(AB\) is the unique \(p\times n\) matrix over \(\ff\) such that for all \(\bfu \in \ff^n\text{,}\)
\begin{equation*} A(B\bfu) = (AB)\bfu\text{.} \end{equation*}

Note 3.2.5.

When we take the matrix product \(AB\text{,}\) the number of columns of \(A\) must match the number of rows of \(B\text{.}\) The matrix product makes no sense (and cannot be computed) otherwise. The matrix \(AB\) then has the same number of rows as \(A\) and the same number of columns as \(B\text{.}\)
Though we have defined matrix multiplication in terms of the composition of linear transformations, we can multiply matrices of the correct dimensions even when we have no specific linear transformations in mind. This is similar to our understanding of row-reducing a matrix: row reduction arose in the context of solving linear systems, but the process can be carried out on any matrix.
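The defining property in Definition 3.2.4 is easy to test numerically. Here is a minimal sketch in Python with NumPy; the sizes and entries are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 2, 3, 4
A = rng.integers(-5, 5, size=(p, m))  # a p x m matrix
B = rng.integers(-5, 5, size=(m, n))  # an m x n matrix
u = rng.integers(-5, 5, size=n)       # a vector in F^n

# The defining property of the matrix product: A(Bu) = (AB)u.
print(np.array_equal(A @ (B @ u), (A @ B) @ u))  # True
```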
There is one alternate, useful way to think about matrix multiplication: in terms of columns.

Proposition 3.2.6.

Let \(A\) be a \(p\times m\) matrix over \(\ff\) and let \(B\) be an \(m\times n\) matrix over \(\ff\) with columns \(\mathbf{b}_1, \ldots, \mathbf{b}_n\text{.}\) Then for each \(j = 1, \ldots, n\text{,}\) the \(j\)th column of \(AB\) is \(A\mathbf{b}_j\text{.}\)

Proof.

By our definition of the matrix product, for each \(j=1, \ldots, n\) we have
\begin{equation*} (AB)\mathbf{e}_j = A(B\mathbf{e}_j)\text{.} \end{equation*}
The observation in Note 3.2.1 means that \(B\mathbf{e}_j=\mathbf{b}_j\text{,}\) so we have
\begin{equation*} (AB)\mathbf{e}_j = A \mathbf{b}_j\text{.} \end{equation*}
Since \((AB)\mathbf{e}_j\) is the \(j\)th column of \(AB\text{,}\) this proves the proposition.
From the understanding we developed in Example 3.1.5, this proposition means that every column of the matrix product \(AB\) is a linear combination of the columns of \(A\text{.}\)
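This proposition can also be watched in action. The sketch below (Python with NumPy, matrices of our own choosing) compares each column of \(AB\) with the corresponding product \(A\mathbf{b}_j\text{.}\)

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 3 x 2
B = np.array([[1, 0, 2], [0, 1, 3]])    # 2 x 3
AB = A @ B                              # 3 x 3

# Column j of AB equals A times column j of B.
for j in range(B.shape[1]):
    print(np.array_equal(AB[:, j], A @ B[:, j]))  # True (three times)
```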
We have defined matrix multiplication, but we have not specified how the entries in the matrix product are calculated. Fear not; the wait is over.
We will use the definition of matrix multiplication and the formula we have for the product of a matrix and a vector (see formula (3.1)). Since \((AB)\bfu\) is a vector, we will record a formula for entry \(i\) of this vector. In what follows, we write \(A=[a_{ij}]\text{,}\) \(B=[b_{ij}]\text{,}\) and \(\bfu = [u_i]\text{:}\)
\begin{align*} [(AB)\bfu]_i \amp = [A(B\bfu)]_i = \sum_{k=1}^m a_{ik} [B\bfu]_k\\ \amp = \sum_{k=1}^m a_{ik} \sum_{j=1}^n b_{kj}u_j = \sum_{j=1}^n \left(\sum_{k=1}^m a_{ik} b_{kj} \right) u_j \text{.} \end{align*}
When we look again at the formula in (3.1) for the product of a matrix and a vector, we see that
\begin{equation} [AB]_{ij} = \sum_{k=1}^m a_{ik}b_{kj}\tag{3.5} \end{equation}
for all \(1 \le i \le p\) and all \(1 \le j \le n\text{.}\) In words, this means that the \((i,j)\)-entry of \(AB\) is obtained by multiplying row \(i\) of \(A\) and column \(j\) of \(B\) entry by entry and summing the results. (In Subsection 7.1.1 we will recognize this as the dot product of two vectors in \(\ff^m\text{.}\))
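Formula (3.5) translates directly into a triple loop. The sketch below implements it in Python, with NumPy used only to store the matrices and to check the result; this is an illustration of the formula, not an efficient way to multiply matrices.

```python
import numpy as np

def matmul_by_formula(A, B):
    """Compute AB entry by entry via formula (3.5):
    [AB]_ij is the sum over k of a_ik * b_kj."""
    p, m = A.shape
    m2, n = B.shape
    assert m == m2, "columns of A must match rows of B"
    C = np.zeros((p, n))
    for i in range(p):
        for j in range(n):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(m))
    return C

A = np.array([[2.0, -1.0], [3.0, 4.0]])
B = np.array([[-2.0, 0.0], [1.0, -3.0]])
print(matmul_by_formula(A, B))                      # the matrix AB of Example 3.2.7
print(np.allclose(matmul_by_formula(A, B), A @ B))  # True
```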
We will try to make this concrete with some examples.

Example 3.2.7.

Let \(A\) and \(B\) be the following matrices over \(\rr\text{:}\)
\begin{equation*} A = \begin{bmatrix} 2 \amp -1 \\ 3 \amp 4 \end{bmatrix}, \hspace{12pt} B = \begin{bmatrix} -2 \amp 0 \\ 1 \amp -3 \end{bmatrix}\text{.} \end{equation*}
Note that the product \(AB\) makes sense since the number of columns of \(A\) is the same as the number of rows of \(B\text{.}\) Here is the matrix product:
\begin{equation*} AB = \begin{bmatrix} 2(-2)-1(1) \amp 2(0)-1(-3) \\ 3(-2) + 4(1) \amp 3(0)+4(-3) \end{bmatrix} = \begin{bmatrix} -5 \amp 3 \\ -2 \amp -12 \end{bmatrix}\text{.} \end{equation*}
Since the sizes of \(A\) and \(B\) allow it, we can also calculate \(BA\) in this example:
\begin{equation*} BA = \begin{bmatrix} -4 \amp 2 \\ -7 \amp -13 \end{bmatrix}\text{.} \end{equation*}
Finally, we observe that \(AB \neq BA\text{.}\)

Example 3.2.8.

Let \(A\) and \(B\) be the following matrices over \(\ff_5\text{:}\)
\begin{equation*} A = \begin{bmatrix} 4 \amp 0 \\ 1 \amp 4 \\ 3 \amp 0 \end{bmatrix}, \hspace{12pt} B = \begin{bmatrix} 3 \amp 3 \\ 4 \amp 2 \end{bmatrix}\text{.} \end{equation*}
Since \(A\) is \(3\times 2\) and \(B\) is \(2\times 2\text{,}\) we can calculate \(AB\text{,}\) which will be \(3\times 2\text{.}\) (In this example we cannot calculate \(BA\text{.}\)) Here is the matrix product:
\begin{equation*} AB = \begin{bmatrix} 4(3) + 0(4) \amp 4(3)+0(2) \\ 1(3)+4(4) \amp 1(3)+4(2) \\ 3(3)+0(4) \amp 3(3)+0(2) \end{bmatrix} = \begin{bmatrix} 2 \amp 2 \\ 4 \amp 1 \\ 4 \amp 4 \end{bmatrix}\text{.} \end{equation*}
To obtain the last equality, we remember that we are working in \(\ff_5\text{.}\)
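Arithmetic in \(\ff_5\) can be mimicked on a computer by multiplying over the integers and then reducing every entry modulo 5. Here is a sketch in Python with NumPy that reproduces the computation above.

```python
import numpy as np

A = np.array([[4, 0], [1, 4], [3, 0]])  # 3 x 2 over F_5
B = np.array([[3, 3], [4, 2]])          # 2 x 2 over F_5

# Multiply over the integers, then reduce each entry mod 5.
print((A @ B) % 5)
# [[2 2]
#  [4 1]
#  [4 4]]
```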
Since we defined matrix multiplication in the context of the composition of linear transformations, our next example picks up on this theme.

Example 3.2.9.

We return to Example 3.2.3 and consider the linear transformations \(S,T:\rr^2 \to \rr^2\text{,}\) where \(T\) reflects a vector in the Cartesian plane across the \(x\)-axis and \(S\) rotates a vector counter-clockwise around the origin by \(\frac{\pi}{2}\) radians. In that example, we calculated the matrices \(A\) and \(B\) for \(T\) and \(S\text{,}\) respectively. What is the matrix for \(S\circ T\text{?}\)
We have defined matrix multiplication to answer questions exactly like this. We only need to multiply the matrices in the proper order. The matrix for \(S\circ T\) is
\begin{equation*} BA = \begin{bmatrix} 0 \amp -1 \\ 1 \amp 0 \end{bmatrix} \begin{bmatrix} 1 \amp 0 \\ 0 \amp -1 \end{bmatrix} = \begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \end{bmatrix}\text{.} \end{equation*}
A related question in this context is whether or not linear transformations commute. In other words, is \(S\circ T = T\circ S\text{?}\) For this example, answering that question boils down to comparing the matrix product \(AB\) with the product \(BA\) which we have just calculated:
\begin{equation*} AB = \begin{bmatrix} 1 \amp 0 \\ 0 \amp -1 \end{bmatrix} \begin{bmatrix} 0 \amp -1 \\ 1 \amp 0 \end{bmatrix} = \begin{bmatrix} 0 \amp -1 \\ -1 \amp 0 \end{bmatrix}\text{.} \end{equation*}
From this we can see that \(S\circ T\) and \(T\circ S\) are distinct linear transformations.
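A quick numerical comparison makes the distinction vivid. The sketch below (Python with NumPy) applies both compositions to the same sample vector.

```python
import numpy as np

A = np.array([[1, 0], [0, -1]])  # the matrix of T (reflection)
B = np.array([[0, -1], [1, 0]])  # the matrix of S (rotation)

v = np.array([1, 2])
print(B @ (A @ v))  # S(T(v)) = [2 1]
print(A @ (B @ v))  # T(S(v)) = [-2 -1]
```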
As we start to deal more regularly with matrices in the context of linear transformations, we need to recall the notation \(M_{m,n}(\ff)\) and \(M_n(\ff)\) from Example 2.3.10.
The next theorem records some facts about matrix multiplication which will be useful later in the text. We will walk the reader through the proof of this theorem in the exercises at the end of this section.

Theorem 3.2.10.

Let \(A\text{,}\) \(B\text{,}\) and \(C\) be matrices over \(\ff\) whose sizes are such that the sums and products below are defined. Then
  1. \(\displaystyle A(BC) = (AB)C\)
  2. \(\displaystyle A(B+C) = AB + AC\)
  3. \(\displaystyle (A+B)C = AC + BC\)
This theorem says that, if all of the matrix products make sense, matrix multiplication is associative and obeys both of the distributive laws.

Subsection 3.2.3 The Transpose of a Matrix

The transpose of a matrix is useful notation for some formulas that will appear later.

Definition 3.2.11.

If \(A \in M_{m,n}(\ff)\text{,}\) then the transpose of \(A\text{,}\) denoted \(A^T\text{,}\) is the element of \(M_{n,m}(\ff)\) whose rows are the columns of \(A\text{.}\) In other words,
\begin{equation*} [A^T]_{ij} = [A]_{ji} \end{equation*}
for all \(1 \le i \le n\) and all \(1 \le j \le m\text{.}\)

Note 3.2.12.

The transpose is an easy way to turn a column vector into a row vector and vice versa.

Example 3.2.13.

If \(A\) is the \(2\times 3\) matrix
\begin{equation*} A = \begin{bmatrix} 2 \amp -1 \amp 0 \\ -2 \amp 4 \amp 5 \end{bmatrix}\text{,} \end{equation*}
then \(A^T\) is the \(3\times 2\) matrix
\begin{equation*} A^T = \begin{bmatrix} 2 \amp -2 \\ -1 \amp 4 \\ 0 \amp 5 \end{bmatrix}\text{.} \end{equation*}
Some matrices are unaffected by taking the transpose. These deserve a special designation!

Definition 3.2.14.

A matrix which is equal to its own transpose is called a symmetric matrix. (By necessity, a symmetric matrix is square.)
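A symmetry check is a one-line computation in most matrix libraries. The sketch below (Python with NumPy) also shows a standard way to manufacture a symmetric matrix, namely \(A + A^T\text{;}\) the next theorem makes it easy to justify that this always works.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
S = A + A.T                    # A plus its transpose is symmetric
print(np.array_equal(S, S.T))  # True
print(np.array_equal(A, A.T))  # False: A itself is not symmetric
```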
The following theorem collects some properties related to the transpose of a matrix.

Theorem 3.2.15.

Let \(A\) and \(B\) be matrices over \(\ff\) and let \(c \in \ff\text{.}\) Whenever the sums and products below are defined, we have
  1. \(\displaystyle (A^T)^T = A\)
  2. \(\displaystyle (A+B)^T = A^T + B^T\)
  3. \(\displaystyle (cA)^T = cA^T\)
  4. \(\displaystyle (AB)^T = B^TA^T\)

Proof.

The first three parts of this theorem are immediate from the definitions and require no proof. To prove the fourth part, suppose that \(A = [a_{ij}]\) is \(m\times n\) and that \(B = [b_{ij}]\) is \(n\times p\text{.}\) We will compare the \((i,j)\)-entries of \((AB)^T\) and \(B^TA^T\text{.}\) First, from the definition of the transpose and (3.5) we see that
\begin{equation*} [(AB)^T]_{ij} = [AB]_{ji} = \sum_{k=1}^n a_{jk}b_{ki}\text{.} \end{equation*}
To compare, entry \((i,j)\) of \(B^TA^T\) is
\begin{equation*} [B^TA^T]_{ij} = \sum_{k=1}^n [B^T]_{ik}[A^T]_{kj} = \sum_{k=1}^n b_{ki}a_{jk}\text{.} \end{equation*}
Since multiplication is commutative in fields, these two expressions are equal.
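The fourth part is easy to spot-check numerically, as in this minimal Python sketch with NumPy; the sizes and entries are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(2, 3))  # an m x n matrix
B = rng.integers(-3, 4, size=(3, 4))  # an n x p matrix

# The transpose of a product reverses the order of the factors.
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
```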

Note 3.2.16.

While it might be more aesthetically pleasing if we did not have to switch the order of the multiplication when taking the transpose of a product, this type of formula makes sense when considering the dimensions of the matrices involved. If \(A\) is \(m\times n\) and \(B\) is \(n\times p\text{,}\) then the expression \(A^TB^T\) wouldn't even make sense unless \(m=p\text{.}\)

Reading Questions 3.2.4 Reading Questions

1.

Let \(T:\rr^2 \to \rr^2\) be the linear transformation which is rotation clockwise around the origin by \(\frac{\pi}{2}\) radians. Find the matrix for \(T\text{.}\) (Refer to Example 3.2.3.) Explain your process.

2.

Consider the following two matrices \(A\) and \(B\) over \(\rr\text{:}\)
\begin{equation*} A = \begin{bmatrix} 0 \amp 3 \\ 5 \amp -1 \\ -1 \amp -3 \end{bmatrix}, \hspace{12pt} B = \begin{bmatrix} -3 \amp -2 \amp 4 \\ 0 \amp 2 \amp -1 \end{bmatrix}\text{.} \end{equation*}
Calculate both \(AB\) and \(BA\text{.}\)

Exercises 3.2.5 Exercises

1.

Let \(A\text{,}\) \(B\text{,}\) and \(C\) be the following matrices over \(\rr\text{:}\)
\begin{equation*} A = \begin{bmatrix} 2 \amp 0 \amp 1 \\ -1 \amp -2 \amp 2 \end{bmatrix}, \hspace{6pt} B = \begin{bmatrix} 3 \amp -3 \\ 2 \amp 1 \end{bmatrix}, \hspace{6pt} C = \begin{bmatrix} 0 \amp 4 \amp 1 \\ 3 \amp 2 \amp -2 \\ 4 \amp -3 \amp 3 \end{bmatrix}\text{.} \end{equation*}
For each of the following, determine whether the given calculation makes sense. If it does, find the requested matrix. (Do this by hand, without technology.) If it doesn't make sense, explain why it doesn't.
  1. \(\displaystyle A^2\)
  2. \(\displaystyle AB\)
  3. \(\displaystyle AC\)
  4. \(\displaystyle BC\)
  5. \(\displaystyle BA\)
  6. \(\displaystyle B^2\)

2.

Let \(T:\rr^2 \to \rr^2\) be the linear transformation which reflects a vector across the line \(y=x\text{.}\) Find the matrix for \(T\text{.}\)

3.

Let \(T:\rr^2 \to \rr^2\) be the linear transformation which projects a vector onto the line \(y=x\text{.}\) Find the matrix for \(T\text{.}\)
Answer.
The matrix for \(T\) is \(\begin{bmatrix} 1/2 \amp 1/2 \\ 1/2 \amp 1/2 \end{bmatrix}\text{.}\)

4.

Let \(T:\rr^2 \to \rr^2\) be the linear transformation which projects a vector onto the line \(y=-x\text{.}\) Find the matrix for \(T\text{.}\)

5.

Let \(T:\rr^2 \to \rr^2\) be the linear transformation which rotates a vector counter-clockwise around the origin by an angle of \(\theta\) radians. Find the matrix for \(T\text{.}\) (Each entry in the matrix should be an expression involving \(\theta\text{.}\))
Answer.
The matrix for \(T\) is \(\begin{bmatrix} \cos\theta \amp -\sin\theta \\ \sin\theta \amp \cos\theta \end{bmatrix}\text{.}\)

Writing Exercises

6.
In fields, we have the cancellation law for multiplication: if \(ab=ac\) and \(a \neq 0\text{,}\) then \(b=c\text{.}\) Does matrix multiplication have this property?
Let \(A\text{,}\) \(B\text{,}\) and \(C\) be matrices over \(\ff\) such that \(AB\) and \(AC\) make sense and are the same size and \(A\) is not the zero matrix. If \(AB=AC\text{,}\) must it be true that \(B=C\text{?}\) Either prove this is true or provide a counter-example.
7.
In fields, multiplication has the no-zero-divisors property: if \(xy=0\text{,}\) then either \(x=0\) or \(y=0\text{.}\) Does matrix multiplication have this property?
Let \(A\) and \(B\) be matrices over \(\ff\) such that \(AB\) makes sense. Let \(Z\) be the matrix of the same size as \(AB\) consisting of all zeros. If \(AB = Z\text{,}\) must it be true that either \(A\) or \(B\) is a matrix of all zeros? Either prove this is true or provide a counter-example.
8.
Let \(A \in M_2(\ff_5)\) be of the form
\begin{equation*} A = \begin{bmatrix} a \amp 0 \\ b \amp c \end{bmatrix}\text{.} \end{equation*}
  1. What conditions must \(a\text{,}\) \(b\text{,}\) and \(c\) satisfy so that \(A^2 = I_2\text{?}\)
  2. How many matrices in \(M_2(\ff_5)\) of this form have the property that \(A^2=I_2\text{?}\)