(This page contains course notes for Artificial Intelligence taught in the
Spring of 2023. If you have come here from a search engine or the like, you may
wish to visit the course home page for more material, or visit my
home page
for possibly more up-to-date versions.)
Linear Algebra
Basic Operations
- At its core, linear algebra is a generalization of a pattern
that crops up over and over again. Its rules can appear pretty
arbitrary, but they exist because, in lots of places in physics
and mathematics, we found ourselves doing the same things to
groups of numbers, so we decided to formalize and generalize
what we saw.
- The basic unit of linear algebra is the vector:
\[
\textbf{x} =
\begin{pmatrix}
x_{1}\\
x_2\\
x_3\\
\vdots
\end{pmatrix}
\]
- Let's use a running example I stole from Prof Taylor. You are
the manager of a restaurant. You sold 10 hamburgers,
7 chicken sandwiches, and 13 steaks. You could store the information
about how many you sold in separate variables: \(h=10,c=7,s=13\),
or you could store it as a vector:
\[
\textbf{r} =
\begin{pmatrix}
10\\
7\\
13\\
\end{pmatrix}
\]
- Now let's say your boss says you need to triple your output. How
much do you need to sell? You could multiply each variable by 3:
\(h=30,c=21,s=39\), or we could multiply the vector by a scalar:
\[
3\textbf{r} =
\begin{pmatrix}
30\\
21\\
39\\
\end{pmatrix}
\]
- This is an example of a common thing we might want to do, which
we can represent with a compact notation.
- Let's say you open another restaurant. Now you have more food to
keep track of: \(h_1=10,c_1=7,s_1=13,h_2=8,c_2=11,s_2=8\). Or we
could make two vectors:
\[
\textbf{r}_1 =
\begin{pmatrix}
10\\
7\\
13\\
\end{pmatrix},
\textbf{r}_{2} =
\begin{pmatrix}
8\\
11\\
8\\
\end{pmatrix}
\]
- And if we want to know the total amount sold, we can add the two
vectors:
\[
\textbf{r}_1 + \textbf{r}_2 =
\begin{pmatrix}
10\\
7\\
13\\
\end{pmatrix}+
\begin{pmatrix}
8\\
11\\
8\\
\end{pmatrix} =
\begin{pmatrix}
18\\
18\\
21\\
\end{pmatrix}
\]
- So, from this perspective, vectors are just a convenient
notation for manipulating a bunch of numbers together in a group.
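- A quick sketch of these operations on a computer, using NumPy (the library choice is my assumption, not part of the notes; the variable names are just illustrative). The arrays hold the numbers from the running example:

    import numpy as np

    r1 = np.array([10, 7, 13])   # restaurant 1: hamburgers, chicken, steaks
    r2 = np.array([8, 11, 8])    # restaurant 2

    print(3 * r1)       # scalar multiplication -> [30 21 39]
    print(r1 + r2)      # vector addition       -> [18 18 21]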
- Now let's say you expand some more. Now you have four
restaurants. Instead of keeping four vectors, maybe you can
combine them into one matrix:
\[
\textbf{R}=
\begin{pmatrix}
10&7&13\\
8&11&8\\
18&13&4\\
16&3&9\\
\end{pmatrix}\]
Notice that I put the information for a single restaurant across
the rows, instead of columns. There's a reason that we'll get to later.
- Let's say we have prices for each of the food items. We could
represent them as individual numbers, or we could make them a
vector as well:
\[
\textbf{p}=\begin{pmatrix}
13\\
10\\
18\\
\end{pmatrix}
\]
Now, what if we want to find out how much money each restaurant
took in? What we want is to take each row of the restaurant
matrix, multiply each entry by the corresponding element of the price
vector, and sum the results. OK, let's then just define the
multiplication of a matrix and a vector to be exactly that:
\[
\textbf{R}\textbf{p}=
\begin{pmatrix}
10&7&13\\
8&11&8\\
18&13&4\\
16&3&9\\
\end{pmatrix}\begin{pmatrix}
13\\
10\\
18\\
\end{pmatrix}=\begin{pmatrix}
10*13+7*10+13*18\\
8*13+11*10+8*18\\
18*13+13*10+4*18\\
16*13+3*10+9*18\\
\end{pmatrix}=\begin{pmatrix}
434\\
358\\
436\\
400\\
\end{pmatrix}
\]
- Note that one consequence of this definition is that the number
of columns in the matrix must be equal to the number of rows in
the vector. Also note that multiplication is not commutative.
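- As a quick sanity check, here is the same matrix-vector product as a NumPy sketch (again, NumPy is just an assumed choice):

    import numpy as np

    R = np.array([[10, 7, 13],
                  [8, 11, 8],
                  [18, 13, 4],
                  [16, 3, 9]])   # one row of sales per restaurant
    p = np.array([13, 10, 18])   # price of each item

    print(R @ p)   # gross revenue per restaurant -> [434 358 436 400]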
- We can generalize this definition to multiply matrices. We
basically take each row of the left matrix and find the dot
product between it and each column of the right matrix. Why might
we want to do this? Well, let's add costs to our price vector:
\[
\textbf{P}=\begin{pmatrix}
13&10\\
10&8\\
18&14\\
\end{pmatrix}
\]
And take the product:
\[\textbf{R}\textbf{P}=
\begin{pmatrix}
10&7&13\\
8&11&8\\
18&13&4\\
16&3&9\\
\end{pmatrix}\begin{pmatrix}
13&10\\
10&8\\
18&14\\
\end{pmatrix}=\begin{pmatrix}
10*13+7*10+13*18&10*10+7*8+13*14\\
8*13+11*10+8*18&8*10+11*8+8*14\\
18*13+13*10+4*18&18*10+13*8+4*14\\
16*13+3*10+9*18&16*10+3*8+9*14\\
\end{pmatrix}=\begin{pmatrix}
434&338\\
358&280\\
436&340\\
400&310\\
\end{pmatrix}
\]
- Now we have both costs and gross revenues in the same table.
- Notice that just like in the vector case, the number of columns in the left matrix must equal the number of rows in the right matrix.
Now, another multiplication can be used to get profit per restaurant:
\[\begin{pmatrix}
434&338\\
358&290\\
436&340\\
400&310\\
\end{pmatrix}\begin{pmatrix}
1\\
-1\\
\end{pmatrix}=\begin{pmatrix}
96\\
78\\
96\\
90\\
\end{pmatrix}
\]
- And one final multiplication to get total profit:
\[\begin{pmatrix}
1&1&1& 1\\
\end{pmatrix}\begin{pmatrix}
96\\
78\\
96\\
90\\
\end{pmatrix}= 360
\]
Notice how I had to be careful of the order to get what I needed out of that.
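- Here is the whole chain of multiplications as one NumPy sketch, so the order of operations is explicit (variable names are just illustrative):

    import numpy as np

    R = np.array([[10, 7, 13],
                  [8, 11, 8],
                  [18, 13, 4],
                  [16, 3, 9]])              # sales, one row per restaurant
    P = np.array([[13, 10],
                  [10, 8],
                  [18, 14]])                # column 0: prices, column 1: costs

    revenue_cost = R @ P                        # 4x2 table of gross revenue and cost
    profit = revenue_cost @ np.array([1, -1])   # revenue minus cost, per restaurant
    total = np.ones(4) @ profit                 # a row of ones sums the column

    print(profit)   # [96 78 96 90]
    print(total)    # 360.0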
- This interpretation of linear algebra as just "manipulations that get what we want" can be useful, and is often how we look at it in the context of machine learning. But one of the neat things about linear algebra is that there are multiple interpretations! We already saw one of them with perceptrons: the geometric interpretation, in which vectors are points in space that can be manipulated.
Geometric Interpretation
- Let's say we have a point \((x_0,y_0)\), at distance \(z\) from the origin and at angle \(\theta\) from the x-axis, and we want to rotate it around the origin by an angle \(\phi\) to get a new point \((x_1,y_1)\) (recall that positive rotation is anti-clockwise). We can do this trigonometrically as follows:
\[
\begin{eqnarray*}
\sin(\phi+\theta) &=& \frac{y_1}{z},\\
\cos(\phi+\theta) &=& \frac{x_1}{z},\\
\sin(\phi+\theta) &=& \sin\phi\cos\theta+\cos\phi\sin\theta,\\
\cos(\phi+\theta) &=& \cos\phi\cos\theta-\sin\phi\sin\theta,\\
\sin\theta&=&\frac{y_0}{z},\\
\cos\theta&=&\frac{x_0}{z},\\
\frac{y_1}{z} &=& \frac{x_0}{z}\sin\phi+\frac{y_0}{z}\cos\phi,\\
\frac{x_1}{z} &=& \frac{x_0}{z}\cos\phi-\frac{y_0}{z}\sin\phi,\\
y_1 &=& x_0\sin\phi+y_0\cos\phi,\\
x_1 &=& x_0\cos\phi-y_0\sin\phi.
\end{eqnarray*}
\]
- We could approach the same problem by multiplying a vector by a rotation matrix:
\[
\begin{pmatrix}
\cos(\phi) & -\sin(\phi) \\
\sin(\phi) & \cos(\phi) \\
\end{pmatrix}\begin{pmatrix}
x_0\\
y_0\\
\end{pmatrix}=\begin{pmatrix}
x_0\cos\phi-y_0\sin\phi\\
x_0\sin\phi+y_0\cos\phi\end{pmatrix}
\]
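- A small NumPy sketch of the same rotation (the function name rotation_matrix is just illustrative): rotating the point \((1,0)\) by a quarter turn should land it at approximately \((0,1)\).

    import numpy as np

    def rotation_matrix(phi):
        # 2-D rotation by angle phi (radians), anti-clockwise
        return np.array([[np.cos(phi), -np.sin(phi)],
                         [np.sin(phi),  np.cos(phi)]])

    point = np.array([1.0, 0.0])
    print(rotation_matrix(np.pi / 2) @ point)   # ~[0. 1.]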
- There are many other interpretations of matrices and linear algebra, in physics, graph theory, systems of equations, etc., etc.
Matrix Algebra
- Once we have these things that represent the information we want,
and that we can perform operations on, the next question is: can we make an
algebra? Of course we can.
- Algebras (at this level) just involve figuring out what rules we
can apply so that we can solve for something. So what are our rules?
- We can add matrices: \(M_1 + M_2\)
- We can subtract matrices: \(M_1 - M_2\)
- We can multiply matrices: \(M_1 M_2\)
- We cannot divide matrices: \(\frac{M_1}{M_2}\) is not defined
- Matrix addition and subtraction are commutative: \(M_1 + M_2 = M_2 + M_1\)
- Matrix multiplication is not commutative: \(M_1 M_2 \neq M_2 M_1\) in general (a quick demonstration follows this list)
- Both addition and multiplication are associative: \((M_1 + M_2) +M_3 = M_1 + (M_2 +M_3)\), \((M_1 M_2) M_3 = M_1 (M_2 M_3)\)
- The distributive property also applies: \(M_1(M_2+M_3) = M_1M_2 + M_1M_3, (M_1 + M_2)M_3 = M_1M_3+M_2M_3\)
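- A quick demonstration of the non-commutativity, as a NumPy sketch with a tiny made-up pair of matrices:

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4]])
    B = np.array([[0, 1],
                  [1, 0]])

    print(A @ B)   # [[2 1] [4 3]]  (B on the right swaps the columns of A)
    print(B @ A)   # [[3 4] [1 2]]  (B on the left swaps the rows of A) -- not the same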
- OK, without division, how can we solve for things? Well, the answer requires going back to grade school math, and thinking about inverses and identities.
- Recall that for regular multiplication, the multiplicative inverse of \(n\) is \(\frac{1}{n}\), and when you multiply the two, you get the multiplicative identity, 1.
- So if we can't find \(\frac{1}{M}\) for a matrix, can we still find a multiplicative inverse? Yes, but first we have to identify the multiplicative identity.
- For a square matrix, the multiplicative identity is all zeros, with ones on the diagonal:
\[\textbf{I}=
\begin{pmatrix}
1&0&0&0\\
0&1&0&0\\
0&0&1&0\\
0&0&0&1
\end{pmatrix}\]
- If I multiply any \(4 \times 4\) matrix by this, I get that matrix back: \(\textbf{M}\textbf{I}=\textbf{I}\textbf{M}=\textbf{M}\)
- So the inverse of a matrix is another matrix such that their product is the identity matrix \(\textbf{M}\textbf{M}^{-1}=I, \textbf{M}^{-1}\textbf{M}=I\)
- For small matrices, there are formulas for finding the inverse, but for larger matrices it's much harder.
- Typically, we use Gauss-Jordan Elimination (described below) to find the inverse.
- Now, what if our matrix is not square? In that case, we make it square
- To do this we first transpose the matrix: \(\textbf{M}^T\)
- To transpose a matrix, we make the columns be the rows. So if:
\[\textbf{M}=\begin{pmatrix}
a&b\\
c&d\\
e&f\\
\end{pmatrix}\]then,
\[\textbf{M}^T=\begin{pmatrix}
a&c&e\\
b&d&f\\
\end{pmatrix}\]
- Notice that the product of a matrix's transpose with the matrix, \(\textbf{M}^T\textbf{M}\), is a square matrix. If we take the inverse of that now square matrix and multiply it by the transpose again, \((\textbf{M}^T\textbf{M})^{-1}\textbf{M}^T\), we get what's known as the pseudo-inverse.
- So now we can do algebra! For example if we have a matrix and two vectors that are related by the equation \(\textbf{M}\textbf{x} = \textbf{y}\) then:
\[
\begin{align}
\textbf{M}\textbf{x} &= \textbf{y}\\
\textbf{M}^T\textbf{M}\textbf{x} &= \textbf{M}^T\textbf{y}\\
(\textbf{M}^T\textbf{M})^{-1}\textbf{M}^T\textbf{M}\textbf{x} &= (\textbf{M}^T\textbf{M})^{-1}\textbf{M}^T\textbf{y}\\
\textbf{x} &= (\textbf{M}^T\textbf{M})^{-1}\textbf{M}^T\textbf{y}\\
\end{align}\]
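- Here is a small NumPy sketch of that derivation: build the pseudo-inverse \((\textbf{M}^T\textbf{M})^{-1}\textbf{M}^T\) by hand and check it against NumPy's built-in pinv. The particular matrix and vector are made up purely for illustration.

    import numpy as np

    # A non-square M, so it has no ordinary inverse.
    M = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
    x_true = np.array([2.0, 3.0])
    y = M @ x_true

    pinv = np.linalg.inv(M.T @ M) @ M.T   # the pseudo-inverse (M^T M)^{-1} M^T
    print(pinv @ y)                       # recovers [2. 3.]
    print(np.linalg.pinv(M) @ y)          # NumPy's built-in pseudo-inverse agrees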
Gauss-Jordan Elimination
- The Gauss-Jordan algorithm is the typical method we use to find
the inverse of a square matrix. It is a variation of Gaussian Elimination.
- We're going to manipulate our matrix using these rules:
- Exchange two rows in the matrix
- Multiply a row by a non-zero constant
- Add or subtract a scalar multiple of one row to or from another row
- The algorithm looks like this:
- Create the partitioned matrix \((\textbf{M}|\textbf{I})\) , where I is the identity matrix
- Swap the rows so that the row with the largest, leftmost nonzero entry is on top.
- Multiply the top row by a scalar so that top row's leading entry becomes 1.
- Add/subtract multiples of the top row to/from the other rows so that all other entries in the column containing the top row's leading entry become zero.
- Repeat steps 2-4 for the next leftmost nonzero entry until all the leading entries are 1.
- Let's see an example
Let's invert:
\[
\left(\begin{array}{ccc}
2&0&3\\
-1&3&-4\\
-3&1&-4
\end{array}\right)^{-1}
\]
- Create the partitioned matrix:
\[
\left(\begin{array}{ccc|ccc}
2&0&3&1&0&0\\
-1&3&-4&0&1&0\\
-3&1&-4&0&0&1
\end{array}\right)
\]
- The largest leftmost entry is already in the top row, so we skip step two
- Multiply row 1 by \(\frac{1}{2}\) to get a 1 in the upper left
\[
\left(\begin{array}{ccc|ccc}
1&0&\frac{3}{2}&\frac{1}{2}&0&0\\
-1&3&-4&0&1&0\\
-3&1&-4&0&0&1
\end{array}\right)
\]
- Now we want to get zeros in the rest of the first column. Since row 2 has a \(-1\) in that column, we can replace row 2 with row 1 plus row 2:
\[
\left(\begin{array}{ccc|ccc}
1&0&\frac{3}{2}&\frac{1}{2}&0&0\\
0&3&-\frac{5}{2}&\frac{1}{2}&1&0\\
-3&1&-4&0&0&1
\end{array}\right)
\]
- Next, we try to get a zero in the last row. We replace row three with the sum of row three and 3 times row one.
\[
\left(\begin{array}{ccc|ccc}
1&0&\frac{3}{2}&\frac{1}{2}&0&0\\
0&3&-\frac{5}{2}&\frac{1}{2}&1&0\\
0&1&\frac{1}{2}&\frac{3}{2}&0&1
\end{array}\right)
\]
- We're done with the first column, so we move on to the next. We want a 1 where that 3 is, so we divide the second row by 3:
\[
\left(\begin{array}{ccc|ccc}
1&0&\frac{3}{2}&\frac{1}{2}&0&0\\
0&1&-\frac{5}{6}&\frac{1}{6}&\frac{1}{3}&0\\
0&1&\frac{1}{2}&\frac{3}{2}&0&1
\end{array}\right)
\]
- In that column, the first row is already 0, so we don't worry about that. Instead, we replace row 3 with row 2 minus row 3:
\[
\left(\begin{array}{ccc|ccc}
1&0&\frac{3}{2}&\frac{1}{2}&0&0\\
0&1&-\frac{5}{6}&\frac{1}{6}&\frac{1}{3}&0\\
0&0&-\frac{4}{3}&-\frac{4}{3}&\frac{1}{3}&-1
\end{array}\right)
\]
- Now for the last column, we multiply row 3 by \(-\frac{3}{4}\)
\[
\left(\begin{array}{ccc|ccc}
1&0&\frac{3}{2}&\frac{1}{2}&0&0\\
0&1&-\frac{5}{6}&\frac{1}{6}&\frac{1}{3}&0\\
0&0&1&1&-\frac{1}{4}&\frac{3}{4}\\
\end{array}\right)
\]
- Replace row 2 with the sum of row 2 and \(\frac{5}{6}\) times row 3:
\[
\left(\begin{array}{ccc|ccc}
1&0&\frac{3}{2}&\frac{1}{2}&0&0\\
0&1&0&1&\frac{1}{8}&\frac{5}{8}\\
0&0&1&1&-\frac{1}{4}&\frac{3}{4}\\
\end{array}\right)
\]
- And our last step - replace row 1 with \(-\frac{3}{2}\) times row 3, plus row 1.
\[
\left(\begin{array}{ccc|ccc}
1&0&0&-1&\frac{3}{8}&-\frac{9}{8}\\
0&1&0&1&\frac{1}{8}&\frac{5}{8}\\
0&0&1&1&-\frac{1}{4}&\frac{3}{4}\\
\end{array}\right)
\]
And we're done. The inverted matrix is:
\[
\left(\begin{array}{ccc}
2&0&3\\
-1&3&-4\\
-3&1&-4
\end{array}\right)^{-1}=
\begin{pmatrix}
-1&\frac{3}{8}&-\frac{9}{8}\\
1&\frac{1}{8}&\frac{5}{8}\\
1&-\frac{1}{4}&\frac{3}{4}\\
\end{pmatrix}
\]
- Finally, for an \(n \times n\) matrix, the algorithm fixes each of the \(n\) columns; to fix a column, it operates on each of the \(n\) rows; and to fix a row, it operates on every entry in that row. So Gauss-Jordan Elimination is \(O(n^3)\).
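- To make the algorithm concrete, here is a minimal NumPy sketch of Gauss-Jordan inversion. It is just one possible translation of the steps above (not a reference implementation), checked against the worked example:

    import numpy as np

    def gauss_jordan_inverse(M):
        # Invert a square matrix by Gauss-Jordan elimination on the partitioned matrix (M | I).
        n = M.shape[0]
        aug = np.hstack([M.astype(float), np.eye(n)])
        for col in range(n):
            # Step 2: swap so the largest entry in this column (at or below the pivot row) is on top.
            pivot = col + np.argmax(np.abs(aug[col:, col]))
            if np.isclose(aug[pivot, col], 0.0):
                raise ValueError("matrix is singular")
            aug[[col, pivot]] = aug[[pivot, col]]
            # Step 3: scale the pivot row so its leading entry is 1.
            aug[col] /= aug[col, col]
            # Step 4: add/subtract multiples of the pivot row to zero the rest of the column.
            for row in range(n):
                if row != col:
                    aug[row] -= aug[row, col] * aug[col]
        return aug[:, n:]   # the right half now holds the inverse

    M = np.array([[2, 0, 3],
                  [-1, 3, -4],
                  [-3, 1, -4]])
    print(gauss_jordan_inverse(M))   # matches the inverse worked out above
    print(np.linalg.inv(M))          # NumPy agrees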