In fact, this situation of having a machine with more states than it needs crops up all the time, since we have so many algorithms for combining and converting machines, and these algorithms often use more states than are necessary for specific inputs. This is especially true of our algorithm for converting a nondeterministic finite automaton without e-transitions to a deterministic finite automaton, which was a key ingredient in the process of converting a regular expression into a computer program that searches for matches to that expression. So, we would like to have an algorithm that takes a DFA M as input, and returns another DFA M' that accepts L(M) and has as few states as possible for a DFA accepting L(M). This is the State Minimization problem.
Now, we could have a machine with no accepting states or with all states accepting, either of which is clearly equivalent to a 1-state machine. So in the following let's assume that isn't the case. Somewhat more subtly, we could have a machine with "unreachable" states, i.e. states that no string takes us to from the start state. We'll want to assume that our input machine has no such states. However, to make such an assumption we should be able to delete such states ahead of time ... but how? Most of you are taking or have taken data structures: hopefully you see that a simple depth-first or breadth-first search can identify unreachable states.
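The search for unreachable states can be sketched in Python as follows. This is only an illustration under assumptions of our own: the DFA is stored as a dictionary mapping (state, character) pairs to states, and the names reachable_states and remove_unreachable, as well as the three-state example machine, are hypothetical.

```python
from collections import deque

def reachable_states(delta, start):
    """Breadth-first search over the transition graph, starting at `start`.

    `delta` maps (state, character) -> state (an assumed representation).
    """
    # Collect the successors of every state, ignoring the characters.
    successors = {}
    for (state, _char), target in delta.items():
        successors.setdefault(state, set()).add(target)

    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def remove_unreachable(states, delta, start, accepting):
    """Return the same DFA restricted to its reachable states."""
    keep = reachable_states(delta, start)
    new_delta = {(q, x): t for (q, x), t in delta.items() if q in keep}
    return states & keep, new_delta, start, accepting & keep

# Hypothetical example: no string leads from q0 to q2.
states = {"q0", "q1", "q2"}
delta = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
    ("q2", "a"): "q0", ("q2", "b"): "q2",
}
pruned_states, pruned_delta, _, pruned_accepting = remove_unreachable(
    states, delta, "q0", {"q1", "q2"}
)
print(sorted(pruned_states))
```

Transitions out of a reachable state always lead to a reachable state, so filtering the transition table by source state is enough.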
The key property that we need to hold in each group is that every state in group Gi should agree on which group to transition to for each character. For example, if q1 and q3 are both in G1, then δ(q1,a) and δ(q3,a) have to go to states in the same group. They don't have to go to the same state, they just need to go to states in the same group. When this property doesn't hold, we need to split up our group into subgroups in which the property does hold. Doing this splitting over and over until there's no need to split further is the algorithm! Below is an example of performing this procedure on M1:
          a    b
G1   q0   G1   G1
     q1   G1   G2   ← Split into
     q2   G1   G2     new group G3
     q3   G1   G1
     q6   G1   G1

G2   q4   G2   G1
     q5   G2   G1


          a    b
G1   q0   G3   G3   ← Split into
     q3   G1   G1     new group G4
     q6   G1   G1

G2   q4   G2   G1
     q5   G2   G1

G3   q1   G1   G2
     q2   G1   G2


          a    b
G1   q3   G1   G1
     q6   G1   G1

G2   q4   G2   G1
     q5   G2   G1

G3   q1   G1   G2
     q2   G1   G2

G4   q0   G3   G3

Once we've found our grouping, we construct a new machine whose states are groups and whose transitions are given by the final table in the procedure above. Now instead of a machine with 7 states we have a machine with 4 states.
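The rows of the tables above can be recomputed mechanically. The sketch below assumes one particular set of transitions for the 7-state machine; the diagram of M1 is not reproduced here, so the DELTA table is a hypothetical choice that is merely consistent with the three tables. The helper names row and groups_to_split are ours as well.

```python
SIGMA = ("a", "b")

# Assumed transitions: one 7-state DFA consistent with the tables above.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q4",
    ("q2", "a"): "q6", ("q2", "b"): "q5",
    ("q3", "a"): "q3", ("q3", "b"): "q6",
    ("q4", "a"): "q5", ("q4", "b"): "q3",
    ("q5", "a"): "q4", ("q5", "b"): "q6",
    ("q6", "a"): "q3", ("q6", "b"): "q6",
}

def row(groups, q):
    """Table row for state q: the group reached on each character."""
    def label(state):
        return next(name for name, members in groups.items() if state in members)
    return tuple(label(DELTA[(q, x)]) for x in SIGMA)

def groups_to_split(groups):
    """Names of the groups whose states do not all have the same row."""
    return [name for name, members in groups.items()
            if len({row(groups, q) for q in members}) > 1]

first = {"G1": {"q0", "q1", "q2", "q3", "q6"}, "G2": {"q4", "q5"}}
final = {"G1": {"q3", "q6"}, "G2": {"q4", "q5"},
         "G3": {"q1", "q2"}, "G4": {"q0"}}

for q in sorted(first["G1"]):
    print(q, row(first, q))
print(groups_to_split(first), groups_to_split(final))
```

With the first grouping only G1 needs splitting, and with the final grouping no group does, which is exactly when the procedure stops.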
To state the algorithm precisely, suppose the states are currently divided into groups G_{1},...,G_{n}. For a state q and a character x, define
δ_{G}(q,x) = G_{i} such that δ(q,x) ∈ G_{i}.
For Σ = {a1,a2,...,am}, define the signature sig of state q as:
sig(q) = (δ_{G}(q,a1), δ_{G}(q,a2), ..., δ_{G}(q,am))
So δ_{G}(q,x) gives the entries of the table we used in the previous section, and sig(q) gives whole rows of the table. Here's the algorithm for dividing the states into groups of equivalent states:
Algorithm: DFA Minimization
Input: DFA M = (Q,Σ,δ,s,W), where W ≠ Q, W ≠ ∅, and there are no unreachable states.
Output: DFA M' such that L(M) = L(M') and no DFA with fewer states than M' accepts L(M)
 1. set G1 = Q − W and G2 = W
 2. find a group in which not all states have the same signature, and divide it into subgroups in which sig(q) is the same for every state q in the subgroup
 3. if any group was subdivided, go back to (2) and repeat
 4. set M' = ({G1,...,Gn},Σ,δ',s',W') where
    s' = Gk, where s ∈ Gk, and
    W' = {Gi | Gi ⊆ W}, and
    δ'(Gi,x) = Gj, where for any q ∈ Gi we have δ(q,x) ∈ Gj.
Now to be precise, we should prove that the machine M' really accepts L(M) and that it really is the smallest machine that does so.
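The four steps of the DFA Minimization algorithm can be sketched in Python along the following lines. This is an illustration rather than a definitive implementation: the dictionary representation, the function name minimize, and the 7-state example DFA are assumptions of ours (the example transitions are a hypothetical choice consistent with the tables shown for M1). Groups are numbered from 1 in the output, and when a group is split the largest subgroup keeps the old number, which is just a convention that happens to reproduce the numbering used in those tables.

```python
def minimize(states, sigma, delta, start, accepting):
    # Step 1: G1 = Q - W and G2 = W (list positions 0 and 1).
    groups = [frozenset(states - accepting), frozenset(accepting)]

    def index_of(q):
        return next(i for i, g in enumerate(groups) if q in g)

    def sig(q):
        # Signature of q: the group reached on each character.
        return tuple(index_of(delta[(q, x)]) for x in sigma)

    # Steps 2 and 3: keep splitting until every group is uniform.
    changed = True
    while changed:
        changed = False
        for m, group in enumerate(groups):
            buckets = {}
            for q in sorted(group):
                buckets.setdefault(sig(q), set()).add(q)
            if len(buckets) > 1:
                parts = sorted(buckets.values(), key=len, reverse=True)
                groups[m] = frozenset(parts[0])        # largest part keeps index m
                groups.extend(frozenset(p) for p in parts[1:])
                changed = True
                break

    # Step 4: build M' with states 1..n standing for G1..Gn.
    index = {q: i + 1 for i, g in enumerate(groups) for q in g}
    new_delta = {(index[q], x): index[delta[(q, x)]]
                 for q in states for x in sigma}
    new_accepting = {i + 1 for i, g in enumerate(groups) if g <= accepting}
    return groups, new_delta, index[start], new_accepting

# Assumed 7-state DFA, consistent with the group tables for M1.
Q = {"q0", "q1", "q2", "q3", "q4", "q5", "q6"}
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q4",
    ("q2", "a"): "q6", ("q2", "b"): "q5",
    ("q3", "a"): "q3", ("q3", "b"): "q6",
    ("q4", "a"): "q5", ("q4", "b"): "q3",
    ("q5", "a"): "q4", ("q5", "b"): "q6",
    ("q6", "a"): "q3", ("q6", "b"): "q6",
}
groups, min_delta, min_start, min_accepting = minimize(
    Q, ("a", "b"), DELTA, "q0", {"q4", "q5"}
)
print([sorted(g) for g in groups], min_start, min_accepting)
```

Each pass recomputes every signature from scratch, which is simple but not the fastest approach; the point here is only to mirror the steps as stated.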
Now, to show that for any machine Mnew with fewer states than M', not just for one particular example, Mnew does not accept L(M'), we need an algorithm that takes Mnew as input and produces as output a counterexample string u that Mnew misclassifies.

Algorithm: AugmentedMinimization
Input: DFA M
Output: DFA M' s.t. L(M') = L(M), and the algorithm
    Algorithm: generateCounterExample
    Input: DFA Mnew with fewer states than M'
    Output: string u that Mnew misclassifies

In other words, to prove that M' is really the smallest machine accepting the same language, we need to have an algorithm that outputs an algorithm. So let's do it!
Claim: L(M') = L(M).
Proof: For each state p of M, let g(p) denote the group containing p, which is a state of M'. Let p and q be states in M, and let Gi and Gj be the states in M' such that p ∈ Gi and q ∈ Gj, i.e. Gi = g(p) and Gj = g(q). Clearly, if δ(p,x) = q then δ'(Gi,x) = Gj. In other words, δ'(g(p),x) = g(δ(p,x)). Thus, if the computation of machine M on string w = a_{1}a_{2}...a_{n} is
(p_{i_1}, a_{1}a_{2}...a_{n}) ⇒ (p_{i_2}, a_{2}a_{3}...a_{n}) ⇒ ... ⇒ (p_{i_n}, a_{n}) ⇒ (p_{i_{n+1}}, λ)
then, since g(p_{i_1}) = g(s) = s', the computation of machine M' on string w is
(g(p_{i_1}), a_{1}a_{2}...a_{n}) ⇒ (g(p_{i_2}), a_{2}a_{3}...a_{n}) ⇒ ... ⇒ (g(p_{i_n}), a_{n}) ⇒ (g(p_{i_{n+1}}), λ)
The groups start out as Q − W and W and are only ever subdivided, so every group is either contained in W or disjoint from it. Hence g(p_{i_{n+1}}) is an accepting state if and only if p_{i_{n+1}} is an accepting state, and either w is accepted by both M and M' or it is rejected by both M and M'. Thus, L(M') = L(M).
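The two facts used in this proof can be checked by brute force on a small example. In the sketch below, the 4-state machine is the minimized machine from the tables for M1, while the 7-state transition table is a hypothetical choice consistent with those tables (the original diagram is not reproduced here). The names run, homomorphic and all_agree are ours, and testing all strings up to length 6 is a sanity check, not a proof.

```python
from itertools import product

SIGMA = "ab"

# Assumed 7-state DFA M, consistent with the group tables for M1.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q4",
    ("q2", "a"): "q6", ("q2", "b"): "q5",
    ("q3", "a"): "q3", ("q3", "b"): "q6",
    ("q4", "a"): "q5", ("q4", "b"): "q3",
    ("q5", "a"): "q4", ("q5", "b"): "q6",
    ("q6", "a"): "q3", ("q6", "b"): "q6",
}
START, ACCEPTING = "q0", {"q4", "q5"}

# g maps each state of M to the number of its group in M'.
g = {"q3": 1, "q6": 1, "q4": 2, "q5": 2, "q1": 3, "q2": 3, "q0": 4}

# M', read off the final table: states 1..4 stand for G1..G4.
DELTA_MIN = {
    (1, "a"): 1, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 1,
    (3, "a"): 1, (3, "b"): 2,
    (4, "a"): 3, (4, "b"): 3,
}
START_MIN, ACCEPTING_MIN = 4, {2}

def run(delta, state, word):
    """State reached from `state` after reading `word`."""
    for x in word:
        state = delta[(state, x)]
    return state

# delta'(g(p), x) = g(delta(p, x)) for every state p and character x.
homomorphic = all(DELTA_MIN[(g[p], x)] == g[DELTA[(p, x)]]
                  for p in g for x in SIGMA)

def agree(word):
    in_m = run(DELTA, START, word) in ACCEPTING
    in_min = run(DELTA_MIN, START_MIN, word) in ACCEPTING_MIN
    return in_m == in_min

# Compare the two machines on every string of length 0 through 6.
all_agree = all(agree("".join(w))
                for n in range(7) for w in product(SIGMA, repeat=n))
print(homomorphic, all_agree)
```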
OK, so that was the easier part. The more difficult part is to prove that there is no smaller machine, i.e. no machine with fewer states than M' that accepts L(M). Essentially what we'll do is construct an algorithm that takes a machine with fewer states than M' and returns a string that the machine misclassifies with respect to M'.
Claim: Let M' = (Q',Σ,δ',s',W') be the machine returned by the above algorithm. No machine with fewer states accepts L(M').
Proof: First we note that since there are no unreachable states in M, there are no unreachable states in M'. That will be important later on. We will give an algorithm that takes a DFA Mnew = (Qnew,Σ,δnew,snew,Wnew) with fewer states than M', and returns a string u for which M' and Mnew make opposite accept/reject decisions, which shows that L(M') ≠ L(Mnew). This will prove that no machine with fewer states than M' accepts the same language. Now, this algorithm has to be customized to each M', so we'll augment the minimization algorithm from above to gather the information specific to M' that will be used by our counterexample-string algorithm.

Lemma: For each pair of distinct states p,q ∈ Q', there is a string u such that (p,u) ⇒* (r,λ) and (q,u) ⇒* (s,λ) in M', where either r ∈ W' and s ∉ W', or r ∉ W' and s ∈ W'.
Proof: Our proof is just to augment the Minimization algorithm so that it constructs the strings we need for us. The following produces, for every pair of group indices i < j, a string w_{i,j} such that either M ends in an accepting state when it processes w_{i,j} starting from any state in Gi and in a nonaccepting state when it starts from any state in Gj, or vice versa. Since the states in M' are precisely the groups from the algorithm, these w_{i,j} define the u's we need.

Algorithm: Augmented DFA Minimization
Input: DFA M = (Q,Σ,δ,s,W), where W ≠ Q, W ≠ ∅, and there are no unreachable states.
Output: DFA M' such that L(M) = L(M') and no DFA with fewer states than M' accepts L(M), and a table w such that for every two states Gi and Gj in M' with i < j, w_{i,j} is a string that takes M' from state Gi to an accepting state and from Gj to a nonaccepting state, or vice versa.
 1. set G1 = Q − W and G2 = W
 2. Ensure at the beginning of this step that for every pair of group indices i < j, we have a string w_{i,j} such that either M ends in an accepting state when it processes w_{i,j} starting from any state in Gi and in a nonaccepting state when it starts from any state in Gj, or vice versa. Initially, there are only groups G1 and G2, and w_{1,2} = λ.
    find a group in which not all states have the same signature, and divide it into subgroups in which sig(q) is the same for every state q in the subgroup.
    Let G1,...,Gn be the groups. Suppose Gm is getting split into k new groups. We'll remove Gm and call the new groups Gn+1,...,Gn+k. We'll define the new w_{i,j}'s first. Afterwards, we can renumber groups to give them consecutive numbers. There are three cases.
    (a) if i,j ∈ {1,...,m−1,m+1,...,n}, then new w_{i,j} = w_{i,j}
    (b) if i ∈ {1,...,m−1,m+1,...,n} and j ∈ {n+1,...,n+k}, then new w_{i,j} = w_{i,m} (or w_{m,i} if m < i)
    (c) if i,j ∈ {n+1,...,n+k}, then there must be a character x such that δ_{G}(p,x), taken with respect to the groups before the split, is one group for all p ∈ Gi (call it Gr), and another for all p ∈ Gj (call it Gs). We set new w_{i,j} = x w_{r,s} (or x w_{s,r} if s < r).
 3. if any group was subdivided, go back to (2) and repeat
 4. set M' = ({G1,...,Gn},Σ,δ',s',W') where
    s' = Gk, where s ∈ Gk, and
    W' = {Gi | Gi ⊆ W}, and
    δ'(Gi,x) = Gj, where for any q ∈ Gi we have δ(q,x) ∈ Gj.
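The bookkeeping for the table w can be sketched in Python as below. Several things here are assumptions of ours rather than part of the algorithm as stated: the name augmented_minimize, the dictionary representation, the hypothetical 7-state example DFA (chosen to be consistent with the tables for M1), and the convention that the largest subgroup keeps the index m instead of removing Gm and renumbering afterwards. Because one subgroup keeps index m, its strings against the untouched groups are inherited without any update. The sketch returns only the groups and the table; building M' itself is the same step 4 as before.

```python
def augmented_minimize(states, sigma, delta, accepting):
    groups = [frozenset(states - accepting), frozenset(accepting)]
    # w[{i, j}] distinguishes group i from group j; "" stands for lambda.
    w = {frozenset({0, 1}): ""}

    def index_of(q):
        return next(i for i, g in enumerate(groups) if q in g)

    def sig(q):
        return tuple(index_of(delta[(q, x)]) for x in sigma)

    while True:
        split = None
        for m, group in enumerate(groups):
            buckets = {}
            for q in sorted(group):
                buckets.setdefault(sig(q), set()).add(q)
            if len(buckets) > 1:
                split = (m, buckets)
                break
        if split is None:
            break
        m, buckets = split
        # Signatures in `items` refer to the groups before the split.
        items = sorted(buckets.items(), key=lambda kv: len(kv[1]), reverse=True)
        n = len(groups)
        part_ids = [m] + list(range(n, n + len(items) - 1))
        # Untouched group vs. a new part of G_m: inherit the string for (i, m).
        for i in range(n):
            if i != m:
                for pid in part_ids[1:]:
                    w[frozenset({i, pid})] = w[frozenset({i, m})]
        # Two parts of G_m: some character x sends them to different old groups.
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                for x, r, s in zip(sigma, items[a][0], items[b][0]):
                    if r != s:
                        w[frozenset({part_ids[a], part_ids[b]})] = x + w[frozenset({r, s})]
                        break
        groups[m] = frozenset(items[0][1])
        groups.extend(frozenset(part) for _, part in items[1:])

    # Report the table with 1-based indices (i, j), i < j.
    table = {(min(k) + 1, max(k) + 1): v for k, v in w.items()}
    return groups, table

# Assumed 7-state DFA, consistent with the group tables for M1.
Q = {"q0", "q1", "q2", "q3", "q4", "q5", "q6"}
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q4",
    ("q2", "a"): "q6", ("q2", "b"): "q5",
    ("q3", "a"): "q3", ("q3", "b"): "q6",
    ("q4", "a"): "q5", ("q4", "b"): "q3",
    ("q5", "a"): "q4", ("q5", "b"): "q6",
    ("q6", "a"): "q3", ("q6", "b"): "q6",
}
groups, table = augmented_minimize(Q, ("a", "b"), DELTA, {"q4", "q5"})
print(table)
```

Old entries of w are never overwritten during a split, so the lookups in the last case always see the strings that were valid for the groups before the split.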
Now we're ready to give an algorithm that takes a new machine Mnew with fewer states than the minimized DFA M', and returns a string that M' accepts but Mnew rejects, or vice versa.
Algorithm: CounterExampleFinder
Input: Minimized DFA M' = ({G1,...,Gn},Σ,δ',s',W') and table w from the Augmented DFA Minimization algorithm, and DFA Mnew = (Qnew,Σ,δnew,snew,Wnew) with fewer states than M'
Output: string v such that either v ∈ L(M') but v ∉ L(Mnew) or v ∉ L(M') but v ∈ L(Mnew)
 1. for each state Gi in M', set $u_i$ equal to a string such that (s',u_i) ⇒*_{M'} (Gi,λ). Such a string exists because M' has no unreachable states, and it can be computed with a breadth-first search or with Dijkstra's algorithm from Data Structures, for example.
 2. find indices i,j where i<j such that $M_{\text{new}}$ ends up in the same state on input $u_i$ as on input $u_j$. I.e. for some $p \in Q_{\text{new}}$ \[ (s_{\text{new}},u_i) \Rightarrow_{M_{\text{new}}}^* (p,\lambda) \text{ and } (s_{\text{new}},u_j) \Rightarrow_{M_{\text{new}}}^* (p,\lambda) \] Since there are n strings $u_1,\ldots,u_n$ but $M_{\text{new}}$ has fewer than n states, the pigeonhole principle guarantees such i,j exist.
 3. set $v_1 = u_i w_{i,j}$
 4. set $v_2 = u_j w_{i,j}$
 5. because Mnew arrives at the same state processing $u_i$ as $u_j$, it either accepts both $v_1$ and $v_2$, or rejects them both. However, M' accepts one of $v_1$ and $v_2$ and rejects the other. So Mnew and M' make opposite decisions on either $v_1$ or $v_2$. Set v equal to whichever one of $v_1$ and $v_2$ gives opposite decisions in the two machines.
Now we have a proof that for any DFA Mnew with fewer states than M', L(Mnew) ≠ L(M'), because we have an algorithm that will produce a string on which Mnew and M' make opposite decisions.
Table w (the entry in row i, column j is w_{i,j}):

       1    2    3    4
  1         λ    b    ab
  2              λ    λ
  3                   b
  4

Consider running the CounterExampleFinder algorithm with Mnew given below:
[Figures: state diagrams of Mnew and M']
Step 1: $u_1 = aa, u_2 = bb, u_3 = a, u_4 = \lambda$
Step 2: $i=1,j=4$ since $M_{\text{new}}$ goes to state $q_0$ on both $u_1 = aa$ and $u_4 = \lambda$
Step 3: $v_1 = u_i w_{i,j} = u_1 w_{1,4} = aaab$
Step 4: $v_2 = u_j w_{i,j} = u_4 w_{1,4} = ab$
Step 5: $v = v_1 = aaab$, since aaab ∉ L(M') but aaab ∈ L(Mnew)
So the string aaab proves that Mnew does not accept the same language as M'.
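The CounterExampleFinder steps can be replayed in code. In the sketch below, M' and the table w are taken from the example above, but the diagram of Mnew is not reproduced here, so the 3-state NEW machine is a hypothetical stand-in chosen to agree with what the example says about it (it returns to q0 on both aa and λ, and it accepts aaab). The function names are ours. Step 1 is done with a breadth-first search, which picks ab rather than bb as the string reaching G2; either choice works.

```python
from collections import deque
from itertools import combinations

SIGMA = "ab"

# M' from the example: states 1..4 stand for G1..G4.
MIN = {
    "delta": {(1, "a"): 1, (1, "b"): 1, (2, "a"): 2, (2, "b"): 1,
              (3, "a"): 1, (3, "b"): 2, (4, "a"): 3, (4, "b"): 3},
    "start": 4,
    "accepting": {2},
}
# Table w from the example; "" stands for lambda.
W = {(1, 2): "", (1, 3): "b", (1, 4): "ab",
     (2, 3): "", (2, 4): "", (3, 4): "b"}

# Hypothetical Mnew with 3 states (an assumption, see above).
NEW = {
    "delta": {("q0", "a"): "q1", ("q0", "b"): "q1",
              ("q1", "a"): "q0", ("q1", "b"): "q2",
              ("q2", "a"): "q2", ("q2", "b"): "q0"},
    "start": "q0",
    "accepting": {"q2"},
}

def run(m, word):
    state = m["start"]
    for x in word:
        state = m["delta"][(state, x)]
    return state

def accepts(m, word):
    return run(m, word) in m["accepting"]

def shortest_strings(m):
    """Step 1: breadth-first search for one string reaching each state."""
    u = {m["start"]: ""}
    queue = deque([m["start"]])
    while queue:
        state = queue.popleft()
        for x in SIGMA:
            nxt = m["delta"][(state, x)]
            if nxt not in u:
                u[nxt] = u[state] + x
                queue.append(nxt)
    return u

def counter_example(m_min, w, m_new):
    u = shortest_strings(m_min)
    # Step 2: look for i < j whose strings collide in m_new.
    for i, j in combinations(sorted(u), 2):
        if run(m_new, u[i]) == run(m_new, u[j]):
            # Steps 3 to 5: one of the two extended strings is misclassified.
            for candidate in (u[i] + w[(i, j)], u[j] + w[(i, j)]):
                if accepts(m_min, candidate) != accepts(m_new, candidate):
                    return candidate
    raise ValueError("m_new must have fewer states than m_min")

print(counter_example(MIN, W, NEW))
```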
Regular languages are theoretically interesting because they are the languages that can be accepted by machines (or programs) with a fixed amount of memory. They are interesting in practice because they are the languages for which we have a simple, efficient pattern matching algorithm through our regular expression to NDFA to DFA to program pipeline, and pattern matching is a problem of tremendous practical importance.
We've also seen, thanks to the Pumping Lemma, that there are many languages that are not regular, meaning that they cannot be accepted by a machine with a fixed amount of memory. Our next step will be to augment our machine model with a simple form of unbounded (i.e. not finite!) auxiliary memory and to see if any of these languages will be acceptable by augmented machines. In the background of our investigations has been this whole methodology of modelling machines with tuples, sets and functions, and of proof by algorithm. When we explore these new augmented machines we'll be using all the same techniques, which'll hopefully be a lot easier to follow the second time around.