# Class 22: State Minimization Algorithm

Sections 2.6 and 2.7 of Theory of Computing, A Gentle Introduction.
Homework
Then print out the homework and answer the questions on that paper.

State Minimization: The Problem
Consider the machine M1 below:

If you look at it a bit, you should be able to convince yourself that you could redesign it to use fewer states to accept the same language. Why? Well, states q3 and q6, for example, essentially do the same thing. Couldn't they be merged into a single state? This same observation applies in a couple of other places.

In fact, this situation of having a machine with more states than it needs crops up all the time, since we have so many algorithms for combining and converting machines, and these algorithms often use more states than are necessary for specific inputs. This is especially true of our algorithm for converting a nondeterministic finite automaton without e-transitions to a deterministic finite automaton, which was a key ingredient in the process of converting a regular expression into a computer program that searches for matches to that expression. So, we would like to have an algorithm that takes a DFA M as input, and returns another DFA M' that accepts L(M) and has as few states as is possible for a DFA accepting L(M). This is the State Minimization problem.

Now, we could have a machine with no accepting states or with all states accepting, either of which is clearly equivalent to a 1-state machine. So in the following let's assume that isn't the case. Somewhat more subtly, we could have a machine with "unreachable" states, i.e. states that no string takes us to from the start state. We'll want to assume that our input machine has no such states. However, to make such an assumption we should be able to delete such states ahead of time ... but how? Hopefully you see that a simple depth-first or breadth-first search from the start state can identify the reachable states, and every state it never visits is unreachable.
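The search mentioned above can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of δ and the names `reachable_states` and `remove_unreachable` are assumptions made for this example, not notation from the text.

```python
from collections import deque

def reachable_states(delta, start):
    """Return the set of states reachable from `start` by breadth-first search.

    `delta` maps (state, symbol) pairs to the next state.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for (source, _symbol), target in delta.items():
            if source == state and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def remove_unreachable(states, delta, start, accepting):
    """Restrict a DFA to the states that some input string can actually reach."""
    keep = reachable_states(delta, start)
    new_delta = {(p, x): q for (p, x), q in delta.items() if p in keep}
    return states & keep, new_delta, start, accepting & keep

# Hypothetical three-state machine: no string leads from "s" to "dead".
delta = {
    ("s", "a"): "t", ("s", "b"): "s",
    ("t", "a"): "t", ("t", "b"): "s",
    ("dead", "a"): "s", ("dead", "b"): "dead",
}
states, delta2, start, accepting = remove_unreachable(
    {"s", "t", "dead"}, delta, "s", {"t", "dead"}
)
print(sorted(states))   # ['s', 't']
```

Scanning every transition for each visited state is wasteful for large machines, but it keeps the sketch short; an adjacency list would avoid the repeated scans.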

An example of the Algorithm for Minimization
The basic idea of the state minimization algorithm is to group together states that are "essentially the same". We take a stab at a grouping, test it, and based on the tests either conclude that we're done or split up our groups further. At each step, we want to make sure that two states are in different groups only if we're absolutely sure they cannot be considered essentially the same. To be conservative, we start off splitting the states into two separate groups: the accepting states and the nonaccepting states. These two groups are definitely different, because ending up in one is very different from ending up in the other.

The key property that we need to hold in each group is that every state in group Gi should agree on which group to transition to for each character. For example, if q1 and q3 are both in G1, then δ(q1,a) and δ(q3,a) have to be states in the same group. They don't have to be the same state, they just need to be in the same group. When this property doesn't hold, we need to split up our group into subgroups in which the property does hold. Doing this splitting over and over until there's no need to split further is the algorithm! Below is an example of performing this procedure on M1:

```
      |  a   b
------+--------
G1  q0| G1  G1
    q1| G1  G2   ← Split into
    q2| G1  G2     new group G3
    q3| G1  G1
    q6| G1  G1
      |
G2  q4| G2  G1
    q5| G2  G1
```

```
      |  a   b
------+--------
G1  q0| G3  G3   ← Split into
    q3| G1  G1     new group G4
    q6| G1  G1
      |
G2  q4| G2  G1
    q5| G2  G1
      |
G3  q1| G1  G2
    q2| G1  G2
```

```
      |  a   b
------+--------
G1  q3| G1  G1
    q6| G1  G1
      |
G2  q4| G2  G1
    q5| G2  G1
      |
G3  q1| G1  G2
    q2| G1  G2
      |
G4  q0| G3  G3
```

Once we've found our grouping, we construct a new machine whose states are groups and whose transitions are given by the final table in the procedure above:

Now instead of a machine with 7 states we have a machine with 4 states.

A Precise algorithm
Given a machine M = (Q,Σ,δ,s,W) we want to construct a machine M' that has fewer states (if possible) and accepts L(M). To make a nice-looking, precise algorithm out of the above procedure we need a few definitions: Suppose the elements of Q are partitioned into sets G1,...,Gn. Then define δG as:
δG(q,x) = Gi such that δ(q,x) ∈ Gi.
For Σ = {a1,a2,...,am}, define the signature sig of a state q as:
sig(q) = (δG(q,a1), δG(q,a2), ..., δG(q,am))
So δG(q,x) gives the entries of the table we used in the previous section, and sig(q) gives whole rows of the table. Here's the algorithm for dividing the states into groups of equivalent states:
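To make δG and sig concrete, here is a small Python sketch. The figure of M1 is not reproduced in this text, so the transition table `DELTA` below is a hypothetical one, chosen only so that it agrees with the tables in the previous section; the function names `delta_g` and `sig` simply mirror the definitions above.

```python
ALPHABET = ("a", "b")

# Hypothetical transition function for M1 (an assumption: it is merely
# consistent with the group tables shown earlier).
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q4",
    ("q2", "a"): "q6", ("q2", "b"): "q5",
    ("q3", "a"): "q3", ("q3", "b"): "q6",
    ("q4", "a"): "q4", ("q4", "b"): "q3",
    ("q5", "a"): "q5", ("q5", "b"): "q6",
    ("q6", "a"): "q6", ("q6", "b"): "q3",
}

def delta_g(groups, q, x):
    """Return the index i of the group Gi that contains delta(q, x)."""
    target = DELTA[(q, x)]
    for i, group in enumerate(groups, start=1):
        if target in group:
            return i
    raise ValueError(f"{target} is not in any group")

def sig(groups, q):
    """Return the signature of q: one group index per alphabet symbol."""
    return tuple(delta_g(groups, q, x) for x in ALPHABET)

# Initial partition: G1 = non-accepting states, G2 = accepting states.
groups = [{"q0", "q1", "q2", "q3", "q6"}, {"q4", "q5"}]
for q in sorted(groups[0] | groups[1]):
    print(q, sig(groups, q))
```

For this assumed machine, q1 and q2 get signature (1, 2) while q0, q3 and q6 get (1, 1), which is exactly the disagreement that forces the first split of G1.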
Algorithm: DFA Minimization
Input: DFA M = (Q,Σ,δ,s,W), where W ≠ Q, W ≠ ∅, and there are no unreachable states.
Output: DFA M' such that L(M) = L(M') and no DFA with fewer states than M' accepts L(M)
1. set G1 = Q - W and G2 = W
2. find a group in which not all states have the same signature, divide it into subgroups in which sig(q) is the same for every state q in the subgroup.
3. if any group was subdivided, go back to (2) and repeat
4. set M' = ({G1,...,Gn},Σ,δ',s',W') where
• s' = Gk, where s ∈ Gk, and
• W' = {Gi | Gi ⊆ W }, and
• δ'(Gi,x) = Gj, where for any q ∈ Gi we have δ(q,x) ∈ Gj.
Now to be precise, we should prove that the machine M' really accepts L(M) and that it really is the smallest machine that does so.
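The four steps of the algorithm translate fairly directly into code. The following Python sketch is one possible rendering, not a definitive implementation: the dictionary encoding, the 0-based group indices, and the hypothetical transition table for M1 (consistent with the tables earlier, since the figure is not reproduced here) are all assumptions.

```python
from itertools import product

def minimize(alphabet, delta, start, accepting):
    """Partition-refinement sketch of the DFA Minimization algorithm.

    Assumes every state is reachable and that some, but not all, states accept.
    Returns (groups, new_delta, new_start, new_accepting); state i of the
    result stands for groups[i].
    """
    states = {p for (p, _x) in delta}
    groups = [frozenset(states - accepting), frozenset(accepting)]    # step 1

    def sig(q):
        # Index of the group reached from q on each symbol.
        return tuple(
            next(i for i, g in enumerate(groups) if delta[(q, x)] in g)
            for x in alphabet
        )

    changed = True
    while changed:                                                    # step 3
        changed = False
        for group in groups:                                          # step 2
            by_sig = {}
            for q in sorted(group):
                by_sig.setdefault(sig(q), set()).add(q)
            if len(by_sig) > 1:
                groups.remove(group)
                groups.extend(frozenset(part) for part in by_sig.values())
                changed = True
                break          # group indices changed, so rescan from scratch

    index = {q: i for i, g in enumerate(groups) for q in g}           # step 4
    new_delta = {(index[p], x): index[q] for (p, x), q in delta.items()}
    new_accepting = {i for i, g in enumerate(groups) if g <= accepting}
    return groups, new_delta, index[start], new_accepting

def accepts(delta, start, accepting, word):
    """Run a DFA on `word` and report whether it ends in an accepting state."""
    state = start
    for x in word:
        state = delta[(state, x)]
    return state in accepting

# Hypothetical transitions for M1 (an assumption, see the lead-in).
M1_DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q4",
    ("q2", "a"): "q6", ("q2", "b"): "q5",
    ("q3", "a"): "q3", ("q3", "b"): "q6",
    ("q4", "a"): "q4", ("q4", "b"): "q3",
    ("q5", "a"): "q5", ("q5", "b"): "q6",
    ("q6", "a"): "q6", ("q6", "b"): "q3",
}
M1_ACCEPTING = {"q4", "q5"}

groups, min_delta, min_start, min_accepting = minimize("ab", M1_DELTA, "q0", M1_ACCEPTING)
print(sorted(sorted(g) for g in groups))
# [['q0'], ['q1', 'q2'], ['q3', 'q6'], ['q4', 'q5']]

# Spot check: both machines agree on every string of length at most 7.
same_language = all(
    accepts(M1_DELTA, "q0", M1_ACCEPTING, word)
    == accepts(min_delta, min_start, min_accepting, word)
    for n in range(8)
    for word in product("ab", repeat=n)
)
print(same_language)   # True
```

The exhaustive comparison on short strings is only a sanity check, of course; it is no substitute for the proof that follows.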

Prelude: Proving the State Minimization Algorithm Correct
Suppose someone designed the machine Mnew

and claimed that it accepted the same language as the minimized machine M' from above. How could you prove they were wrong? I.e. what would convince us that Mnew does not accept L(M')? You need a counter-example string! I.e. you need

 string u that Mnew misclassifies

Now, to show that for any machine Mnew, not just for this one example, Mnew does not accept L(M') we need an algorithm that takes Mnew as input and produces the counter example string u that Mnew misclassifies as output.

 Algorithm: Input: DFA Mnew, Output: → string u that Mnew misclassifies

Finally, to show that for any DFA M, the machine M' produced by our algorithm has the fewest states of any machine accepting L(M), we need to augment our minimization algorithm so that it not only outputs M', but also outputs an algorithm that proves that no machine with fewer states than M' accepts the same language, i.e.

 Algorithm: Input: DFA M, Output: DFA M' and → Algorithm: Input: DFA Mnew, Output: → string u that Mnew misclassifies

In other words, to prove that M' is really the smallest machine accepting the same language, we need to have an algorithm that outputs an algorithm. So let's do it!

Proving the State Minimization Algorithm Correct
To prove the state minimization algorithm is correct, we must prove that L(M') = L(M) and we must prove that no DFA with fewer states accepts L(M'). To prove these it will be convenient to concisely describe "the group to which state p from M belongs". We'll define g(p) to do that, i.e. g(p) = Gi such that p ∈ Gi. The first part is easy:
Claim: L(M') = L(M).
Proof: Let p and q be states in M, and let Gi and Gj be the states in M' such that p ∈ Gi and q ∈ Gj. Clearly, if δ(p,x) = q then δ'(Gi,x) = Gj. In other words, δ'(g(p),x) = g(δ(p,x)). Thus, if the computation of machine M on string w = a1a2...an is
(p_{i_1}, a1a2...an) ⇒ (p_{i_2}, a2a3...an) ⇒ ... ⇒ (p_{i_n}, an) ⇒ (p_{i_{n+1}}, λ)
then the computation of machine M' on string w is
(g(p_{i_1}), a1a2...an) ⇒ (g(p_{i_2}), a2a3...an) ⇒ ... ⇒ (g(p_{i_n}), an) ⇒ (g(p_{i_{n+1}}), λ)
Every group is either contained in W or disjoint from it, because the initial split separates W from Q - W and later steps only subdivide groups. So g(p_{i_{n+1}}) is an accepting state of M' if and only if p_{i_{n+1}} is an accepting state of M, and either w is accepted by both M and M' or it is rejected by both M and M'. Thus, L(M') = L(M).

OK, so that was the easier part. The more difficult part is to prove that there is no smaller machine, i.e. no machine with fewer states than M' that accepts L(M). Essentially what we'll do is construct an algorithm that takes a machine with fewer states than M' and returns a string that the machine misclassifies with respect to M'.

Claim: Let M' = (Q',Σ,δ',s',W') be the machine returned by the above algorithm. No machine with fewer states accepts L(M').
Proof: First we note that since there are no unreachable states in M, there are no unreachable states in M'. That will be important later on. We will give an algorithm that takes a DFA Mnew = (Qnew,Σ,δnew, snew,Wnew) with fewer states than M', and returns a string u for which M' and Mnew make opposite accept/reject decisions, which shows that L(M') ≠ L(Mnew). This will prove that no machine with fewer states than M' accepts the same language. Now, this algorithm has to be customized to each M', so we'll augment the minimization algorithm from above to gather the information specific to M' that will be used by our counter-example-string algorithm.

Lemma: For each pair of distinct states p,q ∈ Q', there is a string u such that (p,u) ⇒* (r,λ) and (q,u) ⇒* (s,λ) in M', where either r ∈ W' and s ∉ W', or r ∉ W' and s ∈ W'.
Proof: Our proof is just to augment the Minimization algorithm so that it constructs the strings we need for us. The following produces, for every pair of group indices i < j, a string wi,j such that either processing wi,j in M starting from any state in Gi is accepting and any state in Gj is non-accepting, or vice versa. Since the states in M' are precisely the groups from the algorithm, these wi,j define the u's we need.

Algorithm: Augmented DFA Minimization
Input: DFA M = (Q,Σ,δ,s,W), where W ≠ Q, W ≠ ∅, and there are no unreachable states.
Output: DFA M' such that L(M) = L(M') and no DFA with fewer states than M' accepts L(M) and a table w, such that for every two states Gi and Gj in M' with i < j, wi,j is a string that takes M' from state Gi to an accepting state and from Gj to a non-accepting state, or vice versa.
1. set G1 = Q - W and G2 = W
2. Ensure at the beginning of this step that for every pair of group indices i < j, we have a string wi,j such that either processing wi,j in M starting from any state in Gi is accepting and any state in Gj is non-accepting, or vice versa. Initially, there are only groups G1 and G2, and w1,2 = λ.
find a group in which not all states have the same signature, divide it into subgroups in which sig(q) is the same for every state q in the subgroup.
Let G1,...,Gn be the groups. Suppose Gm is getting split into k new groups. We'll remove Gm and call the new groups Gn+1,...,Gn+k. We'll define the new wi,j's first. Afterwards, we can renumber groups to give them consecutive numbers. There are three cases.
• if i,j ∈ {1,..,m-1,m+1,...,n}, then new wi,j = wi,j
• if i ∈ {1,..,m-1,m+1,...,n} and j ∈ {n+1,...,n+k}, then new wi,j = wi,m (or wm,i if m < i), since every state of Gj came from Gm
• if i,j ∈ {n+1,...,n+k}, then there must be a character x such that δG(p,x), taken with respect to the groups before the split, is one group for all p ∈ Gi (call it Gr), and another for all p ∈ Gj (call it Gs). We set new wi,j = x wr,s (or x ws,r if s < r).
3. if any group was subdivided, go back to (2) and repeat
4. set M' = ({G1,...,Gn},Σ,δ',s',W') where
• s' = Gk, where s ∈ Gk, and
• W' = {Gi | Gi ⊆ W }, and
• δ'(Gi,x) = Gj, where for any q ∈ Gi we have δ(q,x) ∈ Gj.
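The bookkeeping for the table w is easier to follow in code than in indices. Below is a hedged Python sketch of the augmented algorithm. To sidestep the renumbering of groups, it keys w by unordered pairs of groups (as frozensets) rather than by index pairs; that choice, the function name `augmented_minimize`, and the hypothetical transition table for M1 are assumptions of this example, not part of the algorithm as stated.

```python
from itertools import combinations

def augmented_minimize(alphabet, delta, accepting):
    """Refine the partition while tracking a distinguishing string per pair of groups.

    Returns (groups, w), where w[frozenset({G, H})] is a string that leads to
    an accepting state from every state of G and to a non-accepting state from
    every state of H, or vice versa.
    """
    states = {p for (p, _x) in delta}
    groups = [frozenset(states - accepting), frozenset(accepting)]
    w = {frozenset(groups): ""}              # w1,2 = λ, the empty string

    def group_of(q):
        return next(g for g in groups if q in g)

    def sig(q):
        return tuple(group_of(delta[(q, x)]) for x in alphabet)

    while True:
        split = None
        for group in groups:
            by_sig = {}
            for q in sorted(group):
                by_sig.setdefault(sig(q), set()).add(q)
            if len(by_sig) > 1:
                split = (group, {s: frozenset(p) for s, p in by_sig.items()})
                break
        if split is None:
            return groups, w
        old, parts = split
        new_w = {}
        for pair, word in w.items():
            if old not in pair:
                new_w[pair] = word                   # case 1: neither group was split
            else:
                (other,) = pair - {old}
                for part in parts.values():          # case 2: inherit from the split group
                    new_w[frozenset({other, part})] = word
        for (s1, p1), (s2, p2) in combinations(parts.items(), 2):
            # case 3: some symbol x sends the two parts to different old groups
            k = next(i for i in range(len(alphabet)) if s1[i] != s2[i])
            new_w[frozenset({p1, p2})] = alphabet[k] + w[frozenset({s1[k], s2[k]})]
        groups = [g for g in groups if g != old] + list(parts.values())
        w = new_w

# Hypothetical transitions for M1 (an assumption consistent with the earlier tables).
M1_DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q4",
    ("q2", "a"): "q6", ("q2", "b"): "q5",
    ("q3", "a"): "q3", ("q3", "b"): "q6",
    ("q4", "a"): "q4", ("q4", "b"): "q3",
    ("q5", "a"): "q5", ("q5", "b"): "q6",
    ("q6", "a"): "q6", ("q6", "b"): "q3",
}
groups, w = augmented_minimize("ab", M1_DELTA, {"q4", "q5"})

# Name the final groups as in the worked example.
G1, G2 = frozenset({"q3", "q6"}), frozenset({"q4", "q5"})
G3, G4 = frozenset({"q1", "q2"}), frozenset({"q0"})
print(repr(w[frozenset({G1, G4})]), repr(w[frozenset({G3, G4})]))   # 'ab' 'b'
```

Note that case 3 looks up the old table w, not the one under construction: the groups Gr and Gs are groups from before the split, and one of them may be the very group being split.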

Now we're ready to give an algorithm that takes a new machine Mnew with fewer states than the minimized DFA M', and returns a string that M' accepts but Mnew rejects, or vice versa.

Algorithm: CounterExampleFinder
Input: Minimized DFA M' = ({G1,...,Gn},Σ,δ',s',W') and table w from the Augmented-DFA-Minimization-Algorithm, and DFA Mnew = (Qnew,Σ,δnew,snew,Wnew) with fewer states than M'
Output: string v such that either v ∈ L(M') but v ∉ L(Mnew) or v ∉ L(M') but v ∈ L(Mnew)
1. for each state Gi in M', set ui equal to a string such that (s',ui) ⇒*M' (Gi,λ). ui can be computed with Dijkstra's algorithm from Data Structures, for example.
2. for each string ui set pi to the state in Mnew such that (snew,ui) ⇒*Mnew (pi,λ)
3. find indices i,j where i<j such that pi = pj. The pigeonhole principle guarantees such i,j exist.
4. set u1 = ui wi,j
5. set u2 = uj wi,j
6. because Mnew arrives at the same state processing ui as uj, it either accepts both u1 and u2, or rejects them both. However, M' makes opposite decisions for u1 and u2. So Mnew and M' make opposite decisions on either u1 or u2; set u equal to whichever one of u1 and u2 gives opposite decisions in the two machines.

Now we have a proof that for any DFA Mnew with fewer states than M', L(Mnew) ≠ L(M'), because we have an algorithm that will produce a string u for which Mnew and M' make opposite decisions.
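CounterExampleFinder can be sketched in Python as follows. Several things here are assumptions for illustration: states of M' are written as the integers 1 to 4 (standing for G1 to G4 of the worked example), breadth-first search is used in step 1 in place of Dijkstra's algorithm (all edges have the same cost, so it suffices), and the three-state machine `NEW_DELTA` is a hypothetical Mnew, not one taken from a figure.

```python
from collections import deque

def run(delta, state, word):
    """Return the state reached from `state` after reading `word`."""
    for x in word:
        state = delta[(state, x)]
    return state

def access_strings(alphabet, delta, start):
    """Step 1: breadth-first search for a shortest string reaching each state."""
    reach = {start: ""}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for x in alphabet:
            q = delta[(p, x)]
            if q not in reach:
                reach[q] = reach[p] + x
                queue.append(q)
    return reach

def counter_example(alphabet, m_min, w, m_new):
    """Return a string on which M' (m_min) and Mnew (m_new) disagree."""
    delta, start, accepting = m_min
    delta_new, start_new, accepting_new = m_new
    u = access_strings(alphabet, delta, start)
    landing = {}
    for g in sorted(u):                                   # steps 2 and 3
        p = run(delta_new, start_new, u[g])
        if p in landing:
            i, j = landing[p], g    # pigeonhole: two groups share a state of Mnew
            break
        landing[p] = g
    else:
        raise ValueError("Mnew does not have fewer states than M'")
    for candidate in (u[i] + w[(i, j)], u[j] + w[(i, j)]):   # steps 4 to 6
        in_min = run(delta, start, candidate) in accepting
        in_new = run(delta_new, start_new, candidate) in accepting_new
        if in_min != in_new:
            return candidate

# M' and the table w from the worked example, with state i standing for Gi.
MIN_DELTA = {
    (1, "a"): 1, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 1,
    (3, "a"): 1, (3, "b"): 2,
    (4, "a"): 3, (4, "b"): 3,
}
W = {(1, 2): "", (1, 3): "b", (1, 4): "ab", (2, 3): "", (2, 4): "", (3, 4): "b"}

# Hypothetical three-state Mnew (an assumption made for this example).
NEW_DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q0",
}
v = counter_example("ab", (MIN_DELTA, 4, {2}), W, (NEW_DELTA, "q0", {"q2"}))
print(v)   # aaab
```

With this assumed Mnew, the strings aa and λ both lead Mnew to q0, so the pair (1, 4) is found and w1,4 = ab is appended to each access string.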

The minimality proof in action
The minimization algorithm is simple. The proof that it really gives a machine with the minimum number of states is ... not so simple. If you were to run the Augmented DFA Minimization Algorithm on the example from the top of the page, you'd get the minimized machine shown above, but you'd also get the table w:
```
 | 1   2   3   4
-+-----------------
1|     λ   b   ab
 |
2|         λ   λ
 |
3|             b
 |
4|
```
Consider running CounterExampleFinder Algorithm with Mnew given by

Step 1: `u1 = aa, u2 = bb, u3 = a, u4 = λ`
Step 2: `p1 = q0, p2 = q2, p3 = q1, p4 = q0`
Step 3: indices i=1,j=4 satisfy pi = pj
Step 4: u1 = ui wi,j = u1 w1,4 = aaab
Step 5: u2 = uj wi,j = u4 w1,4 = ab
Step 6: u = u1 = aaab, since u ∉ L(M') but u ∈ L(Mnew)

So the string aaab proves that Mnew does not accept the same language as M'.
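The M' half of this example can be checked mechanically. The sketch below encodes M' from the final group table (state i stands for Gi, the start state is G4 because it contains q0, and G2 is the only accepting state) together with the table w, then confirms that every entry of w separates its pair of states and that M' rejects aaab but accepts ab. The encoding and names are assumptions of this example; Mnew is not modelled here.

```python
from itertools import combinations

# M' read off the final group table: state i stands for group Gi.
DELTA_MIN = {
    (1, "a"): 1, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 1,
    (3, "a"): 1, (3, "b"): 2,
    (4, "a"): 3, (4, "b"): 3,
}
START, ACCEPTING = 4, {2}

# The table w from above; "" plays the role of λ.
W = {(1, 2): "", (1, 3): "b", (1, 4): "ab", (2, 3): "", (2, 4): "", (3, 4): "b"}

def run(state, word):
    """Return the state M' reaches from `state` after reading `word`."""
    for x in word:
        state = DELTA_MIN[(state, x)]
    return state

# Each w[i, j] must lead exactly one of Gi, Gj to an accepting state.
all_distinguish = all(
    (run(i, W[(i, j)]) in ACCEPTING) != (run(j, W[(i, j)]) in ACCEPTING)
    for i, j in combinations(range(1, 5), 2)
)
print(all_distinguish)                                            # True
print(run(START, "aaab") in ACCEPTING, run(START, "ab") in ACCEPTING)   # False True
```

Since M' disagrees with itself on aaab and ab while Mnew must treat them alike, one of the two strings is guaranteed to be a counter-example, whatever Mnew does.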

Finite Automata & Regular Expression Debrief
We've looked at three basic kinds of languages so far: languages accepted by deterministic finite automata, languages accepted by nondeterministic finite automata, and languages defined by regular expressions. What we've proved is that all three classes of languages are in fact the same class, and we refer to this class of languages as regular languages.

Regular languages are theoretically interesting because they are the languages that can be accepted by machines (or programs) with a fixed amount of memory. They are interesting in practice because they are the languages for which we have a simple, efficient pattern matching algorithm through our regular expression to NDFA to DFA to program pipeline, and pattern matching is a problem of tremendous practical importance.

We've also seen, thanks to the Pumping Lemma, that there are many languages that are not regular, meaning that they cannot be accepted by a machine with a fixed amount of memory. Our next step will be to augment our machine model with a simple form of unbounded (i.e. not finite!) auxiliary memory and to see if any of these languages will be acceptable by augmented machines. In the background of our investigations has been this whole methodology of modelling machines with tuples, sets and functions, and of proof by algorithm. When we explore these new augmented machines we'll be using all the same techniques, which will hopefully be a lot easier to follow the second time around.

Christopher W Brown