q0 −a→ q2 −b→ q3 −a→ q2 −b→ q3 −b→ q1 | ||
Machine M | Computation for string w = ababb |
The thing to notice is that we follow a loop in this computation: q0 −a→ q2 −b→ q3 −a→ q2 −b→ q3 −b→ q1. We could cut this loop out (q0 −a→q2 −b→ q3 −b→ q1) and we'd still end up at the very same state q1 which, since it's accepting, will mean that the cutout string abb will also be accepted. We could follow this loop twice we have the computation q0 −a→q2 −b→ q3 −a→ q2 −b→ q3 −a→ q2 −b→ q3 −b→ q1, which ends up at the same accepting state q1. In fact, we can take this loop as many times as we like, and we'll always end up in the accepting state q1.
Theorem: w can be broken up into strings x, y and z (i.e. w = xyz) where y ≠ λ such that xykz ∈ L(M) for all k ≥ 0.
Proof: Let x = a, y = ba , and z = bb. Then w = xyz, and we can add in as many copies of y as we like and M still accepts the string, since we're just going around that loop.
When we break a string up like this and note that we can replace the one y with as many copies (including zero) of y as we want, we call that pumping
String | Computation (with loop) | Decomposition into xyz | Pumping Statement xykz ∈ L(M) for all natural numbers k |
baaaa | q0 −b→ q0 −a→ q2 −a→ q1 −a→ q0 −a→ q2 | x = b, y = aaa, z = a | b(aaa)ka ∈ L(M) for all k |
babb | q0 −b→ q0 −a→ q1 −b→ q3 −b→ q1 | x = e, y = b, z = abb | bkabb ∈ L(M) for all k |
ababb | q0 −a→ q2 −b→ q3 −a→ q2 −b→ q3 −b→ q1 | x = ab, y = ab, z = b | ab(ab)kb ∈ L(M) for all k |
So there are lots of strings that we can pump, but there are also strings that can't be pumped. Convince yourself that aa and abb can't be pumped. Why not? Because there are no loops in the computation! Without a loop in the computation there's no pumping. So the question is, is there an easy criterion X we can use to say "any w accepted by M that matches criterion X can be pumped"? In other words, is there a simple criterion X that guarantees us a loop in the computation? Hopefully you see it: if |w| ≥ 4 we're guaranteed a loop. Why, becuase with 4 characters to read, the computation will visit 5 states, which means one state must be repeated since M only has 4 states! A repeated state in the computation gives us our loop!
Theorem: Any string w accepted by M whose length is greater than or equal to 4 can be broken up into strings x, y and z (i.e. w = xyz) where y ≠ λ such that xykz ∈ L(M) for all natural numbers k.
Proof: As above: If |w| ≥ 4 then there must be a state we visit twice during the computation. This gives us a loop, and when we have a loop we can pump.
Example using pumping:
Consider the machine M2 below:
Suppose I claimed that this machine accepts the language L of all strings of a's whose length is prime. It looks promising, doesn't it? The machine accepts strings of a's whose length is 2,3,5,7,... which certainly looks right. Let's take w = aaaaa. This string is accepted by M2 and has length greater than or equal to the number of states in M2. So, we should be able to break up w into xyz such that y takes us around a loop of states in the computation: q3 −a→ q4 −a→ q5 −a→ q1 −a→ q2 −a→ q1 Sure enough x = aaa, y = aa and z = λ works. So, we know that aaa(aa)k will be accepted by M2 for any value of k. How about k = 3? this gives us a string of 9 a's, and 9 isn't prime. My claim about M2 was wrong!
The preceeding example gives a flavor of what pumping does for us. You pump to find a string that the machine accepts but which is not in the language we want, thus showing that the machine does not actually accept the language we want. There is one technicality though: We want to make sure that y ≠ λ, since if it did then pumping wouldn't actually do anything. If we assume M is deterministic then repeated states are separated by non-λ transitions so the substring taking us through our loop can't be empty. In fact we can guarantee y ≠ λ for nondeterministic machines as well, but we won't bother here.
This leads us to a major result:The Pumping Lemma (Machine Version 1.0)The preceeding items are essentially a proof of this statement.
For any deterministic machine M, for all w ∈ L(M) of length greater than or equal to the number of states in M, there exist x,y,z ∈Σ*, where w = xyz and y ≠ λ, such that xykz ∈ L(M) for all k ≥ 0.
Theorem: There is no deterministic finite automaton accepting L = {anbn| n ≥ 0}.
Proof: Consider the following algorithm:Algorithm AWe claim that w' is misclassified by M with respect to L. Why? First note that w ∈ L, and |w| ≥ |Q|. If w ∉ L(M) then M clearly misclassifies w and we set w' = w. Otherwise, we satisfy the first two conditions in the Pumping Lemma v1.0, so we get our x, y and z and set w' = xy2z. By the Pumping Lemma, M accepts w'. However, w' ∉ L. Why?
Input: DFA M = (Q,Σ,δ,s,W), where Σ = {a,b}
Output: string w' ∈ Σ* that M misclassifies with respect to L = {anbn| n ≥ 0}, i.e. either w' ∈ L but w' ∉ L(M) or w' ∉ L but w' ∈ L(M)
- let w = a|Q|b|Q|
- if w is not accepted by M then
else
- set w' = w (w' should be accepted, but isn't!)
- let x,y,z be the strings produced by the Pumping Lemma for machine M and string w.
- set w' = xy2z (w' shouldn't be accepted, but is!)
Whichever is the case, M misclassifies w', and our thereom is proved.
- if y is entirely within the a's block of w, then xy2z will have more a's than b's and thus won't be in L.
- if y is entirely within the b's block of w, then xy2z will have more b's than a's and thus won't be in L.
- if y straddles the a's block and the b's block, then xy2z will have some a's coming after b's and thus won't be in L.
The real trick to these problems is that you can't make any assumptions about the x,y,z produced by the Pumping Lemma other than that they satsify the requirements of the Pumping Lemma v1.0. There may be many was to break the string up into those three componants, and your algorithm needs to produce a w' that M misclassifies regardless of which of the many possible x,y,z you get.
M = |
Since |Q| = 3, we start off with w = a3b3. Since M accepts w we "call" the pumping lemma with M and w and it gives us back a decomposition. It has several to choose from, so let's suppose it returns x = a, y = aa, z = bbb. So a(aa)kbbb will be accepted by M, regardless of what k is. The algorithm then returns w' = a(aa)2bbb = aaaaabbb, which is accepted by M, but not in the language L = {anbn| n ≥ 0}