# Class 19: Intro to the Pumping Lemma

These notes.
Homework
Then printout the homework and answer the questions on that paper.

Intoduction to Pumping (... there exist x,y,z ∈ Σ*, where w = xyz and y ≠ λ, such that xykz ∈ L(M) for all k)
Consider the Machine M defined over Σ = {a,b} given below, and let's consider what happens when we run it on string w = ababb, which is accepted by M:

 q0 −a→ q2 −b→ q3 −a→ q2 −b→ q3 −b→ q1 Machine M Computation for string w = ababb

The thing to notice is that we follow a loop in this computation: q0aq2 bq3aq2bq3bq1. We could cut this loop out (q0aq2  bq3bq1) and we'd still end up at the very same state q1 which, since it's accepting, will mean that the cutout string abb will also be accepted. We could follow this loop twice we have the computation q0aq2 bq3aq2 bq3aq2 bq3bq1, which ends up at the same accepting state q1. In fact, we can take this loop as many times as we like, and we'll always end up in the accepting state q1.

Theorem: w can be broken up into strings x, y and z (i.e. w = xyz) where y ≠ λ such that xykz ∈ L(M) for all k ≥ 0.
Proof: Let x = a, y = ba , and z = bb. Then w = xyz, and we can add in as many copies of y as we like and M still accepts the string, since we're just going around that loop.

When we break a string up like this and note that we can replace the one y with as many copies (including zero) of y as we want, we call that pumping

What strings are pump-able? (for all w ∈ L(M) of length greater than or equal to the number of states in M ...)
The string ababb is not the only string w you can pump with using machine M. There are lots of them! Consider the following:

 String Computation (with loop) Decomposition into xyz Pumping Statement xykz ∈ L(M) for all natural numbers k baaaa q0 −b→ q0 −a→ q2 −a→ q1 −a→ q0 −a→ q2 x = b, y = aaa, z = a b(aaa)ka ∈ L(M) for all k babb q0 −b→ q0 −a→ q1 −b→ q3 −b→ q1 x = e, y = b, z = abb bkabb ∈ L(M) for all k ababb q0 −a→ q2 −b→ q3 −a→ q2 −b→ q3 −b→ q1 x = ab, y = ab, z = b ab(ab)kb ∈ L(M) for all k

So there are lots of strings that we can pump, but there are also strings that can't be pumped. Convince yourself that aa and abb can't be pumped. Why not? Because there are no loops in the computation! Without a loop in the computation there's no pumping. So the question is, is there an easy criterion X we can use to say "any w accepted by M that matches criterion X can be pumped"? In other words, is there a simple criterion X that guarantees us a loop in the computation? Hopefully you see it: if |w| ≥ 4 we're guaranteed a loop. Why, becuase with 4 characters to read, the computation will visit 5 states, which means one state must be repeated since M only has 4 states! A repeated state in the computation gives us our loop!

Theorem: Any string w accepted by M whose length is greater than or equal to 4 can be broken up into strings x, y and z (i.e. w = xyz) where y ≠ λ such that xykz ∈ L(M) for all natural numbers k.
Proof: As above: If |w| ≥ 4 then there must be a state we visit twice during the computation. This gives us a loop, and when we have a loop we can pump.
What machines are pump-able (for any machine M ...)
Look at what we did for our machine M above. We said that any string w whose length is at least the number of states is pumpable. But isn't this true for any machine? If w's length is at least the number of states then we'll have a loop in the computation, with a loop we can pump and still end up in the same state as we ended up in for w. (Note: This is an application of the "pigeon hole principle". There are |w| + 1 visited states and fewer than |w| + 1 distinct states, therefore at least two visited states must be the same.)
Example using pumping:
Consider the machine M2 below:

Suppose I claimed that this machine accepts the language L of all strings of a's whose length is prime. It looks promising, doesn't it? The machine accepts strings of a's whose length is 2,3,5,7,... which certainly looks right. Let's take w = aaaaa. This string is accepted by M2 and has length greater than or equal to the number of states in M2. So, we should be able to break up w into xyz such that y takes us around a loop of states in the computation: q3aq4aq5aq1 aq2aq1 Sure enough x = aaa, y = aa and z = λ works. So, we know that aaa(aa)k will be accepted by M2 for any value of k. How about k = 3? this gives us a string of 9 a's, and 9 isn't prime. My claim about M2 was wrong!

The preceeding example gives a flavor of what pumping does for us. You pump to find a string that the machine accepts but which is not in the language we want, thus showing that the machine does not actually accept the language we want. There is one technicality though: We want to make sure that y ≠ λ, since if it did then pumping wouldn't actually do anything. If we assume M is deterministic then repeated states are separated by non-λ transitions so the substring taking us through our loop can't be empty. In fact we can guarantee y ≠ λ for nondeterministic machines as well, but we won't bother here.

This leads us to a major result:
The Pumping Lemma (Machine Version 1.0)
For any deterministic machine M, for all w ∈ L(M) of length greater than or equal to the number of states in M, there exist x,y,z ∈Σ*, where w = xyz and y ≠ λ, such that xykz ∈ L(M) for all k ≥ 0.
The preceeding items are essentially a proof of this statement.

Proving a language isn't regular
Let L = { akbk | k ≥ 0 }. What would it take for me to convince you that there is no FA accepting L? Well, if I gave you a concrete machine M1 and said "M1 accepts L", how would you prove me wrong? You'd find a string that M1 misclassifies, i.e. that it accepts even though its not in L, or rejects even though it is in L. Once found, that string is the proof that my machine doesn't accept L. Now, if I came back with a new machine M2 and said "this time I've got it, M2 accepts L!", what would you do? You'd find a new string, one that M2 misclassifies. And so for every new machine you'd have to find a string that it misclassifies, and that string would be your proof that the machine didn't accept L. Still, how would we be sure that there wasn't another machine lurking out there somewhere that works? What we need is to automate this process of finding strings. We need an algorithm that takes a machine as input and outputs a string that the input machine misclassifies. Such an algorithm would then prove that no machine accepts L. It turns out that the pumping lemma is the key to creating such an algorithm.

Theorem: There is no deterministic finite automaton accepting L = {anbn| n ≥ 0}.
Proof: Consider the following algorithm:
Algorithm A
Input: DFA M = (Q,Σ,δ,s,W), where Σ = {a,b}
Output: string w' ∈ Σ* that M misclassifies with respect to L = {anbn| n ≥ 0}, i.e. either w' ∈ L but w' ∉ L(M) or w' ∉ L but w' ∈ L(M)
1. let w = a|Q|b|Q|
2. if w is not accepted by M then
• set w' = w (w' should be accepted, but isn't!)
else
• let x,y,z be the strings produced by the Pumping Lemma for machine M and string w.
• set w' = xy2z (w' shouldn't be accepted, but is!)
We claim that w' is misclassified by M with respect to L. Why? First note that w ∈ L, and |w| ≥ |Q|. If w ∉ L(M) then M clearly misclassifies w and we set w' = w. Otherwise, we satisfy the first two conditions in the Pumping Lemma v1.0, so we get our x, y and z and set w' = xy2z. By the Pumping Lemma, M accepts w'. However, w' ∉ L. Why?
• if y is entirely within the a's block of w, then xy2z will have more a's than b's and thus won't be in L.
• if y is entirely within the b's block of w, then xy2z will have more b's than a's and thus won't be in L.
• if y straddles the a's block and the b's block, then xy2z will have some a's coming after b's and thus won't be in L.
Whichever is the case, M misclassifies w', and our thereom is proved.

The real trick to these problems is that you can't make any assumptions about the x,y,z produced by the Pumping Lemma other than that they satsify the requirements of the Pumping Lemma v1.0. There may be many was to break the string up into those three componants, and your algorithm needs to produce a w' that M misclassifies regardless of which of the many possible x,y,z you get.

Example Problem
Given the machine
 M =
what is the string produced by the above proof/algorithm for M?

Since |Q| = 3, we start off with w = a3b3. Since M accepts w we "call" the pumping lemma with M and w and it gives us back a decomposition. It has several to choose from, so let's suppose it returns x = a, y = aa, z = bbb. So a(aa)kbbb will be accepted by M, regardless of what k is. The algorithm then returns w' = a(aa)2bbb = aaaaabbb, which is accepted by M, but not in the language L = {anbn| n ≥ 0}

Christopher W Brown