Class 29: A Pumping Lemma for CFGs


Reading
Sections 3.6 of Theory of Computing, A Gentle Introduction. Notice that Lemma 3.6.2 is a little stronger than the Pumping Lemma we gave in class. We might call it "Version 2.0".
Homework
Due Wednesday, 18 November.
Consider the grammar G given by
S → ABA
A → aa | bb | aBb
B → cAc | cc
and the string w = bbcaccbcaccb
  1. Construct the parse tree for w using grammar G.
  2. There is a path containing repeated non-terminals. Identify the repeated non-terminals.
  3. Give me the u,v,x,y,z such that w = uvxyz, not both v and y are the empty string, and uvkxykz ∈ L(G).

Finishing up parsing
In class we went through an example similar to, but more complicated than, the example at the end of the previous class's notes. We considered the following grammar:
E → E+T | T
T → T*n | n
We then applied our Grammar-to-PDA Conversion algorithm to produce the following PDA:
Finally, we wrote out the sequence of configurations in an accepting computation for the string n+n*n. However, as we did this we kept track of where each symbol came from. What we found was that this process actually built the parse tree for the input string. In fact, this is essentially the proof that our Grammar-to-PDA Conversion algorithm really produces a PDA that accepts exactly the language generated by the input grammar.

A Pumping Lemma for CFGs
The below roughly follows the example we used in class. I apologize for the minor notational differences.

We considered the grammar $G$:

\begin{displaymath}
\begin{array}{l}
S \longrightarrow TQ\\
T \longrightarro...
...
A \longrightarrow a\\
B \longrightarrow b\\
\end{array} \end{displaymath}

with start symbol $S$ and alphabet $\{a,b\}$. Consider the parse tree for the string $babaa$:

\scalebox {0.8}{\includegraphics{ParseTree0.eps}}

Notice the variable $Q$ that appears twice along one path from root to a leaf node. (The two occurrences are circled.) We can cut out and splice in the subtrees rooted at $Q$ to form new trees that are still perfectly valid subtrees.

\scalebox {0.6}{\includegraphics{ParseTree1.eps}}

By constructing these new parse trees, we're really constructing new strings in the language! You'll notice that in our original parse tree, we broke $babaa$ up into $vwxy$, where $v = ba$, $w = b$, $x =
a$, and $y = a$. What the new parse trees show is that not only is $vwxy$ in $L(G)$, but also $vx \in L(G)$ and $vw^2xy^2 \in L(G)$. Of course, we could keep splicing in more and more copies of the larger subtree in place of the smaller, and show that $vw^mxy^m \in L(G)$ for any $m \in \mathbb{N}$. We get a whole infinite family of strings in $L(G)$, specifically $ab(b^m)a(a^m)
\in L(G)$.

A pumping lemma for CFGs Version 1.0
Here's a pumping lemma version 1.0 for Context free grammars:
For any grammar G = (Σ, NT, R, S), for any non-empty string w ∈ L(G) where |w | ≥ max{ |t| | (A,t) ∈ R }|NT| + 1, there exist strings u,v,x,y,z such that w = uvxyz and not both u and y are the empty string and uvkxykz ∈ L(G) for all k ≥ 0.
The book describes all this stuff in detail, but we went through it in class as well. If the string w is "big enough", you'll have a path of length |NT| + 1 in the parse tree. There are |NT| + 1 non-terminals along such a path. There must be at least one repeated non-terminal along this path. We can use the repeated non-terminal to decompose as per the above example, and that allows us to pump!

The only thing we haven't justified is how we know that u and y aren't both the empty string. The short answer is ... maybe they are. On the other hand, if both u and y were empty, we could simply replace the "big" subtree by the "little" subtree (see the figures above) and we'd have a parse tree for the same string w, but with fewer nodes. If you do this every time you find your repeated non-terminal symbol along a path from the root to a leaf that has both u and y empty, the process has to stop eventually. After all, the string w stays the same, but the number of nodes in the parse tree decreases at each step. Since the number of nodes can't go all the way down to zero and still produce a non-empty string! So, eventually you'll find repeated non-terminal symbols that yield the decomposition w = uvxyz where not both u and y are the empty string.


Christopher W Brown
Last modified: Mon Nov 16 09:08:34 EST 2009