Sections 3.6 of Theory of Computing, A Gentle Introduction, especially Theorem 3.6.3..
Then printout the homework and answer the questions on that paper.

A pumping lemma for CFGs Version 1.0
Last class we developed a pumping lemma for Context free grammars:
Let $G = (\Sigma,NT,R,S)$ be a CFG and let $u$ be a non-empty string in $L(G)$. If $u \geq \text{max}\{ |t| \mid (A,t) \in R \}^{|NT|+1}$ then there are strings $v,w,x,y,z$ such that
  • $u = vwxyz$
  • at least one of w, y is not λ
  • $vx^kxy^kz \in L(G)$ for all $k \geq 0$.

You should go back and read that section of the notes to see why this is true!

Proving a language is not generated by any context free grammar
Theorem The language L = {anbncn | n ≥ 0} is not a context free language.
Proof Any context free language is defined by a grammar. We will prove that no grammar defines L by giving an algorithm that, for any grammar, produces a "counter-example string":
Input: Grammar G = (Σ,NT,R,S)
Output: String u' such that u' ∈ L but u' ∉ L(G) or u' ∉ L but u' ∈ L(G)
  1. set m = max{ |t| | (A,t) ∈ R}|NT| + 1
  2. set n = ⌈m/3⌉
  3. set u = anbncn
  4. if u ∉ L(G) then let u' = u, otherwise
    1. get v,w,x,y,z from the pumping lemma for for CFGs
    2. set u' = vw2xy2z
  • note that m from Step 1 is the bound required by the pumping lemma.
  • note that |u| ≥ m
  • clearly if u ∉ L(G) then u' = u is a counter-example string.
  • u' from Step 4 b is not in L since if wy does not contain one of a,b,c then u' will have less of that character than at least one of the others, while if all three characters are in wy then
    • one of w or z contains a's and b's so that $vw^2xy^2z$ contains b's followed by a's, or
    • one of w or z contains b's and c's so that $vw^2xy^2z$ contains c's followed by b's.


Apply this algorithm to the grammar G:

G: S → aS | bSc | λ
to produce a string that proves that G does not generate the language anbncn.

Intersection of context-free languages
Consider the following two context-free languages and the grammars that generates them:
L1 = {ambmcn | m,n ≥ 0} L2 = {anbmcm | m,n ≥ 0}
S → XC
X → aXb | λ
C → Cc | λ
S → AY
Y → bYc | λ
A → Aa | λ
Clearly both L1 and L2 are CFLs, since we have grammars for them. However, L1 ∩ L2 = {ambmcm | m ≥ 0}, which is the one language we know is not context free. Therefore, in contrast with regular languages, the intersection of two context free languages can fail to be context free.
Theorem: The intersection of two context free languages is not necessarily context free.
Another Mystery Algorithm
Consider the following mystery algorithm:
Input: PDA M = (Q,Σ,Γ,s,Δ,W) and NDFA M' = (Q',Σ,Δ',s',W'), where M' has no λ-transitions
Output: PDA M* = (QxQ',Σ,Γ,(s,s'),Δ*,WxW'), where
Δ* = {((p,p'),x,y,(q,q'),z) | (p,x,y,q,z) ∈ Δ and (p',x,q') ∈ Δ'} ∪ {((p,p'),λ,y,(q,p'),z) | (p,λ,y,q,z) ∈ Δ} ∪ {((p,p'),λ,y,(p,q'),z) | (p',λ,q') ∈ Δ'}
Note that we already have algorithms to transform NDFA's with λ-transitions into equivalent machines without λ-transitions. Therefore, this algorithm can be applied to any PDA/NDFA combination, it just may require a little prerocessing for some NDFA's. We ran this algorithm with the concrete input below:
With the resulting machine to guide us, we determined that the above algorithm takes a PDA M and an NDFA M' and produces a PDA that accepts L(M)∩L(M'). But wait, doesn't this contradict our previous result? No! Because that would require a similar algorithm for two PDAs, not for a PDA and NDFA. So what's stopping us from doing the same thing for two PDAs? Well, that would require two stacks in the resulting machine! So, hopefully we understand a bit better why intersection of CFLs doesn't always produce a CFL, and as a side benefit we have a new theorem:
Theorem: The intersection of a context free language and a regular language is a context free language.
From what we know about intersection from the above, we get the following theorem:
Theorem: The complement of a context free language is not necessarily context free.
Proof: Suppose that the complement of a context free language were always context free. Then consider any two context free languages, L1 and L2. Given our supposition, L1 and L2 are both context free. Since we proved several lectures ago that the union of context free languages is context free, L1L2 is context free. Then given our assumption
would be context free, but this is exactly L1 ∩ L2, and we know that L1 ∩ L2 is not always context free. Therefore, our supposition must be wrong, and the complement of a context free language is not always context free.
Summing up Context Free Languages
Context Free Languages represent pretty much the simplest possible extension of regular languages to something more expressive. They correspond to the languages that are accepted by Push Down Automata, which are pretty much the simplest possible computational model that adds unbounded memory to finite automata. They also correspond to languages generated by Context Free Grammars which are, in a sense, a more expressive version of regular expressions.

On the practical side, context free grammars are used to define a wide range of important languages in computing - both programming languages and data description languages. Almost any programming language you've ever heard of is defined by a context free grammar. Part of the value of doing this is that the grammar can be converted automatically into a program that "parses", i.e. that constructs a parse tree for the input string, or determines that the input string is not derivable in the grammar. This parse tree typically represents the input string in a way that encodes its intended meaning.

Context free languages are limited. Our poster child for a non-context free language is $a^nb^nc^n$. Moreover, context free languages are harder to handle than regular languages. They are not closed under intersection or complement and, though we won't prove it here, even problems like "given two context free grammars, determine whether they define the same language" are hard to solve. In fact, that question is "undecidable", which is a technical term that you'll understand before this course is done. It essentially means that no computer program can solve it.

Christopher W Brown
Last modified: Mon Nov 16 16:34:04 EST 2009