# Class 29: More on converting Grammars to PDAs

Section 3.4 of Theory of Computing, A Gentle Introduction.
Homework
Read the proof below carefully, and grok it as well as you are able.

The Grammar-to-PDA conversion
The following algorithm converts grammar G to a PDA M accepting the same language as generated by the grammar.
Algorithm: Grammar2PDA
Input: G = (Σ, NT, R, S)
Output: PDA M such that L(M) = L(G)
M = ({q0,q1,q2}, Σ, Σ ∪ NT ∪ {\$}, q0, {(q0,λ,λ,q1,\$), (q1,λ,S\$,q2,λ)} ∪ {(q1,λ,wR,q1,A) | (A,w) ∈ R} ∪ {(q1,x,λ,q1,x) | x ∈ Σ}, {q2})
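The construction above is short enough to render directly as a program. The following is a sketch, not from the text: it assumes a grammar given as sets of terminals, non-terminals, and rules (A, w), with λ represented as the empty string, and it represents each PDA transition as a 5-tuple (state, input, pop, new state, push).

```python
# A sketch of Grammar2PDA. Representation choices (empty string for λ,
# 5-tuples for transitions) are assumptions for illustration, not the text's.

def grammar_to_pda(sigma, nt, rules, start):
    transitions = {("q0", "", "", "q1", "$"),          # push the bottom marker $
                   ("q1", "", start + "$", "q2", "")}  # pop S$ and move to accept
    # reduce moves: pop w reversed, push A, for each rule A -> w
    transitions |= {("q1", "", w[::-1], "q1", a) for (a, w) in rules}
    # shift moves: read a terminal and push it
    transitions |= {("q1", x, "", "q1", x) for x in sigma}
    return ({"q0", "q1", "q2"}, sigma, sigma | nt | {"$"}, "q0", transitions, {"q2"})
```

For the grammar S → aSb | λ, for example, the rule S → aSb produces the reduce transition (q1, λ, bSa, q1, S).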
Proof of correctness for the Grammar2PDA algorithm: Intro
To prove that the algorithm Grammar2PDA meets its specification, we must prove that L(M) = L(G), where G is the input grammar, and M is the PDA produced by the algorithm. It is by no means clear that this is true and, even if you really believe it's true, it is certainly not clear how to prove that it is true. Proving that L(M) = L(G) actually requires two sub-proofs:
• prove that if u ∈ L(G) then u ∈ L(M), and
• prove that if u ∈ L(M) then u ∈ L(G).
The fundamental approach to proving both of these is similar. How do we show a string u is in L(G)? We give a parse tree for u! How do we show string u is in L(M)? We give a sequence of configurations comprising an accepting computation for machine M on input u! So to prove u ∈ L(G) ⇒ u ∈ L(M) we give an algorithm that converts a parse tree in G for string u into a configuration-sequence for an accepting computation of M on input u! And to prove u ∈ L(M) ⇒ u ∈ L(G) we give an algorithm that converts configuration-sequence for an accepting computation of M on input u into a parse tree in G for string u!

It's going to be important in what follows to be able to piece together several computations (given as sequences of configurations) into a single computation (also given as a sequence of configurations). The following lemma is the key to doing this.

Lemma 1: For any PDA M, if (p,u,λ) ⇒*M (p',λ,r) and (p',v,λ) ⇒*M (q,λ,t) then (p,uv,λ) ⇒*M (q,λ,tr).
Proof: This is obvious enough that I won't prove it. A picky mathematician would use induction. It is worth looking at an example, however.
Suppose we have (p,aab,λ) ⇒*M (p',λ,Xa) and (p',cbc,λ) ⇒*M (q,λ,bccZ). The theorem tells us that (p,aabcbc,λ) ⇒*M (q,λ,bccZXa).
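The bookkeeping in Lemma 1 can be made concrete with a small sketch (mine, not the text's): given the configuration lists of the two computations, build the combined one. While the first computation runs, the second input v is still unread at the right end of the input; while the second runs, the first's final stack r sits untouched underneath.

```python
# A configuration is (state, unread input, stack), with the stack top leftmost,
# matching the notation in the text. The function name and representation are
# illustrative assumptions.

def splice(run1, run2):
    (_, _, r) = run1[-1]          # final stack of the first computation
    (_, v, _) = run2[0]           # input of the second computation
    part1 = [(p, inp + v, stack) for (p, inp, stack) in run1]
    part2 = [(q, inp, stack + r) for (q, inp, stack) in run2[1:]]
    return part1 + part2

# The example above, intermediate configurations omitted: splicing
# (p,aab,λ) ⇒* (p',λ,Xa) with (p',cbc,λ) ⇒* (q,λ,bccZ)
# yields a run from (p,aabcbc,λ) to (q,λ,bccZXa), as Lemma 1 predicts.
```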

Proof of correctness for the Grammar2PDA algorithm: u ∈ L(G) ⇒ u ∈ L(M)
The following is a lemma that shows that a sub-parse-tree in grammar G that produces string u can be transformed into a sequence of configurations in machine M that reads input string u. If you look closely at the (inductive) proof, you'll see it's essentially a recursive algorithm.

Lemma 2: Let N be the root node of a sub-parse-tree that derives some string u, and let A be the grammar symbol that labels N. Then (q1,u,λ) ⇒*M (q1,λ,A).
Proof: Proceed by induction on the depth of the parse tree rooted at N. If the depth is zero, then N is a leaf and u = A. If u = A is a terminal, then (q1,u,λ) ⇒M (q1,λ,A) by the transition (q1,A,λ,q1,A). Otherwise, u = A = λ, and (q1,u,λ) ⇒*M (q1,λ,A) is really (q1,λ,λ) ⇒*M (q1,λ,λ), which is trivially true (yields in zero steps).

Suppose the depth is greater than zero. Then node N represents the application of some rule A → w in G, where w ∈ (Σ ∪ NT)*. If w = λ, then (q1,λ,λ) ⇒M (q1,λ,A). Otherwise, w = w1 w2 ... wk, where the wi are the individual symbols labeling the children of N, namely N1, N2, ..., Nk, respectively. If we use ui to name the string yielded by Ni, then u = u1u2...uk. By induction, for each Ni we have

(q1,ui,λ) ⇒*M (q1,λ,wi).
By repeated application of Lemma 1 to
(q1,u1,λ) ⇒*M (q1,λ,w1), (q1,u2,λ) ⇒*M (q1,λ,w2), ... (q1,uk,λ) ⇒*M (q1,λ,wk)
we get
(q1,u1u2...uk,λ) ⇒*M (q1,λ,wk...w2w1).
Since (q1,λ,wk...w2w1) ⇒M (q1,λ,A), we get that
(q1,u1u2...uk,λ) ⇒*M (q1,λ,A).
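As noted above, this inductive proof is essentially a recursive algorithm. The following is one possible rendering (all representation choices are mine: a parse-tree node is a (label, children) pair, λ is the empty string, and a configuration is (state, unread input, stack) with the stack top leftmost): a post-order walk over the tree emits the shift and reduce moves, and replaying those moves produces the configuration sequence of Lemma 2.

```python
# A sketch of Lemma 2 as a recursive algorithm; names and representations are
# illustrative assumptions, not the text's.

def tree_yield(node):
    label, children = node
    return label if not children else "".join(tree_yield(c) for c in children)

def moves(node):
    """Post-order walk: shift each terminal leaf, then reduce by the node's
    rule A -> w, exactly as in the inductive step of Lemma 2."""
    label, children = node
    if not children:
        return [] if label == "" else [("shift", label)]  # a λ-leaf needs no move
    ms = [m for c in children for m in moves(c)]
    w = "".join(lab for lab, _ in children)
    return ms + [("reduce", label, w)]

def computation(node):
    """Replay the moves to get the configurations of (q1,u,λ) ⇒* (q1,λ,A)."""
    cfgs = [("q1", tree_yield(node), "")]
    for m in moves(node):
        _, inp, stack = cfgs[-1]
        if m[0] == "shift":                        # transition (q1,x,λ,q1,x)
            x = m[1]
            cfgs.append(("q1", inp[len(x):], x + stack))
        else:                                      # transition (q1,λ,wR,q1,A)
            _, a, w = m
            assert stack.startswith(w[::-1])       # wR sits on top of the stack
            cfgs.append(("q1", inp, a + stack[len(w):]))
    return cfgs
```

For the grammar S → aSb | λ and the parse tree of ab, the replay starts in (q1, ab, λ) and ends in (q1, λ, S), as Lemma 2 requires.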

Theorem 1: If u ∈ L(G) then u ∈ L(M).
Proof: Let T be a parse tree for u in grammar G. The label of the root is, of course, the start symbol S. From the definition of M we have (q0,λ,λ) ⇒M (q1,λ,\$), and by Lemma 2 we have (q1,u,λ) ⇒*M (q1,λ,S), which Lemma 1 allows us to combine into (q0,u,λ) ⇒*M (q1,λ,S\$). Finally, since the definition of M includes (q1,λ,S\$) ⇒M (q2,λ,λ), we have (q0,u,λ) ⇒*M (q2,λ,λ), which means M accepts u.

Proof of correctness for the Grammar2PDA algorithm: u ∈ L(M) ⇒ u ∈ L(G)
Theorem 2: If u ∈ L(M) then u ∈ L(G).
Proof: If u ∈ L(M), the structure of the machine ensures that (q1,u,\$) ⇒*M (q1,λ,S\$). What we will do is show how to annotate the configurations in this computation so that the final configuration includes an annotation that is a parse tree in G for the string u.

We annotate a configuration by labeling each character on the stack with a sub-parse-tree according to these two rules:

• if the stack character is a terminal x that was pushed as a result of the transition (q1,x,λ,q1,x), annotate x with the sub-parse-tree consisting of a single node labeled x;
• if the stack character is a non-terminal A that was pushed as a result of the transition (q1,λ,wR,q1,A) [note that this means A → w is a rule in G, and also that if w is non-empty, each character of wR on the stack in the previous configuration is already annotated with a sub-parse-tree!], annotate A with:
  • if w = λ, a node labeled A with one child, which is labeled λ;
  • if w = w1 ... wk, where each wi ∈ Σ ∪ NT, the sub-parse-tree consisting of a root node labeled A whose children are the first k annotations from the stack in the previous configuration, in reverse order.
Our claim is that in the final configuration (q1,λ,S\$), the S is annotated with a parse tree in grammar G for string u. A proper proof of this would rely on induction. However, it seems obvious enough that this procedure works that we won't bother. We simply note that
1. it's obvious the procedure can only produce proper parse trees,
2. it's obvious the non-λ leaf nodes are exactly the characters in u, and
3. the characters of u start off in distinct single-node trees on the stack; any two of them appear in reverse order on the stack exactly until the point at which their trees are combined as children of a common parent, which puts the two characters in the proper order with respect to one another, where they stay for the remainder of the computation.
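The annotation procedure is, in effect, a shift-reduce loop that builds the tree bottom-up. Here is a sketch under assumptions of mine: trees are (label, children) pairs, λ is the empty string, and, since M is nondeterministic, the sequence of moves of the accepting computation is supplied explicitly rather than discovered.

```python
# A sketch of the annotation procedure. The stack here keeps its top at the
# END of the list, so the "reverse order" of the text's top-first stack
# becomes plain left-to-right order of a slice. All names are illustrative.

def annotate(u, move_seq):
    stack = []                                   # list of (symbol, subtree)
    inp = list(u)
    for m in move_seq:
        if m[0] == "shift":                      # rule 1: single-node tree
            x = inp.pop(0)
            assert x == m[1]                     # the shifted terminal matches
            stack.append((x, (x, [])))
        else:                                    # rule 2: ("reduce", A, w)
            _, a, w = m
            k = len(w)
            popped = stack[len(stack) - k:]      # annotations of w1 ... wk
            del stack[len(stack) - k:]
            children = ([t for _, t in popped]   # root A over those subtrees,
                        if k else [("", [])])    # or one λ child when w = λ
            stack.append((a, (a, children)))
    return stack[-1][1]                          # the tree annotating S
```

For S → aSb | λ on input ab, with moves shift a, reduce by S → λ, shift b, reduce by S → aSb, the result is the parse tree S(a, S(λ), b).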

This proof is actually really important, not only because it's half of showing that M is equivalent to G, but also because it shows that machine M can actually be seen as building a parse tree for the input string. This is, in fact, how tools like bison work.

Christopher W Brown