Class 24: Parse Trees & Ambiguous Grammars


Reading
Section 3.2 of Theory of Computing, A Gentle Introduction.
Homework
Then printout the homework and answer the questions on that paper.

A quick grammar short-hand
It becomes tiresome to write grammars like
S → aSa
S → bSs
S → a
S → b
S → λ
in which many rules share the same right-hand side. So we adopt the short-hand convention that we can list many right-hand sides for the same left-hand side all on one line by separating them with the "|" character, like this:
S → aSa | bSs | a | b | λ
This is just a short-hand notation. The |'s are not terminal characters, in fact they're not really in the grammar at all. It's just a short-hand to save us bother and screen/paper space.

An Example & Parse Trees
We wrote a grammar for postfix expressions in the variable v. Here's a solution:
S → SSO | v
O → + | - | * | /
We saw that for a string like v v + v v - * there are lots of different derivations, all corresponding to the many choices of which particular non-terminal to substitute for at any step. Parse Trees allow us to represent the derivation of a string in a grammar without distinguish the order in which non-terminals are substituted. Thus, one and the same parse tree describes many derivations.
A Parse tree's structure
The parse tree for the previous example not only allowed us to examine a derivation without worrying about order, it also showed us the meaning of the postfix expression v v + v v - *, i.e. showed us how that expression was to be evaluated. This can be helpful if you're not familiar with postfix.
Exercise: Draw a parse tree for v v v v * + - using the above grammar.
Now, this parse tree tells you exactly how the expression is to be evaluated! One of the most important aspect of parse trees as far as programming languages are concerned is that their structure describes meaning.

Ambiguity
In C++ we have expressions like cout << "the" << 3. Suppose we want to model such expressions with a grammar. Instead of worrying about concrete values, we'll just worry about some basic object types, like ostream, string, and int.
S → S << S | obj
obj → ostream | string | int
This grammar allows us to construct strings like ostream << string << int.
Exercise: Draw a parse tree for ostream << string << int.
Guess what: There's two valid parse trees for this string! Now that we're relying on parse trees to give meaning to strings, this is a real problem. We can't have two different meanings for a string. Then the meaining is abiguous, we won't know what to do! If a string has more than one parse tree in a given grammar, that grammar is called ambiguous. C++ uses the parse tree corresponding to (ostream << string) << int, a fact that is dictated by <<'s left-to-right associativity. We can rewrite the grammar in a way that encodes this associativity, and remove the ambiguity.
S → S << obj | obj
obj → ostream | string | int
It'd be nice if we could always do this, but it turns out that there are lagnuages out there that are inherently ambiguous, meaning that any grammar for the language will have strings with more than one parse tree. Inherently ambiguity would be quite a problem in a programming language!

A grammar is called ambiguous if there is a string for which there is more than one parse tree in the grammar.

Example with arithmetic expressions in infix
Most people don't like postfix (I don't know why!) and prefer our usual infix notation for arithmetic. So, let's support these knuckleheads and write a grammar for infix expressions onthe values 0 and 1 using + and *.
S → S + S | S * S | 0 | 1
Now, let's write a parse tree for 1 + 1 * 0. Problem! There are two possible parse trees, and they lead to different answers when we evaluate the expression! Our grammar is ambiguous! What we're really missing is the usual precedence of + and *.
S → S + T | T
T → T * N | N
N → (S) | 0 | 1
The above grammar is an unambiguous grammar for infix arithmetic on 0,1,+,*. It even gives us parenthesization.

Dangling Else
The "dangling else" problem is a classic example of ambiguity. Read about it in the book (p. 78), or read the Wikipedia article.


Christopher W Brown
Last modified: Fri Oct 23 15:42:38 EDT 2009