Just as with DFAs and NDFAs, it is helpful to define a special notation for derivable in "zero or more steps".
With these notions in hand, defining the language generated by a grammar is natural.
Note: The class of languages that are generated by context free grammars are called the Context Free Languages.
S → aSain which many rules share the same left-hand side. So we adopt the short-hand convention that we can list many right-hand sides for the same left-hand side all on one line by separating them with the "|" character, like this:
S → bSs
S → a
S → b
S → λ
S → aSa | bSs | a | b | λThis is just a short-hand notation. The |'s are not terminal characters, in fact they're not really in the grammar at all. It's just a short-hand to save us bother and screen/paper space.
S → SSO | vWe saw that for a string like v v + v v - * there are lots of different derivations, all corresponding to the many choices of which particular non-terminal to substitute for at any step. Parse Trees allow us to represent the derivation of a string in a grammar without distinguish the order in which non-terminals are substituted. Thus, one and the same parse tree describes many derivations.
O → + | - | * | /
Exercise: Draw a parse tree for
v v v v * + - using the above grammar
and use it to evaluate 5 6 4 9 * + -.
Now, this parse tree tells you exactly how the expression is
to be evaluated! One of the most important aspect of parse
trees as far as programming languages are concerned is that
their structure describes meaning.
Important: in computer science we often rely on the structure of a grammar's parse trees to provide meaning.
cout << "the" << 3. Suppose we want
to model such expressions with a grammar. Instead of worrying
about concrete values, we'll just worry about some basic
object types, like ostream, string,
and int.
S → S << S | objThis grammar allows us to construct strings like
obj → ostream | string | int
ostream << string << int.
Exercise: Draw a parse tree for
ostream << string << int.
Guess what: There's two valid parse trees for this
string! Now that we're relying on parse trees to give meaning
to strings, this is a real problem. We can't have two
different meanings for a string. Then the meaining is
abiguous, we won't know what to do!
If a string has more than one parse tree in a given grammar,
that grammar is called ambiguous. C++ uses the parse
tree corresponding to
(ostream << string) << int,
a fact that is dictated by <<'s left-to-right
associativity. We can rewrite the grammar in a way that
encodes this associativity, and remove the ambiguity.
S → S << obj | objIt'd be nice if we could always do this, but it turns out that there are lagnuages out there that are inherently ambiguous, meaning that any grammar for the language will have strings with more than one parse tree. Inherent ambiguity would be quite a problem in a programming language!
obj → ostream | string | int
Definition: A grammar is called ambiguous if there is a string for which there is more than one parse tree in the grammar.
S → S + S | S * S | 2 | 3Now, let's write a parse tree for 2 + 3 * 3. Problem! There are two possible parse trees, and they lead to different answers when we evaluate the expression! Our grammar is ambiguous! That's a real problem, since this implies that the meaning of the expression is ambiguous!
We could try to write the grammar in a different way so that it generates the same strings, but without the ambiguity.
S → S + T | TThis grammar is unambiguous (not that that's obvious) so there is one and only one parse tree for 2 + 3 * 3 and what's more, it matches our precedence rules! So that's good.
T → T * N | N
N → 2 | 3
However, if you play around with this grammar, you'll find that there's no way to express that you'd like the addition "2 + 3" to happen before the multiplication "*3". In fact, to adequately express arithmetic in infix, you need parentheses. And that is one of the big reasons why postfix really is a superior language for an arithmetic calculator!
S → S + T | T
T → T * N | N
N → (S) | 2 | 3