Let's consider our good friend the grammar
S -> X T T -> @ X T T -> epsilon X -> a X -> b... where "epsilon" means the empty string. There are three non-terminals (S,T,X) and three terminals (a,b,@). To this we will add the token $, which will denote the end of input. That means there is, at least conceptually, a new rule
R -> S $
where R is the "real" start symbol. We won't really use R and
this new rule in derivations, but it allows us to say concretely
what we mean by the end-of-input terminal $. In any event, this
means that there are only 3x4 = 12 possible ways to call
chooseRule. So we might as well write them down in a table and
turn chooseRule into a simple table lookup. Here's the table
for this grammar:
$
\begin{array}{r|cccc}
& a & b & @ & \text{\$}\\
\hline
S & S \rightarrow X T & S \rightarrow X T & err & err\\
T & err & err & T \rightarrow @ X T & T\rightarrow \epsilon\\
X & X \rightarrow a & X \rightarrow b & err & err
\end{array}
$
Input:
Output:
A grammar is LL(1) precisely when the predict sets for any two rules with the same left-hand-side are disjoint. After all, if we had two rules $A \rightarrow w_1$ and $A \rightarrow w_3$, and some terminal $x$ was in both of their predict sets, it would mean that expanding with either rule could be valid depending on what comes after the lookahead $x$. So you have to use more than that one symbol of lookahead to know which rule to use.
Here is a grammar for a simple language for arithmetic, in which + and * are functions not operators. The tokens are NUM, ID, LP and RP. Examples of strings are: +(34 12) and *(10 +(7 2) -1). The predict sets for the grammar rules, computed with the tool from course resources, are also shown.
| Grammar | Predict sets |
exp -> ID LP etail exp -> NUM etail -> exp etail etail -> RP |
PREDICT: exp -> ID LP etail : ID exp -> NUM : NUM etail -> exp etail : ID NUM etail -> RP : RP |
We will assume that we have a function peek() that returns the next lookahead token, a function match(tok) that consumes the next token and throws an error if it doesn't match tok, and a function paerror(str) that prints an error based on str, and exits. The idea is that each non-terminal will be turned into a function that matches the next portion of the token stream to something that is generated by that non-terminal. If we name these functions after the non-terminals, the connection should be clear: A full working program consists of: rdex.h, rdex.lpp, rdex.cpp, Makefile,
To understand the relation between recrusive descent parsing
and what we have previously described as top-down parsing
(predictive parsing), it helps to look at an example string.
Consider parsing +(10 32). The two images below
depict the situation at the point at which we are just about
to match NUM[10] for both a predictive parser and a recursive
descent parsers. Moreover, it shows the relationship between
the stack and the parse tree for the string.
+(3 *(9 3) 5) we should not only be told that the
string is valid, we should also be told that the result is 14.
We do this by augmenting the functions in the recursive descent
parser to return values. The tricky thing is that, because we
had to break up expressions into exp + etail, we have
information to pass along into e-tail. For example:
____ etail() reads this
/ \
+ ( 12 9 )
\__/
exp() reads this
So etail() only sees the numbers, it doesn't know whether it
should be adding or multiplying. Thus, we need to pass the
operator (+ or *) to etail as a parameter. Similarly, as etail
calls itself recursively, it doesn't know what numbers have
already been added or multiplied already. So we need to pass it
the value accumulated already (much like the "accumulator" we
used so often for tail-recursion). To handle this, the match()
function will return a value too --- the numerical value for NUM
tokens, the ASCII value of + or * for OP, and anything you like
for LP and RP.