Let's consider our good friend the grammar
S -> X T T -> @ X T T -> epsilon X -> a X -> b... where "epsilon" means the empty string. There are three non-terminals (S,T,X) and three terminals (a,b,@). To this we will add the token $, which will denote the end of input. That means there is, at least conceptually, a new rule
R -> S $
where R is the "real" start symbol. We won't really use R and
this new rule in derivations, but it allows us to say concretely
what we mean by the end-of-input terminal $. In any event, this
means that there are only 3x4 = 12 possible ways to call
chooseRule. So we might as well write them down in a table and
turn chooseRule into a simple table lookup. Here's the table
for this grammar:
$
\begin{array}{r|cccc}
& a & b & @ & \text{\$}\\
\hline
S & S \rightarrow X T & S \rightarrow X T & err & err\\
T & err & err & T \rightarrow @ X T & T\rightarrow \epsilon\\
X & X \rightarrow a & X \rightarrow b & err & err
\end{array}
$
A grammar is LL(1) precisely when the predict sets for any two rules with the same left-hand-side are disjoint. After all, if we had two rules $A \rightarrow w_1$ and $A \rightarrow w_3$, and some terminal $x$ was in both of their predict sets, it would mean that expanding with either rule could be valid depending on what comes after the lookahead $x$. So you have to use more than that one symbol of lookahead to know which rule to use.
Here is a grammar for a simple language for arithmetic, in which + and * are functions not operators. The tokens are NUM, ID, LP and RP. Examples of strings are: +(34 12) and *(10 +(7 2) -1). The predict sets for the grammar rules, computed with the tool from course resources, are also shown.
| Grammar | Predict sets |
exp -> ID LP etail exp -> NUM etail -> exp etail etail -> RP |
PREDICT: exp -> ID LP etail : ID exp -> NUM : NUM etail -> exp etail : ID NUM etail -> RP : RP |
We will assume that we have a function peek() that returns the next lookahead token, a function match(tok) that consumes the next token and throws an error if it doesn't match tok, and a function paerror(str) that prints an error based on str, and exits. The idea is that each non-terminal will be turned into a function that matches the next portion of the token stream to something that is generated by that non-terminal. If we name these functions after the non-terminals, the connection should be clear: A full working program consists of: rdex.h, rdex.lpp, rdex.cpp, Makefile,
To understand the relation between recrusive descent parsing
and what we have previously described as top-down parsing
(predictive parsing), it helps to look at an example string.
Consider parsing +(10 32). The two images below
depict the situation at the point at which we are just about
to match NUM[10] for both a predictive parser and a recursive
descent parsers. Moreover, it shows the relationship between
the stack and the parse tree for the string.