Algorithm: makeEPSTry makeEPS on:
Input: Grammar $G$
Output: EPS - the set of non-terminals in $G$ that are nullible.
- EPS := $\{ X\ |\ X \rightarrow \varepsilon \mbox{ is a rule in $G$}\}$
- do {
for each rule $X \rightarrow Y_1 Y_2 \cdots Y_k$ in $G$} while the iteration added something new to EPSif $Y_1,Y_2,\ldots,Y_k$ are all in EPS, then add $X$ to EPS- return EPS
S -> cS | UV | aT
T -> aa|bb
U -> aT | ε
V -> bT | ε
It works! Great ... but does it always work? How could we
prove that it always returns exactly the set of nullifiable
non-terminals in $G$? Here's how: First, note that it is clear
that anything we add to EPS belongs there - i.e. we only add a
non-terminal $X$ to EPS if $X$ is indeed nullible. Obviously
that's true for Step 1, and it's also clear for Step 2, since we
only add $X$ to EPS when we know there is a rule $X \rightarrow
Y_1\cdots Y_k$ for which each $Y_i$ is nullible. So what we
really have to show is that nothing is missing from our final
returned EPS.
We need to show that during the algorithm, if there is a
nullifiable non-terminal that is not currently in EPS, there must be a
grammar rule $X \rightarrow
Y_1\cdots Y_k$ for which each $Y_i$ is already in EPS. That
way we won't terminate the outer loop while anything is missing
from EPS. So let's prove that:
Theorem: if there is a
nullifiable non-terminal that is not currently in EPS, there must be a
grammar rule $X \rightarrow
Y_1\cdots Y_k$ for which each $Y_i$ is already in EPS.
Proof:
Let $T$ be a shortest parse sub-tree such that $\mbox{root}(T)$
is not in EPS, and $T$ yields $\varepsilon$, i.e. all leaves
are $\varepsilon$. (To clarify, there are potentially many
such trees - even infinitely many.
Out of all of them, find one that is shortest - if there are
ties, I don't care which. That's the one
I want!) Note that $T \rightarrow \varepsilon$ cannot be a
rule in $G$, since otherwise $\mbox{root}(T)$ would already be
in EPS. Therefore, if we draw out the parse tree $T$, it
looks like this:

Algorithm: makeFIRSTImportant: It will be useful to define the function $\mbox{first}(w)$ that returns the set of all all terminals that could be the first terminal in a derivation that starts with $w$. I.e. first($w$) contains c if $w \rightarrow^* cv$ for some string $v$. If $w$ is the empty string, this is { }. otherwise, if $w = w_1 w_2 \cdots w_k$ we compute as
Input: Grammar $G$, and EPS, the result of makeEPS($G$)
Output: FIRST - a map that maps a grammar symbol $X$ to the set of terminals that appear as the first character in at least one string derived from $X$. I.e. terminal $c$ is in FIRST($X$) if and only if $X \rightarrow^* cw$ for some string $w$.
- for all grammar symbols x, FIRST(x) := {x} if x is a terminal, and { } if it is a non-terminal
- do {
for each rule $X \rightarrow Y_1 Y_2 \cdots Y_k$ in $G$} while the iteration added something new to FIRSTfor $i$ from $1$ to $k$FIRST($X$) := FIRST($X$) $\cup$ FIRST($Y_i$)
if $Y_i \notin \mbox{EPS}$ break out the inner for loop- return FIRST
P = { }
for $i$ from 1 to $k$P := P $\cup$ FIRST($w_i$)return P
if $w_i \notin \mbox{EPS}$ break
Algorithm: makeFOLLOW
Input: Grammar $G$, EPS - the result of makeEPS($G$), and FIRST - the result of makeFIRST($G$)
Output: FOLLOW - a map that maps a grammar symbol $X$ to the set of terminals that appear as the first character in at least one string derived from $X$. I.e. terminal $c$ is in FIRST($X$) if and only if $X \rightarrow^* cw$ for some string $w$.
- for all non-terminal symbols X, FOLLOW(X) = { }
- do {
for each rule $X \rightarrow Y_1 Y_2 \cdots Y_k$ in $G$} while the iteration added something new to FOLLOWfor $i$ from $1$ to $k-1$if $Y_i$ is a non-terminal then FOLLOW($Y_i$) := FOLLOW($Y_i$) $\cup$ FIRST($Y_{i+1}$)for $i$ from $k$ down to $1$if $Y_i$ is a non-terminal FOLLOW($Y_i$) := FOLLOW($Y_i$) $\cup$ FOLLOW($X$)
if $Y_i \notin \mbox{EPS}$ break out of inner for loop- return FOLLOW
PREDICT($X \rightarrow w$) := first($w$) if $\mbox{eps}(w) = \mbox{false}$, and first($w$) $\cup$ FOLLOW($X$) otherwiseThus, each grammar rule gets its own predict set. As long as every pair of rules with the same left-hand side non-terminal have non-intersecting predict sets, the grammar is LL(1) and we can do top-down parsing.