Algorithm: makeEPS
Input: Grammar $G$
Output: EPS - the set of non-terminals in $G$ that are nullable.
- EPS := $\{ X\ |\ X \rightarrow \varepsilon \mbox{ is a rule in $G$}\}$
- do {
    for each rule $X \rightarrow Y_1 Y_2 \cdots Y_k$ in $G$
      if $Y_1,Y_2,\ldots,Y_k$ are all in EPS, then add $X$ to EPS
  } while the iteration added something new to EPS
- return EPS

Try makeEPS on:

S -> cS | UV | aT
T -> aa | bb
U -> aT | ε
V -> bT | ε
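Here is one way the algorithm might look in Python, run on the sample grammar above. The encoding - uppercase letters for non-terminals, an empty right-hand side for an ε-rule - and all names are my own, just one possible sketch:

```python
# Sample grammar: S -> cS | UV | aT,  T -> aa | bb,  U -> aT | eps,  V -> bT | eps
GRAMMAR = [
    ("S", "cS"), ("S", "UV"), ("S", "aT"),
    ("T", "aa"), ("T", "bb"),
    ("U", "aT"), ("U", ""),
    ("V", "bT"), ("V", ""),
]

def make_eps(grammar):
    # Step 1: every X with a rule X -> eps starts out in EPS.
    eps = {lhs for lhs, rhs in grammar if rhs == ""}
    changed = True
    while changed:                # the outer do-while loop
        changed = False
        for lhs, rhs in grammar:  # for each rule X -> Y1 Y2 ... Yk
            # add X when every right-hand-side symbol is already in EPS
            if lhs not in eps and rhs != "" and all(y in eps for y in rhs):
                eps.add(lhs)
                changed = True
    return eps

print(sorted(make_eps(GRAMMAR)))  # ['S', 'U', 'V']
```

U and V enter EPS directly in Step 1; S then joins on the first do-while pass via the rule S → UV.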
It works! Great ... but does it always work? How could we
prove that it always returns exactly the set of nullable
non-terminals in $G$? Here's how: First, note that it is clear
that anything we add to EPS belongs there - i.e. we only add a
non-terminal $X$ to EPS if $X$ is indeed nullable. Obviously
that's true for Step 1, and it's also clear for Step 2, since we
only add $X$ to EPS when we know there is a rule $X \rightarrow
Y_1\cdots Y_k$ for which each $Y_i$ is nullable.
In fact (not to sow panic amongst my students), "if $X \in $ EPS
then $X$ is nullable" is a loop invariant for both
the inner and outer loop.
Since "if $X \in $ EPS then $X$ is nullable" is a loop invariant, when we leave the outer loop, we know that every $X$ in EPS is nullable. So, what we really have to show is that nothing is missing from our final returned EPS, i.e. if $X$ is nullable then $X \in$ EPS. To do this, we will show that during the algorithm, if there is a nullable non-terminal that is not currently in EPS, there must be a grammar rule $X \rightarrow Y_1\cdots Y_k$ such that each $Y_i$ is already in EPS, but $X$ is not in EPS. That way we won't terminate the outer loop while anything is missing from EPS. So let's prove that:
Theorem: If there is a
nullable non-terminal that is not currently in EPS, there is a
grammar rule $X \rightarrow
Y_1\cdots Y_k$ such that $X$ is not in EPS but each $Y_i$ is already in EPS.
Proof:
Let $T$ be a shortest parse sub-tree whose root, which we will
call $X_T$,
is not in EPS, and $T$ yields $\varepsilon$, i.e. all leaves
are $\varepsilon$. (To clarify, there are potentially many
such trees - even infinitely many.
Out of all of them, find one that is shortest - if there are
ties, I don't care which. That's the one
I want!) Note that $X_T \rightarrow \varepsilon$ cannot be a
rule in $G$, since otherwise $X_T$ would already be
in EPS from Step 1. Therefore, the root of $T$ is expanded by some
rule $X_T \rightarrow Y_1\cdots Y_k$ with $k \geq 1$, and each child
$Y_i$ must be a non-terminal (a terminal child would appear in the
yield) whose sub-tree is strictly shorter than $T$ and also yields
$\varepsilon$. So each $Y_i$ is nullable, and since $T$ was a
shortest such tree whose root is missing from EPS, every $Y_i$ must
already be in EPS. Thus $X_T \rightarrow Y_1\cdots Y_k$ is exactly
the rule the theorem promises, and the theorem is proved.

Algorithm: makeFIRST
Input: Grammar $G$, and EPS, the result of makeEPS($G$)
Output: FIRST - a map that maps a grammar symbol $X$ to the set of terminals that appear as the first symbol in at least one string derived from $X$. I.e. terminal $c$ is in FIRST($X$) if and only if $X \Rightarrow^* cw$ for some string $w$.
- for all grammar symbols $X$, FIRST($X$) := $\{X\}$ if $X$ is a terminal, and { } if it is a non-terminal
- do {
    for each rule $X \rightarrow Y_1 Y_2 \cdots Y_k$ in $G$
      for $i$ from $1$ to $k$
        FIRST($X$) := FIRST($X$) $\cup$ FIRST($Y_i$)
        if $Y_i \notin \mbox{EPS}$, break out of the inner for loop
  } while the iteration added something new to FIRST
- return FIRST
It's good to try working through the algorithm by hand on a sample grammar. Even one as simple as the example above will do.
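In that spirit, here is a possible Python sketch of makeFIRST run on the sample grammar (the representation and names are my own; makeEPS is repeated so the block stands alone):

```python
# Sample grammar: S -> cS | UV | aT,  T -> aa | bb,  U -> aT | eps,  V -> bT | eps
GRAMMAR = [
    ("S", "cS"), ("S", "UV"), ("S", "aT"),
    ("T", "aa"), ("T", "bb"),
    ("U", "aT"), ("U", ""),
    ("V", "bT"), ("V", ""),
]

def make_eps(grammar):
    eps = {lhs for lhs, rhs in grammar if rhs == ""}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in eps and rhs != "" and all(y in eps for y in rhs):
                eps.add(lhs)
                changed = True
    return eps

def make_first(grammar, eps):
    symbols = {s for lhs, rhs in grammar for s in lhs + rhs}
    # FIRST(c) = {c} for a terminal c; FIRST starts empty for a non-terminal
    first = {s: ({s} if s.islower() else set()) for s in symbols}
    changed = True
    while changed:                  # the outer do-while loop
        changed = False
        for lhs, rhs in grammar:    # for each rule X -> Y1 Y2 ... Yk
            for y in rhs:           # for i from 1 to k
                if not first[y] <= first[lhs]:
                    first[lhs] |= first[y]
                    changed = True
                if y not in eps:    # Y_i cannot vanish, so later Y's can't start X's string
                    break
    return first

eps = make_eps(GRAMMAR)
first = make_first(GRAMMAR, eps)
print(sorted(first["S"]))  # ['a', 'b', 'c']
print(sorted(first["T"]))  # ['a', 'b']
```

Note how FIRST(S) picks up b only because U is nullable: in the rule S → UV the scan gets past U and reaches FIRST(V).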
Now for the big question: How do we know that this algorithm is correct? I.e. that "terminal $c$ is in FIRST($X$) if and only if $X \Rightarrow^* cw$ for some string $w$"? It should be clear that if we add $c$ to FIRST($X$) in the innermost loop, then there must be a derivation $X \Rightarrow^* cw$ for some string $w$. In fact, we can show that to be an invariant for all three loops. So what remains to be proven is that if there is a non-terminal $X$ and terminal $c$ such that $X \Rightarrow^* cw$ for some string $w$, but $c \notin $ FIRST($X$) when we start another iteration of the do-while loop, then something new will be added during that iteration. That will show that we cannot exit the do-while until FIRST is exactly what we need it to be.
Theorem: If there is a non-terminal $X$ and terminal $c$ such
that $c \notin$ FIRST[$X$] but $X \Rightarrow^*_G cu$ for some
$u$, then the next pass through the do-while loop will add
$c$ to FIRST[$X'$] for some non-terminal $X'$ that doesn't
currently have $c$ in its FIRST set.
Proof:
Consider a parse sub-tree rooted at $X$ that represents a
derivation in which $c$ is the first symbol in the derived
string of terminals. Consider the path from the $X$ at the
root to the leftmost leaf labelled $c$ (i.e. the first terminal
symbol). For each non-terminal $X'$ along that path, the subtree
with that non-terminal as its root represents a derivation
$X' \Rightarrow^*_G c u$, for some $u$. So for each such $X'$,
$c$ is supposed to be in FIRST[$X'$] when the algorithm returns. Let
us fix $X'$ as the deepest vertex on that path such that
$c \notin$ FIRST[$X'$] at the current point in the algorithm.
Denote the children of $X'$ as $Y_1, Y_2, \ldots, Y_k$, and
let $Y_j$ be the next vertex down the path. Then
$Y_1,\ldots,Y_{j-1}$ are all nullable, and thus in EPS,
and, at the
current point in the algorithm, $c \in $ FIRST[$Y_j$]
(otherwise $X'$ would not be the deepest vertex as we
assumed). Thus, when the algorithm considers the rule
$X' \rightarrow Y_1 Y_2 \cdots Y_k$, $c$ will be added to
FIRST[$X'$]. And so the theorem is proved.
Algorithm: first$(w_1 w_2 \cdots w_k)$
- P := { }
- for $i$ from 1 to $k$
    P := P $\cup$ FIRST($w_i$)
    if $w_i \notin \mbox{EPS}$, break
- return P
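This helper extends FIRST from single symbols to strings of symbols. A small Python sketch, with EPS and FIRST hardcoded to the values makeEPS and makeFIRST produce for the sample grammar so that the block stands alone (the names are my own):

```python
# EPS and FIRST as computed for: S -> cS | UV | aT, T -> aa | bb, U -> aT | eps, V -> bT | eps
EPS = {"S", "U", "V"}
FIRST = {"S": {"a", "b", "c"}, "T": {"a", "b"}, "U": {"a"}, "V": {"b"},
         "a": {"a"}, "b": {"b"}, "c": {"c"}}

def first(w):
    p = set()
    for sym in w:            # w is a string of grammar symbols w_1 ... w_k
        p |= FIRST[sym]
        if sym not in EPS:   # stop at the first symbol that cannot derive epsilon
            break
    return p

print(sorted(first("UV")))   # ['a', 'b'] - U can vanish, so FIRST(V) matters too
print(sorted(first("aT")))   # ['a'] - a is a terminal, so we stop immediately
```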
Algorithm: makeFOLLOW
Input: Grammar $G$, EPS - the result of makeEPS($G$), and FIRST - the result of makeFIRST($G$)
Output: FOLLOW - a map that maps a non-terminal grammar symbol $X$ to the set of terminal symbols that can appear in a string derived in $G$ as the very next symbol after a substring derived from $X$. In other words, $a \in \text{FOLLOW}(X)$ if and only if $S \Rightarrow_G^* uXav$ for some strings $u$ and $v$.
- for all non-terminal symbols X, FOLLOW(X) = { }
- do {
    for each rule $X \rightarrow Y_1 Y_2 \cdots Y_k$ in $G$
      for $i$ from $1$ to $k-1$
        if $Y_i$ is a non-terminal, then FOLLOW($Y_i$) := FOLLOW($Y_i$) $\cup$ first($Y_{i+1} Y_{i+2} \cdots Y_k$)
      for $i$ from $k$ down to $1$
        if $Y_i$ is a non-terminal, FOLLOW($Y_i$) := FOLLOW($Y_i$) $\cup$ FOLLOW($X$)
        if $Y_i \notin \mbox{EPS}$, break out of the inner for loop
  } while the iteration added something new to FOLLOW
- return FOLLOW
PREDICT($X \rightarrow w$) := first($w$) if $\mbox{eps}(w) = \mbox{false}$, and first($w$) $\cup$ FOLLOW($X$) otherwise, where eps($w$) is true exactly when every symbol of $w$ is in EPS, i.e. when $w$ can derive $\varepsilon$.

Thus, each grammar rule gets its own predict set. As long as every pair of rules with the same left-hand side non-terminal has non-intersecting predict sets, the grammar is LL(1) and we can do top-down parsing.
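A self-contained Python sketch tying the pieces together: makeFOLLOW and the PREDICT sets for the sample grammar. The names and encoding are my own; I take S as the start symbol and, as in the definition above, do not add an end-of-input marker to FOLLOW(S):

```python
# Sample grammar: S -> cS | UV | aT,  T -> aa | bb,  U -> aT | eps,  V -> bT | eps
GRAMMAR = [
    ("S", "cS"), ("S", "UV"), ("S", "aT"),
    ("T", "aa"), ("T", "bb"),
    ("U", "aT"), ("U", ""),
    ("V", "bT"), ("V", ""),
]

def make_eps(grammar):
    eps = {lhs for lhs, rhs in grammar if rhs == ""}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in eps and rhs != "" and all(y in eps for y in rhs):
                eps.add(lhs)
                changed = True
    return eps

def make_first(grammar, eps):
    symbols = {s for lhs, rhs in grammar for s in lhs + rhs}
    first = {s: ({s} if s.islower() else set()) for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for y in rhs:
                if not first[y] <= first[lhs]:
                    first[lhs] |= first[y]
                    changed = True
                if y not in eps:
                    break
    return first

EPS = make_eps(GRAMMAR)
FIRST = make_first(GRAMMAR, EPS)

def first_of(w):                 # the first(w_1 ... w_k) helper
    p = set()
    for sym in w:
        p |= FIRST[sym]
        if sym not in EPS:
            break
    return p

def eps_of(w):                   # eps(w): every symbol of w is in EPS
    return all(sym in EPS for sym in w)

def make_follow(grammar, eps):
    follow = {lhs: set() for lhs, _ in grammar}
    changed = True
    while changed:               # the outer do-while loop
        changed = False
        for lhs, rhs in grammar:
            for i, y in enumerate(rhs[:-1]):   # for i from 1 to k-1
                if y.isupper() and not first_of(rhs[i + 1:]) <= follow[y]:
                    follow[y] |= first_of(rhs[i + 1:])
                    changed = True
            for y in reversed(rhs):            # for i from k down to 1
                if y.isupper() and not follow[lhs] <= follow[y]:
                    follow[y] |= follow[lhs]
                    changed = True
                if y not in eps:
                    break
    return follow

FOLLOW = make_follow(GRAMMAR, EPS)

def predict(lhs, rhs):
    return first_of(rhs) | (FOLLOW[lhs] if eps_of(rhs) else set())

for lhs, rhs in GRAMMAR:
    print(lhs, "->", rhs or "ε", ":", sorted(predict(lhs, rhs)))
```

Interestingly, running this shows that the sample grammar is not LL(1): PREDICT(S → UV) and PREDICT(S → aT) both contain a (since S ⇒ UV ⇒ aTV), so a top-down parser looking at an a could not choose between those two rules.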