SI413 Class 11: FIRST and FOLLOW

Programming Language Pragmatics section 2.3's subsection about "Predict sets", pp. 79-82.

Run bison on the file ex1.ypp from Lab 4, but run it like this:
bison -v ex1.ypp
This will produce the file ex1.output which, among other things, includes a text description of the CFSM computed by bison. Open up the file with the text-editor of your choice (i.e. xemacs) and take a look. Print it out. Using this, show the stack contents for each step of parsing:
3 * ( 5 - 7 ) ; $end 
Turn in the printout of ex1.output as well as the stack contents, step-by-step.

Constructing the CFSM directly from the grammar
Instead of constructing a NDFA from a grammar, as we did last class, and then applying the usual algorithm to convert that NDFA to a DFA, we'll show that, for these CFSM's, there's an algorithm that constructs the DFA directly from the grammar.

Recall from last class that if your "goal" is the LR item exp → exp + • term, then any LR item with term on the left-hand side and • at the beginning of the right-hand side is a good "sub-goal". Moreover, all these subgoals ended up in the same state as the initial goal in the deterministic CFSM, because in the non-deterministic machine there were lambda transitions from the goal state to the sub-goal state. We define the "closure of a set I of LR items" to be the set I plus all the sub-goals for each element of I. See book for details.

With the closure, we can construct the CFSM directly from the grammar. Suppose S' → S is our dummy initial rule. closure({S'→•S}) is the initial state of the CFSM. For each state I of the CFSM and grammar symbol (terminal or non-terminal) X do the following

SLR(1) parsing
The CSFM does most of the work of SLR(1) parsing for us. However, we do have to make a decision when the current stack's top element is the state s, and the lookahead is X: do we shift or do we reduce? If we reduce what do we reduce by? Suppose state s is marked with itemes I1,...,Ik. If there's a transition to state t on symbol X, the machine's tellin us to shift. If there's a rule Ij of the form A → α •, the machine's telling us to reduce by A → α. What if there's both? Well, we may simply have a conflict, but we have one more tool at our disposal. If we can list all the symbols that can follow an A, and if X is not in that list, we know we should shift.

In general, let act = { }. If there's a transition for X, add "shift" to act. For each item Ij of the form A → α •, where X ∈ FOLLOW(A), add "reduce by A → α" to act. Now, in state s with lookahead X we should:

Of course this, like our top-down table-diven parser construction requires that we can compute these "FOLLOW(A)" sets.

Top-down parsing required us to compute "FIRST(A)" sets, i.e. the set of tokens (or λ) that can appear as the first symbol in a string derived from A. This will in fact be the basis for computing FOLLOW(A) sets as well, so we'll start with FIRST ... first. The book gives the algorithm.

It's convenient to define FIRST(β) for an arbitrary string of terminals and non-terminals. See book.

FOLLOW(A) is the set of a all symbols that can follow an A in a sentential form for the grammar. See book for algorithm.

Christopher W Brown
Last modified: Mon Sep 28 13:41:09 EDT 2009