SI413: SLR Parsing

Reading

Section 2.3 of Programming Language Pragmatics.

Homework

hw

HW Review

We went over the HW in class, and we actually used the grammar and machine from that HW to cover this class's material: namely how to make parsing decisions using the Non-deterministic Characteristic Finite State Machine, how to convert that nondeterministic CFM into a deterministic CFM, and how to use that deterministic CFM to more efficiently make parsing decisions. When we added lookahead and the use of FOLLOW sets to mitigate shift-reduce conflicts, we ended up with the SLR(1) parsing!

Parsing with the non-deterministic characteristic finite state machine

This machine we build this way is the non-deterministic "Characteristic Finite State Machine" (CFSM). We can convert to an equivalent deterministic machine, and that is what is usually meant by CFSM.

SLR Parsing

This procedure is called "Simple LR" parsing, or SLR. You should notice that in our example we can parse any string without any lookahead (as long as we can detect when the input is done), therefore this grammar is SLR(0), i.e. SLR parsing with zero lookahead tokens suffices.

SLR(1) Parsing

More complicated grammars may not be SLR(0), so we may need lookahead. In fact, if we add multiplication and the usual prcedence, we get the grammar

S → E
E → E + T | T
T → T * F | F
F → n | (E)

which produces a deterministic CSFM that includes the following fragment:

Now, if we end up in the left state, we have some ambiguity: LR item T→T•*F tells us to shift (push a token), while LR item E→T• tells us to reduce T to E. This is a "shift/reduce conflict". With lookahead, however, we can resolve this. If the lookahead symbol is a *, we know we must reduce. Why? Because FOLLOW(E) does not contain *, so we really can't do the reduce and succeed. In contrast, * is exactly what LR item T→T•*F wants to read next. So we shift.

In other scenarios we might have two different LR items in the same state that tell us to reduce but, of course, using different reductions. This is a "reduce/reduce conflict". Sometimes lookahead can disambiguate this kind of conflict, but not in SLR parsers.

Constructing the CFSM directly from the grammar

Section 2.3 of Programming Lanugage Pragmatics covers how to build the deterministic CFSM directly from the grammar, rather than starting with the Non-deterministic CFSM.

SLR(1) parsing

The CFSM does most of the work of SLR(1) parsing for us. However, we do have to make a decision when the current stack's top element is the state s, and the lookahead is X: do we shift or do we reduce? If we reduce what do we reduce by? Suppose state s is marked with items I1,...,Ik. If there's a transition to state some t on symbol X, the machine's telling us to shift. If there's a rule Ij of the form A → α •, the machine's telling us to reduce by A → α. What if there's both? Well, we may simply have a conflict, but we have one more tool at our disposal. If we can list all the symbols that can follow an A, and if X is not in that list, we know we should shift.

In general, let act = { }. If there's a transition for X, add "shift" to act. For each item Ij of the form A → α •, where X ∈ FOLLOW(A), add "reduce by A → α" to act. Now, in state s with lookahead X we should:

return error if act = { }
shift if act = { shift }
reduce by A → α if act = {A → α}
return shift/reduce error if act contains shift and a reduce action
return reduce/reduce error if act contains multiple reduce actions.

Of course this, like our top-down table-diven parser construction requires that we can compute these "FOLLOW(A)" sets.