SI413 Class 10: SLR Parsing

Section 2.3.3 of Programming Language Pragmatics.

Using the deterministic Characteristic Finite State Machine (the full one with state numbers AND symbols on the stack) given below for the grammar:
S -> E
E -> E + T | T
T -> n
  1. write down the sequence of stacks produced by the SLR(0) parsing algorithm on input n+n+n, and
  2. write down the sequence of stacks produced by the SLR(0) parsing algorithm on input n+nn+n up to the point at which the algorithm determines that the input it invalid.
  3. In the previous part, how does the algorithm determine that the input is indeed invalid?

Getting a handle on the stack in bottom up parsing

A rule with the • in it is called an "LR item". Think of them as a goal: E→•T means "I'm trying to read E by reading T", E→E+•T means "I'm trying to read E by reading E+T and I've already read E+". Each transisition represents moving to a subgoal that helps achive the previous goal in a way consistent with the grammar.

Parsing with the non-deterministic characteristic finite state machine

This machine we build this way is the non-deterministic "Characteristic Finite State Machine" (CFSM). We can convert to an equivalent deterministic machine, and that is what is usually meant by CFSM.

SLR Parsing

This procedure is called "Simple LR" parsing, or SLR. You should notice that in our example we can parse any string without any lookahead, therefore this grammar is SLR(0), i.e. SLR parsing with zero lookahead tokens suffices.

SLR(1) Parsing
More complicated grammars may not be SLR(0), so we may need lookahead. In fact, if we add multiplication and the usual prcedence, we get the grammar
S → E
E → E + T | T
T → T * F | F
F → n | (E)
which produces a deterministic CSFM that includes the following fragment:
Now, if we end up in the left state, we have some ambiguity: LR item T→T•*F tells us to shift (push a token), while LR item E→T• tells us to reduce T to E. This is a "shift/reduce conflict". With lookahead, however, we can resolve this. If the lookahead symbol is a *, we know we must reduce. Why? Because * can't follow an "E" in this grammar, so we really can't do the reduce and succeed. In contrast, * is exactly what LR item T→T•*F wants to read next. So we shift.

In other scenarios we might have two different LR items in the same state that tell us to reduce but, of course, using different reductions. This is a "reduce/reduce conflict". Sometimes lookahead can disambiguate this kind of conflict, but not in SLR parsers.

Christopher W Brown
Last modified: Wed Sep 23 13:48:35 EDT 2009