Reading
Section 2.3 of Programming Language Pragmatics.
HW Review
We went over the HW in class, and we actually used the grammar
and machine from that HW to cover this class's material: namely
how to make parsing decisions using the Non-deterministic
Characteristic Finite State Machine, how to convert that
nondeterministic CFM into a deterministic CFM, and how to use
that deterministic CFM to more efficiently make parsing
decisions. When we added lookahead and the use of FOLLOW sets
to mitigate shift-reduce conflicts, we ended up with the SLR(1)
parsing!
Parsing with the non-deterministic
characteristic finite state machine
This machine we build this way is the non-deterministic
"Characteristic Finite State Machine" (CFSM). We can
convert to an equivalent deterministic machine, and that is
what is usually meant by CFSM.
SLR Parsing
This procedure is called "Simple LR" parsing, or SLR.
You should notice that in our example we can parse any
string without any lookahead (as long as we can detect when
the input is done), therefore this grammar is
SLR(0), i.e. SLR parsing with zero lookahead tokens
suffices.
SLR(1) Parsing
More complicated grammars may not be SLR(0), so we may need
lookahead. In fact, if we add multiplication and the usual
prcedence, we get the grammar
S → E
E → E + T | T
T → T * F | F
F → n | (E)
which produces a deterministic CSFM that includes the
following fragment:
Now, if we end up in the left state, we have some ambiguity:
LR item T→T•*F tells us to shift (push a
token), while LR item E→T• tells us to reduce T
to E. This is a "shift/reduce conflict". With lookahead,
however, we can resolve this. If the lookahead symbol is a *,
we know we must reduce. Why? Because FOLLOW(E) does not
contain *, so we really can't do the reduce and
succeed. In contrast, * is exactly what LR item
T→T•*F wants to read next. So we shift.
In other scenarios we might have two different LR items in
the same state that tell us to reduce but, of course, using
different reductions. This is a "reduce/reduce conflict".
Sometimes lookahead can disambiguate this kind of conflict,
but not in SLR parsers.
Constructing the CFSM directly from the grammar
Section 2.3 of Programming Lanugage Pragmatics covers how to
build the deterministic CFSM directly from the grammar, rather
than starting with the Non-deterministic CFSM.
SLR(1) parsing
The CFSM does most of the work of SLR(1) parsing for us.
However, we do have to make a decision when the current
stack's top element is the state s, and the lookahead is X:
do we shift or do we reduce? If we reduce what do we
reduce by? Suppose state s is marked with items
I1,...,Ik. If there's a transition to state some t on symbol X,
the machine's telling us to shift. If there's a rule Ij of
the form
A → α •, the machine's
telling us to reduce by
A → α. What if
there's both? Well, we may simply have a conflict, but we
have one more tool at our disposal. If we can list all the
symbols that can follow an
A, and if
X is
not in that list, we know we should shift.
In general, let act = { }. If there's a transition for
X, add "shift" to act. For each item Ij of
the form A → α •, where X ∈
FOLLOW(A),
add "reduce by A → α" to act.
Now, in state s with lookahead X we should:
- return error if act = { }
- shift if act = { shift }
- reduce by
A → α if act = {A → α}
- return shift/reduce error if act contains shift and a
reduce action
- return reduce/reduce error if act contains multiple
reduce actions.
Of course this, like our top-down table-diven parser
construction requires that we can compute these "FOLLOW(A)"
sets.
Christopher W Brown