# SI413 Lab 7: LR parsing & prcedence/associativity: abstract syntax tree

Do this lab with a partner. Keep answers to the questions in a README file that you will submit at the end.
Part 1 -- Parse trees for programs
Consider a simple programing language with ifs, while-loops, variable assignment, boolean and comparison operators, and arithmetic. Below we give the language's grammar and a simple example program:
 Grammar for simple language `prog1.txt` ```res: block | stmt block: LB stmtlist RB stmtlist: stmtlist stmt | stmt stmt: ID ASN exp STMTTERM | IF LP exp RP block | IF LP exp RP block ELSE block | WHILE LP exp RP block | READ ID STMTTERM | WRITE exp STMTTERM exp: exp OROP aexp | aexp aexp: aexp ANDOP nexp | nexp nexp: NOT cexp | cexp cexp: cexp COMPOP sexp | sexp sexp: sexp OPA term | term term: term OPM factor | sfactor sfactor: OPA factor | factor factor: NUM | ID | LP exp RP | BCONST ``` ```# Computes n! { read n; f := 1; while(n > 0) { f := f * n; n := n - 1; } write f; } ```

Notice that this grammar encodes the precendece and associativity rules for OR, AND, NOT <, > etc.,+/-, and */div. This is done it the usual way. That forces a certain amount of complexity in the grammar. Take a look at the parse tree for `prog1.txt`. (It's postscript, so view with the "gv" program.) You should be able to read off the predecences here.

1. Of the possible operators in an "exp", what has lowest precedence? (look at the grammar or the postscript)
2. Add the parentheses implicit in
```not  3  +  4  *  5  <  6  -  x  and  y  <  0  or  false
```
due to operator precedences.
3. Consider
```3 < 4 < 5
```
This is syntactically valid for an `cexp`. How could I modify the grammar so that this would not be syntactically valid? (put the modified part into your README file)

Part 2 - Specifying precedence and associativity for LR parsers
What's wrong with the grammmar
```exp: exp OPA exp | exp OPM exp | NUM | LP exp RP
```
for arithmetic expressions? It's ambiguous, which means that there's more than one parse tree for the same string. The unambiguous grammar for this language that we've used in the past is more complex, but it's unambiguous, and the unique parse tree it yields respects our associativity and precedence requirements. However, not only is the grammar complex, but the parse trees are huge (see the above) with lots of subtrees that are just "reinterpretation" steps, e.g. in `x := 5 ;` we have
```exp -> term -> sfactor -> factor -> NUM(5)
```
just to interpret 5 as the right-hand side of an assignment. Not only are the parse trees huge, but the parser takes a lot of steps simply to make all those reinterpretations.

What happens with an LR parser if we use the ambiguous grammar above? What happens, of course, is lots of shift/reduce and reduce/reduce conflicts. But why don't we keep the CFSM produced from this grammar, which is nice and small, and augment the machine with some rules it can use to resolve these conflicts; rules that stem from our associativity and precedence expectations. This works! We get a simpler grammar, a smaller CFSM, a faster parser (since it's not making all those extra moves), and a simpler parse tree. Everyone's happy!

The question is, can we generalize this? Can we augment parser generators like bison with a mechanism by which the input tells the tool how to diabiguate? The answer is yes (of course). The yacc/bison input file can include specifications of associativity and precedence for some or all tokens. Each rule gets a precedence which it inherits from the the right-most token in the rule (note: this looks only at the terminals. Non-terminals don't count here). Additionally, rules can be assigned a precedence level explicitly.

Page 75 of The Bison 2.3 Manual:
Finally, the resolution of conflicts works by comparing the precedence of the rule being considered with that of the look-ahead token. If the token's precedence is higher, the choice is to shift. If the rule's precedence is higher, the choice is to reduce. If they have equal precedence, the choice is made based on the associativity of that precedence level. The verbose output file made by `-v' (see Chapter 9 [Invoking Bison], page 97) says how each conflict was resolved.

Associativity of tokens are assigned by "`%left token-name`", "`%right token-name`", and "`%nonassoc token-name`" statements in the bison file. Use these statements instead of "`%token token-name`" which we had before. These come before the grammar rules, and the Relative precedence of these tokens is defined by the order in which the statements appear: first in the file has lowest precedence, last in the file has highest precedence. To assign a rule a precedence explicitly, you put "`%prec token-name`" after the rule's right-hand side. Sometimes you use "dummy" token names just to make a precedence level to assign a rule. For arithmetic we'd say:

```%left OPA
%left OPM
%right UPLUSMINUS

%%

exp: exp OPA exp
|    exp OPM exp
|    OPA exp        %prec UPLUSMINUS
|    NUM
|    LP exp RP

%%
```
Notice how we used the dummy token `UPLUSMINUS` to get unary minus's as in `3 + (-5*3 - 8)` to be handled properly. The scanner never returns such a token, it's sole purpose is to create the proper precedence level.
1. Suppose I wanted to add exponentiation, with a POW token for the ^ symbol. Add on the the above to make this change. Exponentiation should have the highest possible precedence, so
```2 + - 3 ^ 2 * 6  <===>  2 + ((- (3 ^ 2)) * 6)
```
and the associativity of exponentiation should work like this:
```2^3^4  <===>  2^(3^4)
```

Part 3 - Using associativity and precedence
Make sure you understand this: Adding precedence and associativity declarations to an ambiguous grammar allows LR parsers to produce the unique parse tree we want. But it only works for some grammars/languages. Fortunately, this includes a lot of what we want for programming languages.

So the simple progamming language above gets a simpler grammar definition now that we can use precedence and associativity.

 Grammar for simple language `prog1.txt` ```%left OROP %left ANDOP %right NOT %left COMPOP %left OPA %left OPM %right UPLUSMINUS %% res: block | stmt block: LB stmtlist RB stmtlist: stmtlist stmt |stmt stmt: ID ASN exp STMTTERM | IF LP exp RP block | IF LP exp RP block ELSE block | WHILE LP exp RP block | READ ID STMTTERM | WRITE exp STMTTERM exp: exp OPA exp |exp OPM exp | OPA exp %prec UPLUSMINUS | exp COMPOP exp | NOT exp | exp ANDOP exp | exp OROP exp | NUM | ID | LP exp RP | BCONST %% ``` ```# Computes n! { read n; f := 1; while(n > 0) { f := f * n; n := n - 1; } write f; } ```

Take a look at the parse tree for `prog1.txt` from this grammar. (It's postscript, so view with the "gv" program.) You should notice that it's a lot simpler! Make sure you understand how this code works!

```> gunzip patL07.tar.gz
> tar xf patL07.tar
```
Now cd to the patL07 directory (which the above created) and read the README file to find out how to compile and run the program ... as well as discover a bit about what it does. Play with it a bit: look at the grammar, enter an expression, look at the parse tree. Your job is to modify pat.ypp so that S and seq are the only non-terminals! You'll need to figure out the proper associativities and precedences.
• Hint: Concatenation doesn't have any operator associated with it, so you'll have to use a dummy token and assign the concatenation rule its precedence with the %prec statement.
• Hint: You'll probably end up with some shift/reduce errors. You need to resolve these by looking at the bison output file `pat.output`. You'll see the states of the CFSM, along with the LR items that label those states and the actions for various lookahead symbols. If there's more than one action listed for a lookahead symbol, there's a shift-reduce error, i.e. an ambiguity as to what action to take. You can resolve those ambiguities by assigning the appropriate precedence (or associativity) to to the tokens involved in the shift-reduce erros. For example, if state 14 has this conflict :
```state 14

2 seq: seq . FOLD seq
2    | seq FOLD seq .
3    | seq . seq
4    | seq . COLON NAME
5    | seq . POP

POP    shift, and go to state 9
COLON  shift, and go to state 10
ATOM   shift, and go to state 1
NAME   shift, and go to state 2
LB     shift, and go to state 3

ATOM      [reduce using rule 2 (seq)]
NAME      [reduce using rule 2 (seq)]
LB        [reduce using rule 2 (seq)]
\$default  reduce using rule 2 (seq)

seq  go to state 12
```
This state is saying that if there is a ATOM or NAME or LB token next, it doesn't know what to do: either shift or reduce (using rule 2). I want to make sure that I do *not* reduce by "`2 seq: seq FOLD seq •`", since FOLD is supposed to have lowest precedence. Instead I want to shift an ATOM or NAME or LB so that I build up the `seq` on the right as well as the left before I finally reduce by the FOLD rule. If these tokens had some associativity assigned already, then that bison info that is quoted above (from page 75 of the manual) tells me that associativity would resolve the conflict -- but not necessarily in the way I wanted. Instead, I should give ATOM, NAME and LB higher precedence than FOLD, and then bison will know to shift instead of reduce.

What to turn in
Submit a working version of the `pat` parse tree viewer that uses the ambiguous grammar + assoc/prec rules. Be sure to document it, though. Your makefile should make the program, of course, and it should still be called `pat`. There should be a README file that contains your name(s), a brief description of the program, and
1. the answers to all the questions posed above,
2. the answer to the questions "What is the advantage to your new pat grammar + assoc/prec? How do the parse trees created by your version compare to those created by the original?",
3. a thorough inventory of what works and doesn't in your implementation. (Actually, I only want your program to include working features, whatever you couldn't get to work ought to simply be left out, but documented in the README).

Submit your DIRECTORY using the online submit instructions. You may resubmit later, I'll simply look at the submission with the latest time stamp.

Christopher W Brown