**Reading**- Section 2.3.2 of Programming Language Pragmatics. Don't worry too much about the portion entitled "Predict Sets", pp. 79-82, but do pay attention to the section "Writing an LL(1) Grammar".
**Homework**-
Suppose you wanted to add a feature to the calculator for
computing max, so you could enter:
`max{3, (2+1)*7/4, 68 - 3*4*5 - 9/2};`

... and get the answer 5. Write out the grammar for our language with this feature added, and tell me which new tokens would be required. (Just augment the grammar below.)

**Turn in:** A list of the new tokens, the augmented grammar, and a parse tree using your augmented grammar on the string `1 + max{3,7,-2};`. I'd be pretty excited if you implemented it on top of the recursive descent parser below, by the way, but I won't require it.

**Top-down parsing**-
We'll be looking at two kinds of top-down parsers today, and
for both we'll use the following example grammar:
0: stmt → exp STOP

1: exp → term etail

2: etail → OPA term etail

3: etail → λ

4: term → factor ttail

5: ttail → OPM factor ttail

6: ttail → λ

7: factor → LP exp RP

8: factor → NUM

Where λ denotes the empty string. We can use the same flex scanner we used with the previous lab.

**Recursive descent parsers**-
Recursive descent parsers implement the abstract top-down parsing we discussed last class with a twist: instead of using an explicit stack data structure, we implicitly use the function call stack. Thus, instead of pushing an `exp` symbol, we call an `exp()` function (which the system pushes onto the call stack). When there's more than one right-hand side for a given non-terminal, like `factor`, we select based on the look-ahead. Here's rd.cpp:

```cpp
#include "rd.h"

//-- Prototypes and globals
void stmt();
void exp();
void etail();
void term();
void ttail();
void factor();

int next = -1;

//-- Helper functions
void perror()
{
  cerr << "Parse error!" << endl;
  exit(1);
}

int peek()
{
  if (next == -1) next = yylex();
  return next;
}

void match(int tok)
{
  if (tok == peek()) next = -1;
  else perror();
}

//-- Grammar rule functions
void stmt() { exp(); match(STOP); }

void exp() { term(); etail(); }

void etail()
{
  switch (peek())
  {
    case OPA: match(OPA); term(); etail(); break;
    case RP:
    case STOP: break;
    default: perror(); break;
  }
}

void term() { factor(); ttail(); }

void ttail()
{
  switch (peek())
  {
    case OPM: match(OPM); factor(); ttail(); break;
    case RP:
    case STOP:
    case OPA: break;
    default: perror(); break;
  }
}

void factor()
{
  switch (peek())
  {
    case LP: match(LP); exp(); match(RP); break;
    case NUM: match(NUM); break;
    default: perror(); break;
  }
}

int main()
{
  stmt();
}
```

... rd.h:

```cpp
#include <iostream>
using namespace std;
const int NUM = 0, OPA = 1, OPM = 2, LP = 3, RP = 4, STOP = 5;
int yylex();
```

... and ex1.l:

```
%{
#include "rd.h"
%}
%option noyywrap
%%
[0-9]+    { return NUM; }
[\+|\-]   { return OPA; }
[\*|/]    { return OPM; }
"("       { return LP; }
")"       { return RP; }
";"       { return STOP; }
<<EOF>>   { return 0; }
[ \t\n]+  { }
.         { cerr << "Unrecognized token!" << endl; exit(1); }
%%
```

You can get rd.h, rd.cpp and ex1.l and compile like this:

```
flex ex1.l
g++ -o mycalc lex.yy.c rd.cpp
```

If the user enters a syntactically correct arithmetic expression, the program simply returns. Otherwise, it prints out an error message. In class we traced out the call stack for at least part of parsing "3+4;". Make sure you're clear on how that works.

**Table-driven top-down parsers**-
Be sure to read the book's description of table-driven top-down
parsing. It gives a general-purpose view of things. We
looked at the specific example of a table-driven top-down
parser for our simple calculator language. The idea is to
explicitly model our abstract stack-based top-down parser.
We need to have both tokens and non-terminal symbols on the
stack, which means we need a common type. We simply assign
an int to each of these symbols, so a stack of ints suffices.

```cpp
const int NUM = 0, OPA = 1, OPM = 2, LP = 3, RP = 4, STOP = 5;
const int stmt = 10, exp = 11, etail = 12, term = 13, ttail = 14, factor = 15;
const int error = -1;
const int startSym = stmt;
```

Using the above, we can tell that a symbol is a token by testing whether it's less than `startSym`. The scanner function `yylex()` will just return the int value of the next token, like always, and we'll use the same helper functions we used above. Here's the parser, which will require a bit of explanation:

```cpp
void parse()
{
  stack<int> S;
  S.push(startSym);
  while (true)
  {
    int sym = S.top();
    S.pop();
    if (sym < startSym)
    {
      match(sym);
      if (sym == STOP) return;
    }
    else
    {
      int action = parseTable[sym - startSym][peek()];
      if (action == error) perror();
      int k = 0;
      while (rule[action][k] != -1) ++k;
      while (--k >= 0) S.push(rule[action][k]);
    }
  }
}
```

The main idea is that the parser has a table, `parseTable`, that it uses to determine which grammar rule to use next. It pops symbol `sym` and finds token `tok` with the look-ahead, and `parseTable[sym][tok]` tells it which rule to use. We also keep the rules in an array so we can look up a rule and iterate over the symbols in its right-hand side. If a certain symbol/token pair indicates a parse error, we indicate it in the table with -1.

```cpp
//-- These are the grammar rules. -----------------//
const int rule[9][10] =
{
  { exp, STOP, -1 },           // Rule 0
  { term, etail, -1 },         // Rule 1
  { OPA, term, etail, -1 },    // Rule 2
  { -1 },                      // Rule 3
  { factor, ttail, -1 },       // Rule 4
  { OPM, factor, ttail, -1 },  // Rule 5
  { -1 },                      // Rule 6
  { LP, exp, RP, -1 },         // Rule 7
  { NUM, -1 }                  // Rule 8
};

//-- parseTable[symbol][token] = predicted rule. --//
const int parseTable[6][10] =
{
  /*            NUM  OPA  OPM  LP  RP  STOP */
  /* stmt   */ {  0,  -1,  -1,  0, -1,  -1 },
  /* exp    */ {  1,  -1,  -1,  1, -1,  -1 },
  /* etail  */ { -1,   2,  -1, -1,  3,   3 },
  /* term   */ {  4,  -1,  -1,  4, -1,  -1 },
  /* ttail  */ { -1,   6,   5, -1,  6,   6 },
  /* factor */ {  8,  -1,  -1,  7, -1,  -1 }
};
```

The cool thing about this setup is that the `parse()` function is the same *for any LL(1) language!* All that changes from run to run is the list of grammar rules and the table! Automatically generated top-down parsers are of this kind. Of course, generating the table takes some doing. In class we talked about FIRST and FOLLOW, which the book describes in considerably more detail. You can get rd.h, rd.cpp and ex1.l and compile like this:

```
flex ex1.l
g++ -o mycalc lex.yy.c rd.cpp
```

... if you want to test this version out.

Christopher W Brown Last modified: Wed Sep 23 22:27:11 EDT 2009