max{3, (2+1)*7/4, 68 - 3*4*5 - 9/2};
... and get the answer 5. Write out the grammar for our
language with this feature added, along with telling me which
new tokens would be required. (Just augment the grammar
below.)
Turn in: A list of the new tokens, the augmented
grammar and a parse tree using
your augmented grammar on the string 1 + max{3,7,-2};.
I'd be pretty excited if you
implemented it on top of the recursive descent parser below,
by the way, but I won't require it.
0: stmt → exp STOP
1: exp → term etail
2: etail → OPA term etail
3: etail → λ
4: term → factor ttail
5: ttail → OPM factor ttail
6: ttail → λ
7: factor → LP exp RP
8: factor → NUM
Here λ denotes the empty string. We can use the same flex scanner we used with the previous lab:
[0-9]+ { return NUM; }
[\+|\-] { return OPA; }
[\*|/] { return OPM; }
"(" { return LP; }
")" { return RP; }
";" { return STOP; }
<<EOF>> { return 0; }
[ \t\n]+ { }
. { cerr << "Unrecognized token!" << endl; exit(1); }
The idea of recursive descent is that each non-terminal gets its own function: to expand the
exp symbol, we call an exp()
function (which the system pushes onto the call stack). When
there's more than one right-hand side for a given
non-terminal, like factor, we select one based on the
look-ahead token.
#include "rd.h"
//-- Prototypes and globals
void stmt(); void exp(); void etail();
void term(); void ttail(); void factor();
int next = -1;
//-- Helper functions
void perror() { cerr << "Parse error!" << endl; exit(1); }
int peek() { if (next == -1) next = yylex(); return next; }
void match(int tok) { if (tok == peek()) next = -1; else perror(); }
//-- Grammar rule functions
void stmt() { exp(); match(STOP); }
void exp() { term(); etail(); }
void etail() {
    switch (peek()) {
        case OPA: match(OPA); term(); etail(); break;
        case RP: case STOP: break;
        default: perror(); break;
    }
}
void term() { factor(); ttail(); }
void ttail() {
    switch (peek()) {
        case OPM: match(OPM); factor(); ttail(); break;
        case RP: case STOP: case OPA: break;
        default: perror(); break;
    }
}
void factor() {
    switch (peek()) {
        case LP: match(LP); exp(); match(RP); break;
        case NUM: match(NUM); break;
        default: perror(); break;
    }
}
int main() { stmt(); }
// rd.h
#include <iostream>
using namespace std;
const int NUM = 0, OPA = 1, OPM = 2, LP = 3, RP = 4, STOP = 5;
int yylex();
%{
#include "rd.h"
%}
%option noyywrap
%%
[0-9]+ { return NUM; }
[\+|\-] { return OPA; }
[\*|/] { return OPM; }
"(" { return LP; }
")" { return RP; }
";" { return STOP; }
<<EOF>> { return 0; }
[ \t\n]+ { }
. { cerr << "Unrecognized token!" << endl; exit(1); }
%%
You can get rd.h, rd.cpp and ex1.l and compile like this:

flex ex1.l
g++ -o mycalc lex.yy.c rd.cpp

If the user enters a syntactically correct arithmetic expression, the program simply returns. Otherwise, it prints out an error message. In class we traced out the call stack for at least part of parsing "3+4;". Make sure you're clear on how that works.
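Traced out fully, the calls for "3+4;" nest like this (indentation shows the call stack; the tail functions that see a FOLLOW token take the λ rule and simply return):

```
stmt()
  exp()
    term()
      factor()   match(NUM)        // the 3
      ttail()    sees OPA  -> λ
    etail()      match(OPA)        // the +
      term()
        factor() match(NUM)        // the 4
        ttail()  sees STOP -> λ
      etail()    sees STOP -> λ
  match(STOP)                      // the ;
```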
const int NUM = 0, OPA = 1, OPM = 2, LP = 3, RP = 4, STOP = 5;
const int stmt = 10, exp = 11, etail = 12, term = 13, ttail = 14, factor = 15;
const int error = -1;
const int startSym = stmt;

Using the above, we can tell that a symbol is a token by testing whether it's less than
startSym.
The scanner function yylex() will just return the
int value of the next token, like always, and we'll use the
same helper functions we used above. Here's the parser, which
will require a bit of explanation:
void parse()
{
    stack<int> S;                 // needs #include <stack>
    S.push(startSym);
    while (true)
    {
        int sym = S.top(); S.pop();
        if (sym < startSym)       // a token: match it against the input
        {
            match(sym);
            if (sym == STOP) return;
        }
        else                      // a non-terminal: look up the predicted rule
        {
            int action = parseTable[sym - startSym][peek()];
            if (action == error) perror();
            // Push the right-hand side in reverse so its leftmost
            // symbol ends up on top of the stack.
            int k = 0; while (rule[action][k] != -1) ++k;
            while (--k >= 0) S.push(rule[action][k]);
        }
    }
}
The main idea is that the parser has a table, parseTable, that
it uses to determine which grammar rule to apply next. It pops
a symbol sym off the stack, peeks at the lookahead
token tok, and parseTable[sym][tok] tells it
which rule to use. We also keep the rules in an array, so we
can look up a rule and iterate over the symbols on its
right-hand side. If a symbol/token pair indicates a
parse error, we mark it in the table with -1.
//-- These are the grammar rules. -----------------//
const int rule[9][10] = {
{ exp, STOP, -1 }, // Rule 0
{ term, etail, -1 }, // Rule 1
{ OPA, term, etail, -1 }, // Rule 2
{ -1 }, // Rule 3
{ factor, ttail, -1 }, // Rule 4
{ OPM, factor, ttail, -1 }, // Rule 5
{ -1 }, // Rule 6
{ LP, exp, RP, -1 }, // Rule 7
{ NUM, -1 } // Rule 8
};
//-- parseTable[symbol][token] = predicted rule. --//
const int parseTable[6][10] = {
/* NUM OPA OPM LP RP STOP */
/* stmt */ { 0, -1, -1, 0, -1, -1 },
/* exp */ { 1, -1, -1, 1, -1, -1 },
/* etail */ { -1, 2, -1, -1, 3, 3 },
/* term */ { 4, -1, -1, 4, -1, -1 },
/* ttail */ { -1, 6, 5, -1, 6, 6 },
/* factor */ { 8, -1, -1, 7, -1, -1 }
};
The cool thing about this setup is that the
parse()
function is the same for any LL(1) language!
All that changes from one language to the next is the list of
grammar rules and the table! Automatically generated top-down
parsers are of this kind. Of course, generating the table
takes some doing. In class we talked about FIRST and FOLLOW,
which the book describes in considerably more detail.
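For reference, working the sets out by hand for the grammar above gives:

```
FIRST(factor) = FIRST(term) = FIRST(exp) = { NUM, LP }
FIRST(etail)  = { OPA, λ }            (etail is nullable)
FIRST(ttail)  = { OPM, λ }            (ttail is nullable)
FOLLOW(exp)   = FOLLOW(etail) = { RP, STOP }
FOLLOW(term)  = FOLLOW(ttail) = { OPA, RP, STOP }
FOLLOW(factor) = { OPM, OPA, RP, STOP }
```

A rule A → α is predicted on the tokens in FIRST(α), and a λ-rule for A on the tokens in FOLLOW(A); check these sets against the etail and ttail rows of parseTable and you'll see exactly where the 3s and 6s come from.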
You can get rd.h, rd.cpp and ex1.l and compile like this:

flex ex1.l
g++ -o mycalc lex.yy.c rd.cpp

... if you want to test this version out.