pat using recursive descent parsing. The language is a
simple language for defining sequences. Here are some
examples:
> ./pat a b c; a b c [a b c]_r; c b a [a b b d]:X; a b b d X X_r; a b b d d b b a [a b c [d e]:X f g h] X_r; a b c d e f g h e dPattern symbols (called "atoms") are strings beginning with lower-case letters (such as 'a', 'b', or 'cat'), pattern variables are strings beginning with upper-case letters. Square brackets are used for grouping. A sequence followed by
: NAME is assigned to a variable
as a side effect. That variable is in scope from that moment
on until the interpreter is exited (with ctrl-d). The
_r operator reverses a sequence. It's precedence
(and variable assignment) are higher than concatenation, so
a b [c d]_r gives a b d c, not d c b a. Finally, there's an
operator * that interleaves two sequences, so
[a b c] * [x y z]; a x b y c zThis operator has lowest precedence, so the [ ]'s above are unnecessary. If the sequences have different lengths, the shorter is padded to the same length by repeating the last entry. Here is the grammar for a
pat program,
which is just a sequence of ;-terminated pat
expressions.
S -> seq STOP S | λ ← STOP is ; seq -> seq FOLD cseq | cseq ← FOLD is * cseq -> cseq pseq | pseq pseq -> pseq COLON NAME | pseq POP | aseq ← POP is _r aseq -> ATOM | NAME | LB seq RB ← LB and RB are [ and ]
The grammar above is probably the natural grammar for this language, at least if you want to specify the associativities and precedences of the language's operators. However, it is not appropriate for LR (top-down) parsing. Please stop for a moment, look at the grammar and make sure you understand why. Which rules are troublesome?
For purposes of this lab, I'm going to give you a rewriting of the grammar in a form that is amenable to top-down parsing (though I'd like you to be able to do it yourselves!):
S -> seq STOP S | λ seq -> cseq stail stail -> FOLD cseq stail | λ cseq -> pseq ctail ctail -> pseq ctail | λ pseq -> aseq ptail ptail -> COLON NAME ptail | POP ptail | λ aseq -> ATOM | NAME | LB seq RBYou write a recursive descent top-down parser. You should refer to Class 9 to see how this works. I've got the following shell for you:
/*** This is file: pat.l ***/
%{
#include "pat.h"
#include <iostream>
using namespace std;
%}
%option noyywrap
%%
[a-z][a-zA-Z0-9\^]* { return ATOM; }
"*" { return FOLD; }
[;] { return STOP; }
":" { return COLON; }
[A-Z][a-zA-Z0-9\^]* { return NAME; }
"_r" { return POP; }
"[" { return LB; }
"]" { return RB; }
<<EOF>> { return 0; }
[ \t\n]+ { }
. { cerr << "Unrecognized token!" << endl; exit(1); }
%%
|
/*** This is file: pat.h ***/
// Token labels
const int ATOM=1, FOLD=2, STOP=3, COLON=4,
NAME=5, POP=6, LB=7, RB=8;
// Prototype for yylex(), aka getNextToken()
int yylex();
|
/*** This is file: pat.cpp ***/
#include "pat.h"
#include <iostream>
using namespace std;
int next = -1; // store next token
//-- Helper functions
void perror() {
cerr << "Parse error!" << endl;
exit(1); }
int peek() {
if (next == -1) next = yylex();
return next; }
void match(int tok) {
if (tok == peek()) next = -1;
else perror(); }
|
You might also want to create a Makefile:
def: flex pat.l g++ -o pat lex.yy.c pat.cppDon't forget the tab's! From this skeleton you actually need to write the recursive descent parser! When it works, you should be able to enter statement after statement with no feedback, until a syntax error cases a message and aborts the program. For example:
> ./pat a b c; [a b c]:X X_r; [ a b : c d ]; Parse error!
pat mini-language. I suggest you look at
this
explanation of recursive descent for the simple calculator
language. See how semantic values (which are all int's
for the calculator) are handled? See how we evaluate across
those akward "tail" productions?
Now, for your interpreter I suggest that the tokens have C++
string objects as semantic values, and that
non-terminals have vector<string> objects
for semantic values, i.e. that's what the grammar rule
functions return. Because I like you, I've implemented a
few functions you may find helpful, they implement the fold
(*) operation, concatenation and the reverse (_r) operattion:
#include <vector>
#include <string>
#include <algorithm>
vector<string> fold(const vector<string> &A, const vector<string> &B)
{
vector<string> res;
for(unsigned long int i = 0; i < A.size() || i < B.size(); ++i)
{ res.push_back(A[min(i,A.size()-1)]); res.push_back(B[min(i,B.size()-1)]); }
return res;
}
vector<string> concat(const vector<string> &A, const vector<string> &B)
{
vector<string> res;
for(int i = 0; i < A.size(); ++i) res.push_back(A[i]);
for(int i = 0; i < B.size(); ++i) res.push_back(B[i]);
return res;
}
vector<string> rev(const vector<string> &A)
{
vector<string> res;
for(int i = A.size() - 1; i >= 0; --i) res.push_back(A[i]);
return res;
}
Hints:
vector<string>'s for the cases you want
to ignore. Get that working then add the non-λ
ctail → pseq ctail rule. Get that
working then add the rest. Leave variables for last.
pat-language
variables. Once again we'll use maps, but this time we're
mapping strings to vectors of strings, i.e.:
map<string, vector<string> > symTable;Make sure you use that spaces in between > and >. Why? Becauase C++'s scanner treats >> as a single token, not as two >'s.
pat interpreter.
Be sure to document it, though. Your makefile
should make the program, of course, and it should still be
called pat. There should be a README file
that contains your name as well as a brief description of the
program.
Assuming your lab is in a directory
/home/m101111/si413/lab05, you would submit as
follows:
cd /home/m101111/si413 /courses/wcbrown/etc/si413-submit lab05You may resubmit later, I'll simply look at the submission with the latest time stamp.