yylex()
functions for
bison-generated
parsers. Flex and bison are GNU versions of lex and yacc,
which are traditional Unix development tools.
They both have manuals online: flex
manual, bison manual.
calc1
and save the three
Example 1 files
ex1.ypp,
ex1.l, and
Makefile into it.
Compile and run.
[WARNING: throughout the course, make sure the browser doesn't add some extension like .txt to the filenames. If necessary, change the file type to "All files" before saving.]
// print n as a signed binary number void printsbin(int n) { if (n < 0) { cout << "-"; n = -n; } char res[33] = { '\0' }; int i; for(i = 31; n != 0; --i, n /= 2) res[i] = (n & 1 ? '1' : '0'); cout << (i == 31 ? "0" : res + i + 1); }
Now this is going to require modifying the bison file
ex1.ypp
as well as the flex file. It
should be a simple change if you understand the system,
though:
you need to define a new token (I might call it STOPB,
or something like that), you need to modify your flex
file to recognize the new token, and you need to create
a new rule in the grammar to use the token.
> ./mycalc 10+3*5; 25 10+3*5;b 11001 10b+3*5;b 10001 10b+3*5; 17Once you've got this working, flag me down and show me!
/* like this */
are not completely straightforward with flex. Why? Well,
suppose you added the flex rule:
"/*".*"*/" { }This doesn't work because
.*
doesn't capture
\n
's, and the whole point of C-style comments
is to allow multi-line comments. So we make it like this:
"/*"(.|\n)*"*/" { }This doesn't work either, and the problem is more fundamental. Remember that the rule is that the longest match possible is the one flex makes, so in
/* start */ x = 4; /* end */flex thinks the whole string is a comment. In this case, we don't want maximal munch! (We have the same problem with string literals.) So we'll take the easy way out and use another common comment convention: A "#" character starts a comment and the comment extends to the next newline character. Here's an example session:
> ./mycalc 1+2+3+4+5+6+7+8+9+10; # sum 1..10 computed explicitly 55 10*(10+1)/2; # sum 1..10 computed by Gauss's closed form 55Note that although the newline is part of the comment, the comment does separate tokens. So
1000 # beginning of one million 000 # end of one millionis interpreted as 1000 followed by 0, which gives an error. Not a lexical error, though, rather a parse error!
Food for thought: One of the real advantages of using a tool like flex as opposed to writing your scanner by hand is that adding new tokens usually has no effect whatsoever on what you've already done. You simply add a new line to your flex file. When you do this by hand, that's not the case: a new token may change the underlying finite automaton substantially.
:=
, =
,
to
as in x to 1
,
and ← as in x ← 1
.
You can decide for yourself.
[a-zA-Z_][a-zA-Z0-9_]*Let's require variable names to be alpha-numerical, with a non-numeric initial character.
Semantically, we have a few decisions as well:
res: NAME ASN exp STOP
, where NAME and ASN
are new tokens.
Finally, we have some technical implementation issues to worry about.
#include <string> #include <iostream> #include <map> using namespace std; int main() { map<string,int> T; string s; for(int i = 1; cin >> s && s != "done"; ++i) T[s] = i; cout << endl; cin >> s; if (T.find(s) == T.end()) cout << "String " << s << " was not entered." << endl; else cout << "String " << s << " was the " << T[s] << "th string entered." << endl; } |
#include <map>
to
use maps. We can simply declare our symbol table as a map
in the initial declarations section of the bison file.
The way this works is that T
is initially empty, but as
soon as you subscript it with something, like
T[string("the")]
,
the mere act of subscripting creates
an entry with key "the"
. One thing about
this is that you can't really test to see whether
there's already an entry for "the"
this
way.
The way you make that test is:
if (T.find(string("the")) == T.end()) ...
.
string
(i.e. C++-style
string) in the %union
statment, but that
doesn't work. Bison is unable to handle any kind of
object there that has a constructor. So ... we will pass
around C-style strings instead. So you'll add something
like
%union { int val; char sym; char *str; };to your bison file so that you can handle semantic values that are C-style strings. One pitfall: in your flex file,
yytext
is a C-style string giving the text
that matched the token's pattern. However, you can't just
assign that to yylval.str
, because flex may
use yytext's memory for later scanning. Instead you need
to make a copy of yylval.str
. The
strdup
function from C's string library
(that'd be #include <cstring>
in C++)
does that for you. You can check it out with man
strdup
. This approach of passing C-style strings
as semantic values is fine for this lab, but it has some
long term issues, since we may "lose" strings before we
have a time to free the memory they use.
When you've got this all working, you should be able to run something like this:
> ./mycalc n1 := 7; n2 := 93; # Now we compute the sum n1 + ... + n2 n2*(n2 + 1)/2 - (n1-1)*n1/2; 4350 # Now in binary! n2*(n2 + 1)/2 - (n1-1)*n1/2;b 1000011111110
mycalc
. There should be a README file
that contains your name as well as a brief description of the
calculator program. This is where your new feature should be
described AND be sure to include some exmaple input in the README that shows off your new feature.
Submit your whole lab04 directory electronically by following the directions on the course home page. AND turn in hardcopy printout of your files.