Your code may be in "free form" Fortran, according to the Fortran 95 standard. However, you may not use preprocessing directives.
Save your program in a file called
in folders called
respectively for phases 1 and 2 of your project.
I will test your code in the same environment as the lab
machines in MI 302, using the commands
/usr/bin/gfortran proj.f95 -o proj ./proj
For phase 2 of your project, you will write a program to generate and
then execute a bottom-up parser. You should submit your program in a folder
proj2, and be sure that is runs as described above.
Your program will take as input the
specification of a CFSM as generated by
So that you don't have to spend all your time dealing with string manipulation
and I/O in Fortran, I have written a little Python script that transforms
the output from
bison -v to a file called
that will be a much friendlier for parsing. The Python script is
called fixoutput.py, and after downloading you
can run it by typing
python fixoutput.py something.outputwhere
something.outputis the result of running a command like
bison -v something.ypp.
For your project, you can assume that all the above has already happened
and you have a file called
spec.txt that contains the following,
NUM NAME, where
NUMis a non-negative integer and
NAMEis the name of the corresponding token. Actually, your program won't really need to know the names of the tokens since everything will be numeric, but this information might be helpful to you in debugging.
NUM NAME, where
NUMis a non-negative integer and
NAMEis now the name of a non-terminal in the grammar. Again, your program won't really need to know the names of the non-terminals, but this might help in debugging.
X s Y, where X and Y are numbers and s is literally the letter s, meaning that if the look-ahead is X, shift and go to state number Y.
X r Y, where X and Y are numbers and r is literally the letter r, meaning that if the look-ahead is X, reduce using grammar rule number Y.
a, where a is literally the letter a, meaning that if we get to this state in the CFSM, we should accept the string as being in the language and return.
An example of this file format is given below. Your program's job will be to read in this description of a CFSM and then actually perform a bottom-up parse according to that language specification.
The CFSM your program creates will simply be an array of states, indexed by their state numbers. Each state will itself be an array of actions to take. Each action might be yet another nested array. The details of this storage implementation are up to you. However, I suggest you store a list of symbol-action pairs, where each action is either a positive integer, meaning to shift and go to that numbered state, or a negative integer, meaning to reduce by that (negative) numbered rule. You will have to think of some special way of indicating the "accept" action.
After creating the CFSM, you should read a series of integers from standard in (which you may safely assume all correspond to tokens), and parse that string of tokens using the CFSM. This will involve maintaining a stack of symbol-state pairs as described in class, an indicator of the current (or next) state, and the saved value of the next "peeked at" token from the input. Your parser doesn't need to do any interpreting, but it should print out the tokens and non-terminals in the stack (as intgers) at every step in the process. These should be printed on a single line, separated by spaces, from the bottom of the stack to the top.
And of course
your program should
also identify and report any parse errors along the way. After reading
0, which will always correspond to
the EOF indicator, your program should halt. That is, you only have
to parse a single stream of tokens.
Your parsing algorithm will simply involve a loop that looks at the next state number and the look-ahead token, and then looks up the corresponding action to perform. You should either produce a parse error if there is no action specified for that kind of token, or else perform the specified shift or reduce action. For a shift, this means adding the next token to the top of the stack and updating the current state. For a reduce, this means looking at the specified grammar rule, popping a certain number of symbols off the stack, adding a certain non-terminal to be the next "peeked at" symbol on the input stream, and finally updating the current state to be the saved state from the top of the stack.
I suggest you write your program in the following steps. Of course you are free to develop however you wish. As always, you are encouraged to submit every time you get some small step of the program working.
spec.txtfile. Instead, start by making the global storage for the CFSM array, the stack, the next state, and the peeked-at input symbol.
spec.txtfile. I would suggest creating a
spec.txtCFSM description that exactly matches a manually-created CFSM that you have working. Then, piece by piece, remove the parts of your program that specify some part of the parser, and instead read that input from the
spec.txtfile. Stop often for testing!
.yppbison specification files that we have used in class and in labs can generate a
.outputfile which (using the Python script) can be turned into a
spec.txtfile that your program should be able to read. Examine the stack contents at each step to make sure the bottom-up parse is executing correctly.
You do not have to create your program in the exact way that I have suggested. However, for full credit your program should be well-documented and easy to follow. Every helper function you create should have documentation, and in general it should be easy for a non-Fortran programmer to see how your program works. You should also document how your data structures work, like the array of CFSM states.
The file spec.output is bison's output for a very simple grammar for adding and subtracting numbers. If this file is in the current directory, along with the Python program fixoutput.py, then running the command
python fixoutput.py spec.output
Will produce the following
0 $end 258 NUM 259 OPA 5 $accept 6 st 7 exp 0 5 6 0 1 6 6 7 2 6 7 3 7 7 259 258 4 7 258 0 258 s 1 6 s 2 7 s 3 1 258 r 4 259 r 4 0 r 4 2 0 s 4 258 s 1 7 s 5 3 259 s 6 258 r 2 0 r 2 4 a 5 259 s 6 258 r 1 0 r 1 6 258 s 7 7 258 r 3 259 r 3 0 r 3
This indicates the three tokens, three non-terminals, and 8 CFSM states of the parser. Then if I compile and run your program as above, then by typing the string
258 259 258 0
into standard in, your program should write
258 7 7 259 7 259 258 7 6 6 0
to standard out, and exit with status 0.
If I typed the string
258 259 259 0
instead, your program should write
258 7 7 259 Parse error!
and exit with nonzero exit code.