Try Firefox!

SI413 Lab 8: Abstract syntax trees

This lab revolves around processing abstract syntax tree for a simple programming language called SPL. Our goal is to interpret SPL code. The precise syntax will be defined in the flex/bison files I distribute. SPL is a dynamically scoped language, and hopefully we'll see what that entails from the implementation perspective. Variables are declared in a funny way: you use "new" and you give the variable an initial value, e.g.
new x := 0;
Functions are created by a "lambda", and there are only unary functions. There is an implicit "new" variable res in every function body, and the value assigned to res when the function exits is the result of the function.
new sqr := lambda x { res := x*x; }
write sqr(5);
Part 1
Create a lab08 directory, and download and unpack ver0.tar.gz into that directory. Make the program go defined in ver0. (NOTE: you may need to add #include <algorithm> to the top of ast.cpp). Given a program as input, "go" displays an abstract syntax tree (AST) for that program. Here is the AST grammar for SPL:
asn:stmt -> id exp stmt
if:stmt -> exp block stmt
if:stmt -> exp block block stmt
while:stmt -> exp block stmt
read:stmt -> id stmt
write:stmt -> exp stmt
new:stmt -> id exp stmt

binop:exp -> exp exp
unop:exp -> exp
lambda:exp -> id block
funcall:exp -> id exp
id:exp -> λ
num:exp -> λ
bool:exp -> λ
For this first part of the lab, I just want you to get a feel for how the AST is represented by the Node data-structure in ast.h. Run the program "go" on the SPL code:
  new t := 0;
  new a := 0;
  new b := 0;
  read t;
  read a;
  read b;
  if (a > b)
    new t := a;
    a := b;
    b := t;
  write t * (b - a);
This code is in prog0.txt, so you could run it like this:
./go < prog0.txt
Now print out the parse tree that results. Annotate each node with the type of the node (e.g. asn, binop, if, ...) and with the value if applicable (e.g. >=, +, not, variable name, int value, etc.). Doing this will mean consulting ast.h, which you may also want to print out. (I recommend you use "enscript" to do this, e.g.
enscript -2rGh ast.h
). I want this stuff turned in!

Part 2
Now we'll consider our first bit of semantic analysis. New names are introduced into scope with "new" statements, and there are corresponding "new" nodes in the AST. When variables go out of scope is, unfortunately, implicit in the code's structure, not explicit, so there aren't any nodes in the AST to tell us when to pull variables out of scope. So let's add such nodes!

Download and unpack ver1.tar.gz into the lab08 directory. make the program "go" in that directory and run it on the same SPL program (prog0.txt) as above.

  1. Print it out and annotate the tree by pointing out which nodes denote variables going out of scope, and which variables they refer to. Refer to ver1/ast.h to see how this new kind of node is represented.
  2. Notice that variables go out of scope according to this tree in the reverse of the order in which they come in to scope. Take a look at ver1/ast.cpp and explain how the code makes that happen. What would you change if you wanted to have variables go out of scope in the same order as they were introduced with "new"?

This annotated paper, along with the answer to the above question, will be submitted.

Part 3
Let's start actually executing something! We're going to be modifying ver1. Add this piece of code directly before main() in lab8.ypp.
int eval(Node *root)
  switch(root->ntype) {
  case _write: 
    int x = eval(root->child[0]); 
    cout << x << endl;
    if (root->child.back() != NULL) eval(root->child.back());
    return 0;
  case _num:
    return root->ibVal;
    cerr << "Undefined nodetype " << root->ntype << endl;
... and call eval(tree) in main instead of displaying the parse tree. NOTE: there is a small bug in this code right now that will prevent it from compiling. Fix it. (Hint: you've been in a previous lab not to do this thing inside a switch statement. I left the bug here to force you to remember, because you'll run into this multiple times during this lab.

Right now the code only knows about write's and num's, so we can only interpret programs like:

{ write 5; write 7; write 9; }
Give it a try and verify that it works.

Your job is to extend this so that it works properly for all arithmetic expressions: no bools, no id's, no functions, just arithmetic.

Part 4
Extend your implementation to allow variables, new's, end-of-scope's, and variable assignments. We'll assume that all variables hold num values. The way that you'll do this is to have a map that maps strings (identifier names) to a vector of int's. Each "new" push_back's another value associated with that name, and each endscope should popback the vector associated with that name. An assignment should replace the current value with a new one. To do all this, you'll have a def like:
map< string, vector<int> > symTable;
Note that for vector V, V.back(); gives the last element of the vecotor and V.size() tells you the number of elements in the vector. If the vector associated with a name "foo" is empty but foo appears in an expression, that means an undeclared variable, and you should print an error message. Is this a lexical, parse or static-semantic or dynamic-semantic error?

Important tip #1: each statement node in the abstract syntax tree has as its final child either an explicit NULL pointer or a pointer to the next node in the sequence of statements comprising the current block. So before you "return" from eval'ing a statement node, remember to do:

if (root->child.back() != NULL) eval(root->child.back());
... to evaluate the next statement, if there is one.

Important tip #2: There are other ways, but if you happen to end up writing some code that looks like this:

vector<int>  myVector = symTable[variableName];
bear in mind that myVector will contain a copy of the vector for 'variableName' that is stored in the symbol table. So changing myVector won't directly change what is on the symbol table -- you'll have to copy myVector back onto the symbol table for that to happen, or else write it a different way. Or ask your instructor how to make a variable "by reference."

Part 5: Make sure it works!

Look ahead at the first part of next week's lab. Run the sample programs given there to make sure your code is functional enough.

Turn In:
Submit an electronic copy of your Part 4 solution using the submit script for the course, making sure that your solution directory contains a file README that has you and your partners names and alphas in it! Also turn in the papers with your Parts 1 and 2 solutions in class by the start of the next lab period.

Christopher W Brown
Last modified: Wed Dec 9 15:48:52 EST 2009
Luke K. McDowell