Menu

SI413 Class 12: Intro to Abstract Syntax Trees (Why we need to consider names first!)


Reading
Section 4.6 of Programming Language Pragmatics.

Homework
Get your lab 8 done!

Abstract syntax tree vs. parse tree
We looked at the parse tree for our simple programming language, and noticed that there are nodes that are unnecessary for semantic analysis. Consider the simple program:
{
  x := 4;
  y := 0;
  while ( x > 0) {
    y := y + x;
    x := x - 1;
  }
  write y;
}
Look at the parse tree for this program. Much of what's there is extraneous. We defined a simple abstact syntax tree structure for the language. Look at the abstract syntax tree for the program. It takes a bit of explanation. Nodes are labeled "foo:bar", which means that "bar" is the general class, and "foo" is a special kind of "bar". This, "asn:stmt" means an assignment statement, while "while:stmt" means a while statement. Statements have a final child: the next statement in the sequence. We can, in fact, write a grammar for all the valid trees (we're omitting read for now):
while:stmt -> exp stmt stmt NOTE that the final stmt is the next stmt in the sequence.
asn:stmt   -> id exp stmt
write:stmt -> exp stmt
if:stmt    -> exp stmt stmt
           -> exp stmt stmt stmt
binop:exp  -> exp exp
unop:exp   -> exp
id:exp     -> λ
num:exp    -> λ
bool:exp   -> λ

Starting to look at semantic checks
To do semantic checking, we need to actually make some decisions about the semantics of our language.
We want +,*,/,-,<,>,≤≥ to only apply to nums, not bools
We want and,or,not to only apply to bools, not nums.
We want to do static type checking.

We discussed using the abstract syntax tree to check these basic type rules. Essentially, we described a simple function ts(n) that analyzes a subtree of the abstract syntax tree rooted at n and returns one of these values:

Value of ts(n) for each possible node-type of n:

while:stmt -> exp stmt stmt       ts(n)=stmt if ts(exp)=bool and ts(stmt1)=ts(stmt2)=stmt, otherwise error
asn:stmt   -> id exp stmt         ts(n)=ts(exp) if ts(stmt)=stmt, otherwise error
write:stmt -> exp stmt            ts(n)=stmt if ts(exp)=bool or num and ts(stmt)=stmt, otherwise error
if:stmt    -> exp stmt stmt       ts(n)=stmt if ts(exp)=bool ts(stmt1)=ts(stmt2)=stmt, otherwise error
           -> exp stmt stmt stmt  ts(n)=stmt if ts(exp)=bool ts(stmt1)=ts(stmt2)=ts(stmt3)=stmt, otherwise error
binop:exp  -> exp exp             ts(n)=bool if op is <,>,≤,≥ and ts(exp1)=ts(exp2)=num
                                             OR op is and,or and ts(exp1)=ts(exp2)=bool
                                  ts(n)=num  if op is +,-,*,/ and ts(exp1)=ts(exp2)=num
                                  ts(n)=error otherwise
unop:exp   -> exp
id:exp     -> λ            ts(n)=???
num:exp    -> λ            ts(n)=num
bool:exp   -> λ            ts(n)=bool
So with this definition, we can check the tape-validity or an abstract syntax tree ... at least as long as we don't have variables. So what's the problem with variables? The decisions we made about the semantics of variables and names are going to have a serious impact on how or whether we could live with our decisions on the semantics of our simple language, and have a serious impact on what constitutes a "type-valid" abstract syntax tree. Thus, we need to learn about "names" in programming languages before we can continue. Fortunately, that's Chapter 3 in our book, which is where we'll start next.


Christopher W Brown
Last modified: Fri Oct 16 09:30:31 EDT 2009