Suppose we wanted an expression like (11:23:2007-10:14:2007)/2 + 10:14:2007 to make sense. Try to extend/modify the grammar below to make this happen. Show me the modified grammar as well as a parse tree for the above expression using your modified grammar.
Chomsky's famous sentence "Colorless green ideas sleep furiously" is a classic example of something that is syntactically correct but without semantic value, i.e. meaningless even though it satisfies all our grammatical rules. It's a nice example to keep in mind, because it shows that semantics and syntax are really different.
In programming languages our expectations of what certain symbols should mean, like "+" or "if", may make us lose sight of the fact that both syntax and semantics really need to be specified in order to define a language. However, there are places where this becomes pretty clear. For example, in a C/C++ program the following is syntactically correct:
int x; x = 2^3;
But what does it mean? I.e., what value does the variable x end up with? Not eight, in fact, but one! You see, although we know it's syntactically correct, we don't know its semantics. Meaning must be defined as well as form. (BTW: "^" is bitwise xor in C/C++.)
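To make this concrete, here's a tiny C program (just a sketch, not from the book) that compares what the expression actually gives with what a newcomer probably expected:

#include <stdio.h>
#include <math.h>

int main(void) {
    int x;
    x = 2^3;   /* "^" is bitwise xor: 010 ^ 011 == 001 */
    printf("2^3 in C/C++ = %d\n", x);           /* prints 1 */
    printf("2 to the 3rd = %g\n", pow(2, 3));   /* prints 8 */
    return 0;
}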
(BTW: "^" is bitwise xor in C/C++.) It's not just a question
of spelling out what each keyword or operator symbol means,
meaning of groups of symbols must also be defined. For
example, what about the following code fragment:
void foo(int x, int y, int z)
{
    if (x < y < z)
In Java, this is actually syntactically incorrect. However,
in C/C++ it is correct. But what does it mean? What are the
semantics here? Just knowing what < means isn't enough to
tell you, actually. You have to know <'s associativity, and
you've got to know what type < returns ... and you even
have to know what C/C++'s automatic type promotion rules
are, and what true and false really are. And even then, it
certainly doesn't mean what a newbie would expect. Syntax
tells you whether or not x < y < z is
valid in the language - and this depends on the language, as
we've seen - but you need the semantics to know what the
expression actually means.
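Here's a small C sketch (the function body and the test values are just illustrative) showing what really happens: since < is left-associative and yields an int, x < y < z is really ((x < y) < z).

#include <stdio.h>

void foo(int x, int y, int z)
{
    /* "<" is left-associative and returns an int (0 or 1),
       so this condition is really ((x < y) < z). */
    if (x < y < z)
        printf("foo(%d,%d,%d): branch taken\n", x, y, z);
    else
        printf("foo(%d,%d,%d): branch NOT taken\n", x, y, z);
}

int main(void) {
    foo(1, 2, 3);    /* 1<2 gives 1, 1<3 gives 1: taken, as a newbie hopes         */
    foo(3, 2, 1);    /* 3<2 gives 0, 0<1 gives 1: taken, though 3<2<1 is "false"   */
    foo(-5, 0, 1);   /* -5<0 gives 1, 1<1 gives 0: NOT taken, though -5<0<1 holds  */
    return 0;
}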
The book looks first at specifying syntax, then later at specifying semantics. In fact, however, the delineation is not that clear cut. We act as if, roughly speaking, a grammar specifies the syntax of a language, and something else specifies the semantics. However, the grammar for a language does a bit of both, but certainly not all of either. You know this already from Theory of Computing, where we talked about ambiguity in grammars and how it was undesirable in applications. For example:
exp -> exp op exp | NUM
op -> + | - | * | /
is a grammar that correctly defines the syntax of proper arithmetic expressions involving numbers, assuming
NUM is properly defined. But it's ambiguous, and
we went on to define a nice unambiguous grammar that met our
expectations for associativity and precedence of arithmetic
operators. But that's not necessary if we're really only
interested in specifying syntax. The grammar above tells you
exactly which strings of symbols are valid arithmetic
expressions ... associativity and precedence are issues of
semantics, just as is the question of whether 5/2 gives 2.5 or
2.
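For instance (a quick illustration, with parentheses added only to show the grouping), the string 1 - 2 - 3 has two distinct parse trees under this grammar, depending on which "-" the top-level exp op exp split lands on:

exp => exp op exp => (exp op exp) op exp => ... => (1 - 2) - 3, which evaluates to -4
exp => exp op exp => exp op (exp op exp) => ... => 1 - (2 - 3), which evaluates to  2

Both parses are perfectly legal as far as the grammar is concerned; choosing between them (left-associativity of "-") is a question of semantics.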
In fact, it's not really possible in any practical sense to
completely separate syntax and semantics, because
the validity of an expression like
a < b < c may depend (as it does in Java)
on the meaning of the < operator, in terms of both types
and associativity.
Nonetheless, it's useful to think of syntax and semantics
as separate concerns, as the book does. We just tend to count
the semantic elements that are defined through things like
the language's grammar as part of the language's
syntax.
stmt-list -> stmt+
instead of
stmt-list -> stmt | stmt-list stmt
With an unambiguous grammar that's designed to accommodate the semantics of your language, a parse tree for a program is a data structure that represents the program in a very useful way. Thus, parsing is a huge step in compiling or interpreting a program. However, a true parse tree may have many nodes and edges that are artifacts of how the grammar encodes properties like precedence, or of how it deals with symbols like ( )'s or ;'s that are actually unnecessary once the parse is complete. Thus we prefer an abstract syntax tree, which is kind of like a simplified parse tree, to a true parse tree. Here's a simple example. Consider the following grammar, which defines basic arithmetic expressions, complete with associativity, precedence, unary as well as binary "-", and ( )'s.
exp -> exp OPA term | term
term -> term OPM factor | sfactor <-- NOTE: "sfactor" stands for "signed-factor", it handles unary "-"
sfactor -> OPA factor | factor
factor -> NUM | LP exp RP
where OPA = +|-, OPM = *|/, LP = (, RP = ), and NUM = digit+.
Compare first the true parse tree, then an abstract syntax
tree for the expression:
-(131 + 14*872) - 3*(-8 + 4/122):
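The trees for the full expression are pretty big, so here is a rough sketch (not the book's figures) of what the two look like for just the parenthesized piece (-8 + 4/122), with children shown by indentation:

true parse tree:

factor
  LP
  exp
    exp
      term
        sfactor
          OPA(-)
          factor
            NUM(8)
    OPA(+)
    term
      term
        sfactor
          factor
            NUM(4)
      OPM(/)
      factor
        NUM(122)
  RP

abstract syntax tree:

+
  neg
    8
  /
    4
    122

Notice how the LP/RP and the chains of exp/term/sfactor/factor nodes disappear, leaving just the operators and operands.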
Hopefully this illustrates the difference between a parse tree and an abstract syntax tree. So, our goal is to use regular expressions and grammars to define as much as possible of the syntax of our programming language, and use those to produce code that reads in program text and produces abstract syntax trees.
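As a small taste of what "produce abstract syntax trees" might mean in code, here is a minimal C sketch of an AST node type; the type and function names are illustrative, not the course's actual implementation:

#include <stdio.h>
#include <stdlib.h>

/* One possible AST node shape: a leaf number, a unary negation,
   or a binary operator applied to two children. */
typedef enum { AST_NUM, AST_NEG, AST_BINOP } ASTKind;

typedef struct AST {
    ASTKind kind;
    int value;          /* used when kind == AST_NUM                 */
    char op;            /* '+', '-', '*', '/' when kind == AST_BINOP */
    struct AST *left;   /* AST_NEG uses only left                    */
    struct AST *right;
} AST;

static AST *ast_num(int value) {
    AST *n = calloc(1, sizeof *n);
    n->kind = AST_NUM;
    n->value = value;
    return n;
}

static AST *ast_neg(AST *child) {
    AST *n = calloc(1, sizeof *n);
    n->kind = AST_NEG;
    n->left = child;
    return n;
}

static AST *ast_binop(char op, AST *left, AST *right) {
    AST *n = calloc(1, sizeof *n);
    n->kind = AST_BINOP;
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

int main(void) {
    /* Hand-built AST for the piece (-8 + 4/122): no nodes for the
       parentheses or for the exp/term/sfactor/factor chains, just
       the operators and the numbers. */
    AST *tree = ast_binop('+',
                          ast_neg(ast_num(8)),
                          ast_binop('/', ast_num(4), ast_num(122)));
    printf("root operator: %c\n", tree->op);   /* prints: + */
    return 0;
}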