SI413: Expressions and assignments

Where are we?

Our labs recently have been on dynamic type checking, garbage collection, and starting to compile to LLVM's intermediate representation. All good stuff. But the last topic that we really spent a lot of lecture time on was scoping. Big topic, obviously, but one that we actually finished with a while back. So, in essence, we've gotten to the point at which we are defining variables (binding) and even initializing variables. Now we are going to turn to assignment. Doesn't seem like much to say, but there's more than you might think! To assign, we need a value to assign, and that value is typically the result of evaluating an expression. So we will first review expressions.

Operators vs. functions

What's the difference between an operator and a function? We have this idea that "+" is an operator, whereas "sin" is a function, for example. But the difference is language specific ... or even implementation specific. In Scheme, everything is a function, pretty much, even "+". So for all practical purposes, Scheme has no operators. In our SPL implementation, there's a huge difference. Each function call results in a frame being created, whereas there's no frame created when an operator is evaluated. In Ada and C++, operators are often just syntactic sugar for function calls, i.e. A + B may be literally translated as operator+(A,B), or A.operator+(B).
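Python sits close to C++ on this point, and a small sketch makes the "operators are sugar for calls" idea concrete. The Vec class below is hypothetical, made up only for this illustration; __add__ and operator.add are standard Python.

```python
import operator

class Vec:
    """Hypothetical 2-D vector type, used only for illustration."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Invoked when a Vec appears on the left of '+'.
        return Vec(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

a, b = Vec(1, 2), Vec(3, 4)

infix_result = a + b                   # operator syntax
method_result = a.__add__(b)           # the method call it stands for
function_result = operator.add(a, b)   # '+' packaged as an ordinary function
```

All three results are equal vectors, which is the sense in which the operator/function line is drawn by the language rather than by anything fundamental.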

Prefix, postfix, infix, mixfix

Total review:

a prefix operator has the operator before the operands, like ! in C/C++/Java ... if (!t) ...
a postfix operator has the operator after the operands, like i++.
an infix operator is a binary operator for which the operator comes between the operands, like 3 + 4.
When an operator has arity greater than two and it is neither prefix nor postfix, it is usually called "mixfix", like a ? b : c.
Postscript is a language in which all expressions are postfix. It's a graphics language, and to draw a line from the "current position" to (300,400), you'd write "300 400 lineto".
Scheme, of course, is all prefix. It's a special kind (what with the parens and all) called "Cambridge Polish" notation.
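One reason all-postfix languages like Postscript are easy to implement is that evaluation needs nothing but a stack. Here is a minimal sketch of that idea in Python; eval_postfix is a made-up name, and it handles only integer operands and three binary operators.

```python
def eval_postfix(tokens):
    """Evaluate a list of postfix tokens using a stack."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            right = stack.pop()   # operands come off in reverse order
            left = stack.pop()
            stack.append(ops[tok](left, right))
        else:
            stack.append(int(tok))
    return stack.pop()

result = eval_postfix("3 4 + 2 *".split())   # same as (3 + 4) * 2
```

Notice that no parentheses and no precedence rules are needed: the order of the tokens fully determines the order of evaluation.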

Associativity vs. precedence

We went over the difference between associativity and precedence ... again. You'd better know it for the final! Languages differ in the number of precedence levels they define. Compare the C++ operator precedence table to that of Pascal:

PRECEDENCE     OPERATORS

highest        NOT
   .           *, /, DIV, MOD, AND
   .           +, -, OR
lowest         <, <=, <>, =, >=, >, IN

Now, not only does C++ have many more operators, but it defines many more precedence levels. Having too many precedence levels has the potential drawback that program behavior becomes difficult to predict without constantly referring to the precedence table. Having too few levels can cause unexpected behavior. For example, in Pascal

x > y OR z > 0 is equivalent to x > (y OR z) > 0

which is counterintuitive to say the least!
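You can reproduce the same flavor of surprise in Python, as a rough analogue rather than an exact match for Pascal. Python's bitwise | binds more tightly than the comparison operators, and Python also chains comparisons, so writing | where you meant "or" regroups the whole expression. The variable values below are arbitrary, chosen so that the two readings disagree.

```python
x, y, z = 1, 0, 3

intended = (x > y) or (z > 0)   # what the programmer probably meant
surprise = x > y | z > 0        # '|' binds tighter than '>'
explicit = x > (y | z) > 0      # the grouping Python actually uses
```

Here intended is True, but surprise and explicit are both False, because y | z is 3 and 1 > 3 fails.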

Assignment

Assignment is one of those things that's deeper than one might think. I'm not talking variable declaration, initialization or binding. That's stuff we already covered. I'm talking about assigning a new value to an existing variable.

Issue: is assignment an expression or a statement? I.e. will something like 3 + (x=5) be allowed, and what will it mean? Java and C/C++ allow this, which shows that assignments are expressions in those languages. In the above, x=5 has the side effect of assigning 5 to x, and the result of the expression is simply x after the assignment. So the type and value come directly from x. In SPL, we've defined assignment to be a statement, so something like write x := 5; is illegal. (Is this a lexical error, parse error or an error in the semantic analysis phase?) Python is similar: assignment with = is a statement.

$ python
>>> x = 1
>>> print(x)
1
>>> print(x=2)
  File "<stdin>", line 1
    print(x=2)
           ^
SyntaxError: invalid syntax

In Scheme, set! returns a void value, so you can't really compute meaningfully with the return value of a set!. Thus, in essence, assignment is not an expression.
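For the Python case, here is a small sketch that asks the compiler directly whether a piece of source is syntactically valid. The helper name compiles is made up for this example. One caveat worth knowing: under Python 3, print(x=2) is read as a keyword argument, so it fails with a TypeError rather than a SyntaxError; nesting the assignment in arithmetic gives a cleaner test. Also, Python 3.8 added a separate := operator that is an expression, which the last lines check under that version assumption.

```python
import sys

def compiles(src):
    """Return True if src is accepted by the Python compiler."""
    try:
        compile(src, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

statement_ok = compiles("x = 5")             # assignment as a statement
nested_ok = compiles("y = 3 + (x = 5)")      # '=' used inside an expression

# Assumption: Python 3.8 or later, where ':=' is an assignment expression.
walrus_ok = None
if sys.version_info >= (3, 8):
    walrus_ok = compiles("y = 3 + (x := 5)")
```

Since compile only parses and compiles, nothing is executed, so this answers the "is it an expression?" question at the syntax level, which is exactly where Python rejects it.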

Value vs. reference model

Considering assignment brings us face to face with a fundamental language issue: What is the nature of a variable? Consider:

x = x + 1;

is OK, but most likely

x + 1 = x;

is not. Why can x be assigned a value, but not x + 1? The answer depends on your model of variables. The value model thinks of variables as locations that can hold values, and one assigns values to locations. So x is a location, but x + 1 is just a value. Whatever appears on the left-hand side of an assignment must be what's called an l-value, i.e. a location. Things that can appear on the right-hand side, i.e. values, are r-values. A variable like x can play both roles. In "x = x + 1", the x on the left stands for the location, while the x on the right stands for the value stored at that location.

The other model is the reference model. It views variables as references (pointers) to actual objects. In this model, the problem with x + 1 = x is that x + 1 is not a reference. In the reference model, l-values are locations that can hold references. Python is an example of a language that follows the reference model.
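Since Python follows the reference model, it makes a handy place to check the claim that x + 1 cannot be an assignment target. This is only a sketch; compiles is a made-up helper that asks the compiler whether it accepts a line, without running it.

```python
def compiles(src):
    """Return True if src is accepted by the Python compiler."""
    try:
        compile(src, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

increment_ok = compiles("x = x + 1")   # x on the left names something assignable
reversed_ok = compiles("x + 1 = x")    # x + 1 is only a value, not a target
```

The second line is rejected before any code runs, which tells you the l-value restriction is enforced at compile time in this implementation.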

We'll see that, at least for some basic types, the reference model comes with some perhaps unexpected consequences. On the other hand, for large objects of user-defined type, the reference model has advantages. Specifically, assignment is always constant time with the reference model, but with the value model we have, potentially, lots of data to copy. So assignment could take a long time, and of course you need the memory to store a whole second object. So languages like C, C++, Pascal, etc. that follow the value model introduce special types (pointers!) that can act as references.
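The cost difference is easy to see in Python, which uses the reference model. In this sketch (the names big, alias and duplicate are arbitrary), plain assignment just binds a second name to the same list, while an explicit copy has to allocate and fill a second list.

```python
big = list(range(100_000))

alias = big             # reference model: binds a second name, no copy made
duplicate = list(big)   # explicit copy: allocates and fills a second list

alias.append(-1)        # visible through both 'big' and 'alias'
```

After the append, big has 100,001 elements because alias and big are the same object, while duplicate still has 100,000. Under the value model, every assignment would behave like the duplicate line.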

What are l-values

Obviously variables are l-values. It's tempting to say that other expressions aren't, but that's not true. The following is valid C++ (though not valid C, where ?: never yields an l-value):


int x, y, z;
cin >> x;
(x > 0 ? y : z) = 10;

... which tells us that in C++ the ?: operator yields (at least sometimes) l-values. C++ even lets you "return by reference", so that the results of function calls can be l-values.

int foo(int &m, int &n) { return (n > m ? n : m); }
int& bar(int &m, int &n) { return (n > m ? n : m); }
 ...
foo(x,y) = 5; // Error, foo(x,y) not an l-value
bar(x,y) = 5; // OK

Much less exotically, though, how about X[2*i+1] = 5;? This should work, so clearly some expressions yield l-values. Suppose you have a struct Point in C, with fields x and y of type double. If p is a point, then p.x = 2.5; is perfectly valid. So fields in structs are l-values.
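Other languages draw the l-value line in different places. As a point of comparison only, here is a sketch of what Python accepts as an assignment target: subscripts and attributes are fine, just like array elements and struct fields in C, but a conditional expression is not. The Point class and the compiles helper are made up for this example.

```python
class Point:
    """Hypothetical stand-in for the struct Point described in the text."""
    def __init__(self, x, y):
        self.x, self.y = x, y

def compiles(src):
    """Return True if src is accepted by the Python compiler."""
    try:
        compile(src, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

X = [0] * 8
i = 1
X[2 * i + 1] = 5     # a subscript is a valid target

p = Point(0.0, 0.0)
p.x = 2.5            # an attribute (field) is a valid target

# Python's version of ?: is not a valid target, unlike in C++.
cond_target_ok = compiles("(y if x > 0 else z) = 10")
```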

Java

Java is an interesting case. In Java, the primitive types (int, float, double, char and so on) all follow the value model, but everything else (arrays and everything derived from Object) follows the reference model. This leads to some oddities. For example, each of the primitive types that follows the value model, like int, is shadowed by an object type (i.e. a class derived from Object) that follows the reference model. For int, the object version is Integer.

The reason for this is that for basic types, the value model is more efficient. If x and y are ints, computing x+y in the value model requires fetching the value of x, fetching the value of y, and performing the addition. In the reference model, computing x+y requires fetching the address associated with reference x, then fetching the value from that address, doing the same two steps for y, and then adding. The extra round of memory accesses is costly - especially because it can be harder to ensure data locality, which means we lose out on some of the benefits of cache.

Immutable objects

Often the reference model behaves as one expects/wants. But not so much with basic types like numbers, characters and strings. For example:

int a,b,c;
a := 5;
b := 2;
c := a;
a++;

Under the value model, c is still 5. Under the reference model, if ++ really modifies the object a refers to, c now refers to 6. This runs counter to how we usually think of such things. With strings you have the same problem:

String a, b, c;
a := "hello";
b := "goat";
c := a;
a[0] := 'j';

Now c refers to "jello". To avoid this in these fundamental types, systems with the reference model may make these types "immutable", i.e. unchangeable. Thus, ++ wouldn't modify the int a points to in the first example; it would create a new int and set a to point to it. The operation in the second example would be illegal. Functions for doing this kind of operation would actually return new strings. Since Java has the value model for ints, this doesn't come into play, but strings in Java follow the reference model, and they are indeed "immutable". Check out "Why String is Immutable in Java?" for a discussion of why Java chose to make strings immutable.
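Python is a reference-model language that takes exactly this route: its ints and strings are immutable. The sketch below replays both examples from above in Python; the variable names are arbitrary.

```python
a = 5
c = a
a += 1               # builds a new int and rebinds a; the object 5 is untouched

s = "hello"
t = s
try:
    s[0] = "j"       # in-place modification of a string is not allowed
    mutated = True
except TypeError:
    mutated = False

s = "j" + s[1:]      # the legal route: build a new string and rebind s
```

At the end a is 6 while c is still 5, and s is "jello" while t is still "hello". So with immutable basic types, the reference model ends up behaving the way the value model would.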

Clone

More complex objects in Java follow the reference model and are not immutable. So after

BankAcct a = new BankAcct();
BankAcct c = a;
a.setBal(100);

c's balance is 100, because c and a refer to the same thing. Sometimes you really want copies, though, and Java has the Cloneable interface to give a standard way of getting duplicate objects. If BankAcct implements Cloneable and provides a public clone() method, we can do this:

BankAcct a = new BankAcct();
BankAcct c = (BankAcct) a.clone();
a.setBal(100);

... and c's balance will still be zero ... or whatever the default is. Cloning is a common operation when languages follow the reference model.
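The same alias-versus-clone distinction shows up in Python, where the standard copy module plays the role of clone(). In this sketch the BankAcct class, its bal field and set_bal method are hypothetical stand-ins for the Java class in the text, with an assumed default balance of zero.

```python
import copy

class BankAcct:
    """Hypothetical stand-in for the BankAcct class in the text."""
    def __init__(self):
        self.bal = 0          # assumed default balance

    def set_bal(self, amount):
        self.bal = amount

a = BankAcct()
alias = a                 # second reference to the same account
clone = copy.copy(a)      # separate object with the same field values
a.set_bal(100)
```

Afterwards alias.bal is 100 but clone.bal is still 0. Note that copy.copy is a shallow copy; if the account held references to other mutable objects, you would want copy.deepcopy to duplicate those too.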

Reading

Homework