SI413: Closures and lexical scope with Frames

Reading

The Structure and Interpretation of Computer Programs, Section 3.2.

Homework

Do this HW.

Review

We reviewed what we did the previous lesson: semantics / implementations of various scoping regimes

single global scope / single global symbol table
dynamic scope / Central Reference Table (i.e. single global table mapping names to stacks of values
lexical scope / ??? - what we do here depends on what languages allow programmers to do with respect to functions
- local and global scope only / usual function call stack
- nested function definitions / add pointer to parent frame to the usual call stack model
- what else????

So that was where things left off. To make sure everyone was clear on this, I had you break up into groups and draw the picture for the following program at the exact point at which x + y + z is evaluated, and of course tell me the ultimate value returned and assigned to s in main.

int y = 10;

int g(int z) {
  int x = 5;
  int f(int x) { return x + y + z; }
  int h(int z) { return z + f(x + 1); }
  return h(3);
}

int main() {
  s = g(2);
}

Note: we introduced a new piece of terminology here, which is that the part of the call stack record that consists of bindings we called a "Frame".

Full first class functions ... why parent pointers aren't enough

Next we look at a slightly different version of the above program - one in which we have full first class functions. In particular, while the above example creates a "new" f every time function g is called, it does not return a dynamically created function. As we'll see, that last step changes the game. So, suppose we have the following slightly different (and in many ways simpler) program:

int y = 10;

func g(int z) {
  int x = 5;
  int f(int x) { return x + y + z; }
  return f;
}

int main() {
  s = g(2);
  t = s(6);
}

When we return from g the stack record for g's call is destroyed, which means the very frame we need as the parent when we make the call s(6) has disappeared. The binding we want for z is gone. This is why a call stack + parent pointers just isn't enough.

Blowing it up: Frames on the Heap and Closures

What we learned from the previous example is that parent pointers aren't enough for full first class functions. What we need to do is blow up the call stack record and move frames from the stack to the heap. This way a frame can continue to live after the function it came from returns. What we saw in class is that this too is not enough. What we saw is that the address of the lambda node is not enough to execute a function: we also need the address of the parent frame, because when we call a function, the frame we create for that function call will need to have its parent pointer set properly. This combination of function address and the address for the parent frame is called a closure

Revisiting our example - Why Frames aren't enough

Just to explain the above more clearly, let's consider why Frames alone aren't enough to implement first-class functions.

int y = 10;

func g(int z) {
  int x = 5;
  int f(int x) { return x + y + z; }
  return f;
}

int main() {
  s = g(2);
  t = s(6);
}

We call g(2). Note that the call gets its own frame. Its parent pointer is 0xF5, which is the global scope. This makes sense because although g is called from main, it is defined in the global scope, and lexical scope comes from where a function is defined.

g(2) has returned (so I crossed off the call stack record). The Frame for the call lives on, which is good. So at least the memory for z, which we will need later, lives on. However, when we go to call "s", which is main's name for the function f, we have a problem. We need to create a frame for the call to s, but nothing accessible to main, its frame or the global frame stores the address of the frame we should be using for s's parent pointer ... so we're stuck.

Our solution to this problem is that evaluating a lambda expression results in a new type of object called a closure. A closure consists of two pieces of information: the address of the lambda node defining the function we will be executing, and the the address of the frame we will be using as the parent pointer when we create the new frame defining the environment in which the function call will be executing. In the above example, the entry for s in main's Frame (i.e. 0xE7) would be

Labmda*	Frame*
0x9D	0xD4

... so that when s is called, we can create a new frame for s with the right parent pointer (0xD4) as well as knowing what lambda we're supposed to execute. Adding closures, at the point that s is called from main, we end up with the following picture: