SI413: Scheme - list internals, tail-recursion, closures

Closures

We're going to give ourselves something interesting to compute, just to provide a starting point for conversation: we're going to compute "harmonic numbers". The nth harmonic number is defined as $h_n = \frac{1}{1} +\frac{1}{2} +\frac{1}{3} + \cdots + \frac{1}{n-1} + \frac{1}{n}$. Here's a nice scheme function:

(define (H n) (if (= n 0) 0.0 (+ (/ 1.0 n) (H (- n 1)))))

The real conversation is going to start with using this function. suppose I want to map $(x - h_{100})^2$ across some list of values - for example the integers from one to 10. I might do something like this:

(map (lambda (x) (let ((h (H 100))) (expt (- x h) 2))) '(1 2 3 4 5 6 7 8 9 10))

This certainly does the job, but there's a hidden inefficiency here: (H 100) gets recomputed for each of the 10 elements of the list! That makes this about 10x slower than it should be. If you crank 100 up to 500000, you'll start to really see things slow down. What's the solution? Let's put the let outside the lambda, rather than inside. This way h is no longer a local variable inside the function created by lambda, and (H 100) will only get computed once --- prior to the lambda expression being evaluated. In other words, we'll do this:

(map (let ((h (H 100))) (lambda (x) (expt (- x h) 2))) '(1 2 3 4 5 6 7 8 9 10))

Sure enough, this is 10x faster (try on (H 500000) to see this is true). Obviously, there's an important lesson here about structuring scheme code to improve performance. But there's a more interesting lesson to be learned by looking closely at the improved function definition. If we draw out the function calls involved in evaluating this expression, wee see a call to "let", which creates local variable h, then a call to lambda (call stack is "let" then "lambda"). The function created by the call to lambda has a reference to h, which is not local to that function. It is a local variable in the call one record lower on the call stack! The lambda call exits, returning the newly created function, and then the let call exits, passing along that newly created function as its return value. So at this point we have a function, but that function refers to h, which by all the rules of the call stack we know from other languages, should have disappeared when the "let" returned. Obviously, scheme works differently, and this h is captured and protected. This thing we now have, this "function + non-local variables" is called a closure. It's kind of a big deal. We discussed other languages (e.g. Java) and how they struggle a bit to deal cleanly with this situation, precisely because it breaks their model of how the call stack works. In Java, we see this in "inner classes" - classes defined in functions. If these classes refer to non-local variables, Java requires that those variables are final. This little bit of pickyness is necessary, since it allows Java to treat and simply copy the values of the non-local variables into the inner class. After all, "final" means those values can't change.

Tail-recursion

In class, we looked at computing (H 1000000) from above, and noticed that we actually ran out of memory (in addition to being slow). What's up with that? Where's the extra memory coming from? It's coming from the call stack, which actually consists of 1,000,000 million records in this case before we hit the base case and start popping stack records. This is pretty bad! Note that the obvious C/C++/Java/Python version computes $h_{1,000,000}$ in no time.

Given n
-------
acc = 0	
while n != 0      
  acc += 1/n
  n--
return acc

This is quite fast, and very clearly it needs only constant memory. If you think about it, the recursive version is more or less like making an array of size 1,000,000, copying in the receiprocals of 1,2,...,n, and then adding them all up. Our loop-based program above works differently: the only extra storage it uses is the accumulator variable acc. We could rewrite the recursive scheme code to follow the same strategy:

(define(H n acc) (if (= n 0) acc (H (- n 1) (+ acc (/ 1.0 n)))))

... it just requires a new parameter acc for the accumulated value. This version is really fast, appearing to use only constant additional memory, as indeed is the case. This version is tail recursive, which allows the scheme interpreter to execute it without having to grow the stack.

Consider the following Scheme function for computing the sum of the square roots of the integers from 1 to n.

(define (ssum n) (if (= n 1) 1 (+ (sqrt n) (ssum (- n 1)))))

It works nicely and all, but it really takes some time to compute something like (ssum 1000000). In fact it pretty much bogs the interpreter down. It bogs the whole machine down. Is this necessary? Why is it happening? Well, it turns out that the problem is that this program takes up a lot of memory. But wait a second? Why is that? Looking at the program it isn't clear how it could take more than constant memory since there are no lists. However, this program is recursive, and each recursive call results in a new function record being pushed onto the call stack. This takes up a lot of space when there are 1,000,000 recursive calls, like there are in (ssum 1000000). This is the dark side of functional programming. There's a hidden penalty for all the convenience (yes, I said "convenience") of programming everything recursively, and that's the memory overhead.

However, there are certain recursive functions that don't require this memory overhead. For example, consider the following alternate definition of ssum:

(define (ssum-tr n e) (if (= n 0) e (ssum-tr (- n 1) (+ (sqrt n) e))))

Note: The second paramter e must be zero in the initial call. The idea behind this function is that e carries arround the "extra" stuff you've already computed. Suppose you were computing

	√1 + √2 + √3 + √4

i.e. (ssum-tr 4 0). In computing this with ssum-tr you'd end up at some point with a recursive call (ssum-tr 2 3.73) which would indicate that the answer is ssum(4) = ssum(2) + 3.73, i.e.

ssum(2) = √1 + √2 + √3 + √4 = 6.15
          \_____/   \_____/
        =  ssum(2) + 3.732  = 2.414 + 3.732 = 6.15

It's a different way of arriving at the same result. However, lets watch the call stack evolve during ssum-tr's computation of (ssum-tr 4 0):

                                        n=0,e=6.15 ---> Answer!
                             n=1,e=5.15 n=1,e=5.15
                  n=2,e=3.73 n=2,e=3.73 n=2,e=3.73
         n=3,e=2  n=3,e=2    n=3,e=2    n=3,e=2   
n=4,e=0  n=4,e=0  n=4,e=0    n=4,e=0    n=4,e=0   
-------  -------  -------    -------    -------   
 Stack    Stack    Stack      Stack      Stack

Hopefully what you see is that there's no reason to remember anything beyond the top record on the stack. We have the answer without ever having to do a pop, and if we don't need to do any pops, we didn't need to remember the stuff in the first place. A function like this is called tail recursive.

Definition: A function is tail recursive if its output expression in every recursive case is only the recursive call.

In Scheme, this means that the recursive call is outermost. In C/C++/Java, a tail-recursive function can be identified by looking at the return statement(s) for the recursive case(s): if the recursive call itself is the outermost expression in the return, the function is tail-recursive. The fundamental idea is that the result returned by the recursive call is the answer; nothing else needs to be done with it. In that case, there's no reason to pop records off the stack of recursive calls, and if you're never going to pop them off, they needn't be pushed in the first place.

The function (define (ssum n) (if (= n 1) 1 (+ (sqrt n) (ssum (- n 1))))) is not tail recursive because the output expression in the recursive case is (+ (sqrt n) (ssum (- n 1))), which while it contains a recursive call is not just a recursive call. In constrast, the recursive case of ssum-tr has output expression (ssum-tr (- n 1) (+ (sqrt n) e)) which is just a recursive call.

As we noted above, with tail recursive functions we could actually forget about the stack and only ever remember what would ordinarily be the top record of the stack. Thus, tail recursive functions can be optimized to take less memory - and usually to run faster as well. Scheme interpreters are required to make this optimization whenever functions are defined tail-recursively. Our tail recursive ssum-tr has no trouble computing (ssum-tr 1000000 0) .

Limitations of tail recursion

Note on tail recursion: Someone asked if there are functions that can't be defined tail recursively. The answer is yes ... ish. You can always simulate the stack in data, which allows you to always be tail recursive ... but data grows. However, if you restrict what's allowed, you can write functions that can't be done with tail recursion. Example: Write a function (pal? L) that determines whether a list L is a palindrome, with the restriction that you can't create new lists (no new nodes!) and you can't do arithmetic (e.g. no counters). [Note: workarounds like simulating lists with lambda's aren't allowed either!] The rationale is that we don't want you using more than a constant amount of memory, without regard to how long the list is. Arithmetic is disallowed because integers actually grow - at least in principle. If your counter was guaranteed to never exceed some small threshold (say 256), it would be allowable. Here's a solution:

HIDDEN TO CHALLENGE YOU TO COME UP WITH YOUR OWN SOLUTION

 > (pal? '(2 3 2))
#t
> (pal? '(3 9 0 0 9 3))
#t
> (pal? '(3 9 0 1 9 3))
#f

No such solution can be tail-recursive. It cannot be tail recursive and stay within the rules, because deciding palindromes is a problem we know to be impossible with a fixed finite amount of memory. So if you don't get extra memory from the call stack, it would have to come from somewhere else ... and the rules prohibit other sources.

List internals and why garbage collection can be tricky

We talked about the internals of list representation in scheme - what "cons" does, for example. This immediately led to discussions of garbage collection and how in Scheme, everything's a pointer.

In a nutshell, consider the following:

(define L1 '(1 2 3))
(define L2 (cons 'a L1))
(define L3 (append '(8 9) L2))

The second line creates a new list (a 1 2 3) and gives it the name L2. It hasn't modified L1 in any way, so it's not like pushing a new element on top of the stack which, of course, modifies the stack. Since we have two distinct lists (1 2 3) and (a 1 2 3), it would seem that the interpreter would've created 7 list nodes. However, recall that we have referential transparency, i.e. nothing is ever modified. Therefore, there's no harm done if the linked lists that store L2 and L1 share nodes. Same goes for L3. In fact, internally, these three lists with a total of 13 data values are represented internally by 6 nodes:

This means that (cdr L) and (cons x L) are constant time operations (make sure you see why), i.e. they take the same amount of time regardless of the size of L. How about (append L1 L2)? How does its running time depend on the length of L1 and the length of L2? Think carefully about this and make sure you know the answer!

This sharing of nodes has an important consequence: When the name L1 goes out of scope, the interpreter can't necessarily reclaim its nodes. Indeed, in this example, if L2 or L3 are still in scope, none of L1's nodes can be reclaimed. What if L3 goes out of scope? Then the first two nodes can be reclaimed, but not the rest. In general, when this sharing of nodes occurs it becomes very difficult to determine which nodes can be reclaimed. It's completely impractical to expect the programmer to do it, and really undesireable too, since Scheme programmers are supposed to think of lists abstractly, and not bother with the details of how the interpreter implements them. Thus, Scheme really requires automatic garbage collection, meaning that the interpreter has to figure out which nodes can be reclaimed.

Reading

Homework

Closures

Tail-recursion

Limitations of tail recursion

List internals and why garbage collection can be tricky