Lab 3: Closures and Tail Recursion

This is the archived website of SI 413 from the Fall 2013 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

This lab is due at 2359 on Thursday, 12 September. It should be submitted as "413 lab 03", with the files named properly according to the submit script. See the submit page for details. You are highly encouraged to submit whatever you have done by the end of this lab. Remember that duplicate submissions are fine; I will just grade the latest one when the time comes.

Variable number of arguments

Lots of Scheme functions like +, max, and even <, take a variable number of arguments. For example:

> (+ 1 2)
3
> (+ 1 2 3)
6
> (+ 3)
3
> (+)
0
> (> 3 2 1)
#t
> (> 3 1 2)
#f

How can we write our own functions to behave this way? Well, there are two ways. First, if in a lambda-expression, you put a variable name in place of the entire parameter list, your function gets as its argument that variable only, and it will be a list of all the actual arguments. Since a list can have any length, your function can take any number of arguments!

> (define avg (lambda vals (/ (apply + vals) (length vals))))
> (avg 95 80 86)
87
> (avg 13)
13
> (avg 85.0 62 77 84 78 92 90 58)
78.25

Some functions take a variable number of arguments, but must have at least some fixed number. For example, the - function must have at least one argument, as must max. In fact, the interpreter enforces this ... go ahead and try calling (max). Our avg should be defined this way, since calling it with no arguments would give a divide-by-zero error.

We can get a variable number of arguments another way that lets us specify that at least some arguments must be present. In the parameter list for the lambda-expression (or function defined in our usual way, without lambdas), specify the variables that must be there in the usual way, then put a . followed by just one more variable name. The name following the period will be a list of all the optional arguments, while those variables before the . are mandatory.

> (define (avg x . vals) (/ (+ x (apply + vals)) (+ 1 (length vals))))
> (avg 1.0 5 7)
4.333333333333333
> (avg 1 5)
3
> (avg 1)
1
> (avg) ;This one will give an error!

Exercises

  1. Write a function min-sin that takes one or more numbers as arguments and returns the one with the smallest sine, as returned by the built-in sin function.
    Note that the one of the original numbers should be returned, not its sine. For example, calling
    (min-sin 1.1 2.3 4.5 6.7)
    should return 4.5 since sin(4.5), which is about -0.98, is smaller than all the other sines.
  2. Write a function group whose first argument is an integer k that takes its remaining arguments and "groups" them such that each consecutive k arguments becomes a group. The function returns a list of groups ... where each group is itself a list. Careful: This is a bit tricky! I recommend that you write two helper functions, to extract the first n things from a list, and to remove the first n things from a list.
    > (group 2 'a 1 'b 2 'c 3)
    ((a 1) (b 2) (c 3))
    > (group 3 'a 1 'b 2 'c 3)
    ((a 1 b) (2 c 3))
    > (group 4 'a 1 'b 2 'c 3)
    ((a 1 b 2) (c 3))
    > (group 5)
    ()
    

Closures

In class, we saw that a key feature of functional programming languages like Scheme is that you can create a function "on the fly" (using lambda), and you can then write functions that create and return new functions.

So, for example, we can write a function that takes a number n and creates a new function that tests whether its argument is a list whose length equals n:

(define (length-tester n)
  (lambda (L)
    (= (length L) n)))
> (define len3? (length-tester 3))
> (len3? '(1 2))
#f
> (len3? '(1 2 3))
#t
> (define len5? (length-tester 5))
> (len5? '(1 2 3))
#f
> (len5? '(1 2 3 4 5))
#t

That's all well and good, but what happens if we now called the len3? function again? Would it test for length 5 or length 3?

Well, you can try it out and discover that (thankfully) len3? still works like it originally did - hooray! This took some kind of feat for the interpreter to pull off, since it basically has to save the value of n inside the lambda in the two different calls to length-tester.

The name for this thing - a function that remembers the local variables at the time of its creation - is a closure. More formally, a closure is a function plus its "referencing environment", i.e., the variables that were in scope when the function got created. In Scheme, whenever you return a function from a function, it actually returns a whole closure, so that calling that function "remembers" the local variables from wherever it got created. Very cool!

Exercises

  1. Write a function (power n) that returns a new function to raise any number to the power n. For example:

    > (define cube (power 3))
    > (cube 2)
    8
    > (cube 3)
    27
    > (cube (cube 2))
    512
    > ((power 5) 4)
    1024
    
  2. Remember those neat functions like cadr and cddadr? You may be dismayed to discover that these stop after 4 d's and a's, so for example caadaar is not a built-in function in Scheme.
    So let's write it! Define a function "factory" make-cXr that takes any number of 'a and 'd symbols as arguments, and returns another function to do that series of cars and cdrs.
    For example:
    > ((make-cXr 'a 'd 'd) '(1 2 3 4 5))
    3
    > (define caaaaaadr (make-cXr 'a 'a 'a 'a 'a 'a 'd))
    > (caaaaaadr '((1 2) ((((((3))))))))
    (3)
    

Object-oriented Scheme programming

The C++ view of object-oriented programming is "I've got this data, wouldn't it be nice to package it together with functions?" The Scheme view is "I've got functions, wouldn't it be nice to package them together with data?" Either way, you get to the same place. Closures, of course, is how you package data together with a function. Let's see how that might give you an "object". In particular, let's make an object that models a bank account, where you can make deposits and withdrawals.

First of all notice that a closure is a single function, so that the C++/Java model of an object with lots of member functions is not quite what we'll have. Instead, we will view ourselves as interacting with objects by sending messages. Thus, the function that is our object is a message-processing function. So instead of calling a member function withdraw, we will send the 'withdraw message. And instead of calling a member function deposit, we will send the 'deposit message. So the function that is our closure will take two arguments: the message, which will be 'deposit or 'withdraw, and the data associated with the message, which will simply be the amount.

(define bankAcct 
  (let ((bal 0)) 
    (lambda (message data)
      (cond ((eqv? message 'deposit) 
             (set! bal (+ bal data)) 
             bal)
            ((eqv? message 'withdraw) 
             (set! bal (- bal data)) 
             bal)))))

Oh yeah, I forgot to tell you: there is actually a way to modify variables in Scheme: the function set! The ! part is a convention that tells us this function is going to modify something. Basically, you call (set! x 10) to change the value of x to 10. Easy!

BUT with great power comes great responsibility! I didn't tell you about this before because usually, in Scheme, there is no reason to use set! and in fact doing so is "bad style". Making objects using closures is actually one of the only times that using set! is a good idea - so that's why I'm telling you now!

So anyway, the code above creates a new closure named bankAcct, which has as its only persistent data bal. Since bankAcct is a closure, it is a function. When you call it, it expects two arguments - message and data. Here's how you might use this object:

> (bankAcct 'deposit 10.50)
10.5
> (bankAcct 'withdraw 6.75)
3.75

Now, one serious limitation to this scheme is that we only have one bank account object. What if we want to model an entire bank, with thousands of accounts? We certainly don't want to copy and past the code (let ((bal 0) ...) thousands of times. Instead, let's write a function make-account that returns a new bank account object.

(define (make-account) 
  (let ((bal 0)) 
    (lambda (message data)
      (cond ((eqv? message 'deposit) 
             (set! bal (+ bal data)) 
             bal)
            ((eqv? message 'withdraw) 
             (set! bal (- bal data)) 
             bal)))))

Now we can use make-account to make as many bank account objects as we want.

> (define myacct (make-account))
> (define youracct (make-account))
> (myacct 'deposit 25.50)
25.5
> (myacct 'withdraw 14.23)
11.27
> (youracct 'deposit 20.00)
20.0

So where do we go from here? Well, it'd be nice to add a 'balance message that would simply get the balance for you ... of course that would cause problems since there wouldn't be any data parameter for that message. Thus, we'd need to rewrite this thing using one of the variable argument constructs. Of course some functions for signalling that the account should accrue one month's interest would be good too ... what about setting interest rates ...

Exercises

  1. Write a function make-stack that creates "stack objects". Your stack should allow the following messages:
    • 'size: Returns the number of elements in the stack.
    • 'pop: Removes the top element of the stack and returns it. (You can assume that the size of the stack is at least 1 before this method is called.)
    • 'push x1 x2 ...: Takes any number of additional objects and puts them on the stack. Note: normally "push" puts a single new object on the top of the stack. Your stack will allow an arbitrary number of arguments, and the effect will be to add them as if they had been pushed individually from right to left.

    Here's an example:

    > (define S1 (make-stack))
    > (S1 'size)
    0
    > (S1 'push 3 'a 2.1)
    > (S1 'pop)
    3
    > (S1 'push 'x)
    > (S1 'pop)
    x
    > (S1 'pop)
    a
    > (S1 'size)
    1
    

    I recommend using a list as the internal data structure, with the car of the list being the top of the stack.

  2. Some languages (such as Python) include sets as a basic built-in data type. Most others include sets in some kind of standard library that is available. We're going to build a set class for Scheme.
    Our sets will only contain numbers (at least for now), and will be stored as an ordered list. For this exercise, start by writing a make-set function which returns an "object" function which just supports basic getting and setting of the underlying storage. You will need this for the later functions, and for debugging while you write them.
    • 'get: Returns the underlying list that stores the set.
    • 'set! L: Sets the underlying list storage to L. (You don't have to check that L is sorted and doesn't contain any duplicates.) The return should be void.
    • 'size: Returns the number of distinct elements in the set.
  3. Extend your make-set to support two more messages. You will probably want to write helper functions for these.
    • 'insert! x1 x2 ...: Takes any number of numbers x1, x2, etc., and inserts each of them into the set, if they're not already there. The numbers x1... are not necessarily in sorted order, but they must be stored in sorted order within the set! The return should be void.
    • 'contains? x: Returns true or false depending on whether the set has the number x in it. (This one only takes a single argument.)
    Then you should be able to do:
    > (define S1 (make-set))
    > (S1 'get)
    ()
    > (S1 'size)
    0
    > (S1 'contains? 10)
    #f
    > (S1 'insert! 10)
    > (S1 'contains? 10)
    #t
    > (S1 'insert! 15 3)
    > (S1 'get)
    (3 10 15)
    > (S1 'contains? 11)
    #f
    > (S1 'contains? 15)
    #t
    > (S1 'set! (list 14))
    > (S1 'size)
    1
    > (S1 'contains? 15)
    #f
    > (define S2 (make-set))
    > (S2 'contains? 14)
    #f
    
  4. Extend your make-set to support one more message. 'intersect S returns a new set that contains all the elements that are in both this set and in S. You will definitely have to use the 'get and 'set messages, and write a helper function or two!
    I want you to take advantage of the fact that the set-storage lists are sorted in your implementation. You will essentially write something like the "merge" from MergeSort. Remember - no duplicates!
    After this, you should be able to do:
    > (define S1 (make-set))
    > (S1 'insert! 5 2 53 9 10)
    > (define S2 (make-set))
    > (S2 'insert! 6 7 2 10 11 83 5)
    > (define S3 (S1 'intersect S2))
    > (S3 'size)
    3
    > (S3 'contains? 6)
    #f
    > (S3 'contains? 10)
    #t
    
  5. (OPTIONAL. No tests for this one.) Add some more functionality, like union, set-difference, and subset?.

Tail Recursion

In class we looked at the way recursion gets implemented, and why Java or C++ programmers sometimes want to avoid it. It has to do with the stack space required for all the recursive calls.

For example, here is the range function from Lab 2:

; Returns a list containing integers a, a+1, ..., b.
(define (range a b)
  (if (> a b)
      null
      (cons a (range (+ a 1) b))))

If we followed the execution of something like (range 10 20), we would see that it goes all the way down to (range 21 20) before doing anything, and then all those function calls are "popped" off the stack one at a time as each cons is performed to get the answer.

Well what if we wrote range like this instead:

(define (range-helper a b accum)
  (if (> a b)
      accum
      (range-helper a
                    (- b 1)
                    (cons b accum))))
 
(define (range a b)
  (range-helper a b '()))

Yes, this is a little more verbose, but it's going to use a lot less memory. The reason is that range-helper is tail-recursive, meaning that all the recursive calls to range-helper don't have to be saved on the stack at once. The idea behind this function is that accum carries around the information that would normally have to be saved in the function call stack.

Definition: A function is tail recursive if its output expression in every recursive case is only the recursive call.

In Scheme, this means that the recursive call is outermost in every returned expression. By "outermost", I mean it's the last thing in a let or lambda body, or the second or third thing in a if statement, or the last thing in a cond clause, etc. If you want to be fancy about it, you say that the recursive call in these cases is in tail position.

In C/C++/Java, a tail-recursive function can be identified by looking at the return statement(s) for the recursive case(s): if the recursive call itself is the outermost expression in the return, the function is tail-recursive. The fundamental idea is that the result returned by the recursive call is the answer; nothing else needs to be done with it. In that case, there's no reason to pop records off the stack of recursive calls, and if you're never going to pop them off, they needn't be pushed in the first place.

The original range function above is not tail recursive because the output expression in the recursive case is (cons a (range (+ a 1) b)), which while it contains a recursive call is not just a recursive call. In contrast, the recursive case of range-helper has output expression (range-helper a (- b 1) (cons b accum)), which is just a recursive call.

As we noted above, with tail recursive functions we could actually forget about the stack and only ever remember what would ordinarily be the top record of the stack. Thus, tail recursive functions can be optimized to take less memory - and usually to run faster as well. Scheme interpreters are required to make this optimization whenever functions are defined tail-recursively.

Exercises

    (Since these are just about modifying functions from above, you don't need to write any new tests.)
  1. Modify your answer to two of exercises 1-4 to make your functions fully tail recursive. Indicate ;TAIL RECURSIVE in comments so I know which ones. OPTIONAL: do all three!
  2. Modify your implementation of 'insert! or 'intersect from exercises 7 and 8 to make the function tail recursive. Indicate ;TAIL RECURSIVE in comments so I know which one. OPTIONAL: do all of your set functions!