(define (H n) (if (= n 0) 0.0 (+ (/ 1.0 n) (H (- n 1)))))The real conversation is going to start with using this function. suppose I want to map $(x - h_{100})^2$ across some list of values - for example the integers from one to 10. I might do something like this:
(map (lambda (x) (let ((h (H 100))) (expt (- x h) 2))) '(1 2 3 4 5 6 7 8 9 10))This certainly does the job, but there's a hidden inefficiency here:
(H 100) gets recomputed for each of the 10
elements of the list! That makes this about 10x slower than it
should be. If you crank 100 up to 500000, you'll start to
really see things slow down. What's the solution? Let's put
the let outside the lambda, rather than inside. This
way h is no longer a local variable inside the
function created by lambda, and (H 100) will only
get computed once --- prior to the lambda expression being
evaluated. In other words, we'll do this:
(map (let ((h (H 100))) (lambda (x) (expt (- x h) 2))) '(1 2 3 4 5 6 7 8 9 10))Sure enough, this is 10x faster (try on (H 500000) to see this is true). Obviously, there's an important lesson here about structuring scheme code to improve performance. But there's a more interesting lesson to be learned by looking closely at the improved function definition. If we draw out the function calls involved in evaluating this expression, wee see a call to "let", which creates local variable h, then a call to lambda (call stack is "let" then "lambda"). The function created by the call to lambda has a reference to h, which is not local to that function. It is a local variable in the call one record lower on the call stack! The lambda call exits, returning the newly created function, and then the let call exits, passing along that newly created function as its return value. So at this point we have a function, but that function refers to h, which by all the rules of the call stack we know from other languages, should have disappeared when the "let" returned. Obviously, scheme works differently, and this h is captured and protected. This thing we now have, this "function + non-local variables" is called a closure. It's kind of a big deal. We discussed other languages (e.g. Java) and how they struggle a bit to deal cleanly with this situation, precisely because it breaks their model of how the call stack works. In Java, we see this in "inner classes" - classes defined in functions. If these classes refer to non-local variables, Java requires that those variables are
final. This little bit of pickyness is
necessary, since it allows Java to treat and simply copy the
values of the non-local variables into the inner class. After
all, "final" means those values can't change.
(H 1000000) from
above, and noticed that we actually ran out of memory (in
addition to being slow). What's up with that? Where's the
extra memory coming from? It's coming from the call stack, which
actually consists of 1,000,000 million records in this case
before we hit the base case and start popping stack records.
This is pretty bad! Note that the obvious C/C++/Java/Python
version computes $h_{1,000,000}$ in no time.
Given n
-------
acc = 0
while n != 0
acc += 1/n
n--
return acc
This is quite fast, and very clearly it needs only constant
memory. If you think about it, the recursive version is more or
less like making an array of size 1,000,000, copying in the
receiprocals of 1,2,...,n, and then adding them all up. Our
loop-based program above works differently: the only extra
storage it uses is the accumulator variable acc.
We could rewrite the recursive scheme code to follow the same
strategy:
(define(H n acc) (if (= n 0) acc (H (- n 1) (+ acc (/ 1.0 n)))))... it just requires a new parameter
acc for the
accumulated value. This version is really fast, appearing to
use only constant additional memory, as indeed is the case. This
version is tail recursive, which allows the scheme
interpreter to execute it without having to grow the stack.
Consider the following Scheme function for computing the sum of the square roots of the integers from 1 to n.
(define (ssum n) (if (= n 1) 1 (+ (sqrt n) (ssum (- n 1)))))It works nicely and all, but it really takes some time to compute something like (ssum 1000000). In fact it pretty much bogs the interpreter down. It bogs the whole machine down. Is this necessary? Why is it happening? Well, it turns out that the problem is that this program takes up a lot of memory. But wait a second? Why is that? Looking at the program it isn't clear how it could take more than constant memory since there are no lists. However, this program is recursive, and each recursive call results in a new function record being pushed onto the call stack. This takes up a lot of space when there are 1,000,000 recursive calls, like there are in (ssum 1000000). This is the dark side of functional programming. There's a hidden penalty for all the convenience (yes, I said "convenience") of programming everything recursively, and that's the memory overhead.
However, there are certain recursive functions that don't require this memory overhead. For example, consider the following alternate definition of ssum:
(define (ssum-tr n e) (if (= n 0) e (ssum-tr (- n 1) (+ (sqrt n) e))))Note: The second paramter e must be zero in the initial call. The idea behind this function is that e carries arround the "extra" stuff you've already computed. Suppose you were computing
√1 + √2 + √3 + √4i.e. (ssum-tr 4 0). In computing this with ssum-tr you'd end up at some point with a recursive call (ssum-tr 2 3.73) which would indicate that the answer is ssum(4) = ssum(2) + 3.73, i.e.
ssum(2) = √1 + √2 + √3 + √4 = 6.15
\_____/ \_____/
= ssum(2) + 3.732 = 2.414 + 3.732 = 6.15
It's a different way of arriving at the same result.
However, lets watch the call stack evolve during
ssum-tr's computation of (ssum-tr 4 0):
n=0,e=6.15 ---> Answer!
n=1,e=5.15 n=1,e=5.15
n=2,e=3.73 n=2,e=3.73 n=2,e=3.73
n=3,e=2 n=3,e=2 n=3,e=2 n=3,e=2
n=4,e=0 n=4,e=0 n=4,e=0 n=4,e=0 n=4,e=0
------- ------- ------- ------- -------
Stack Stack Stack Stack Stack
Hopefully what you see is that there's no reason to remember
anything beyond the top record on the stack. We have the
answer without ever having to do a pop, and if we don't need
to do any pops, we didn't need to remember the stuff in the
first place. A function like this is called tail recursive.
Definition: A function is tail recursive if its output expression in every recursive case is only the recursive call.In Scheme, this means that the recursive call is outermost. In C/C++/Java, a tail-recursive function can be identified by looking at the
return statement(s) for the
recursive case(s): if the recursive call itself is the
outermost expression in the return, the function
is tail-recursive. The fundamental idea is that the result
returned by the recursive call is the answer; nothing else
needs to be done with it. In that case, there's no reason to
pop records off the stack of recursive calls, and if you're
never going to pop them off, they needn't be pushed in the
first place.
The function (define (ssum n) (if (= n 1) 1 (+ (sqrt n) (ssum (- n 1)))))
is not tail recursive because the output expression in the recursive
case is (+ (sqrt n) (ssum (- n 1))), which while
it contains a recursive call is not just a
recursive call. In constrast, the recursive case of ssum-tr
has output expression
(ssum-tr (- n 1) (+ (sqrt n) e)) which
is just a recursive call.
As we noted above, with tail recursive functions we could
actually forget about the stack and only ever remember what
would ordinarily be the top record of the stack. Thus, tail
recursive functions can be optimized to take less memory -
and usually to run faster as well. Scheme interpreters are
required to make this optimization whenever
functions are defined tail-recursively.
Our tail recursive ssum-tr has no trouble computing
(ssum-tr 1000000 0) .
(pal? L) that determines whether a list L is a
palindrome, with the restriction that you can't create new lists
(no new nodes!) and you can't do arithmetic (e.g. no counters).
[Note: workarounds like simulating lists with lambda's
aren't allowed either!]
The rationale is that we don't want you using more than a constant
amount of memory, without regard to how long the list is.
Arithmetic is disallowed because integers actually grow - at least
in principle. If your counter was guaranteed to never exceed some
small threshold (say 256), it would be allowable. Here's a solution:
HIDDEN TO CHALLENGE YOU TO COME UP WITH YOUR OWN SOLUTION
> (pal? '(2 3 2)) #t > (pal? '(3 9 0 0 9 3)) #t > (pal? '(3 9 0 1 9 3)) #fNo such solution can be tail-recursive. It cannot be tail recursive and stay within the rules, because deciding palindromes is a problem we know to be impossible with a fixed finite amount of memory. So if you don't get extra memory from the call stack, it would have to come from somewhere else ... and the rules prohibit other sources.
In a nutshell, consider the following:
(define L1 '(1 2 3)) (define L2 (cons 'a L1)) (define L3 (append '(8 9) L2))The second line creates a new list
(a 1 2 3) and
gives it the name L2. It hasn't modified
L1 in any way, so it's not like pushing a new
element on top of the stack which, of course, modifies the stack.
Since we have two distinct lists (1 2 3) and
(a 1 2 3), it would seem that the interpreter
would've created 7 list nodes. However, recall that we have
referential transparency, i.e. nothing is ever modified.
Therefore, there's no harm done if the linked lists that
store L2 and L1 share nodes. Same goes for L3.
In fact, internally, these three lists with a total of 13 data
values are represented internally by 6 nodes:
This means that (cdr L) and (cons x L) are
constant time operations (make sure you see why), i.e. they
take the same amount of time regardless of the size of
L. How about (append L1 L2)?
How does its running time depend on the length of
L1 and the length of L2? Think
carefully about this and make sure you know the answer!
This sharing of nodes has an important consequence: When
the name L1 goes out of scope, the interpreter
can't necessarily reclaim its nodes. Indeed, in this
example, if L2 or L3 are still in
scope, none of L1's nodes can be reclaimed.
What if L3 goes out of scope? Then the first
two nodes can be reclaimed, but not the rest. In general,
when this sharing of nodes occurs it becomes very difficult
to determine which nodes can be reclaimed. It's completely
impractical to expect the programmer to do it, and really
undesireable too, since Scheme programmers are supposed to
think of lists abstractly, and not bother with the details
of how the interpreter implements them. Thus, Scheme really
requires automatic garbage collection, meaning that the
interpreter has to figure out which nodes can be reclaimed.