Try Firefox!

SI413 Lab 13: Memory Management

Part 1: Tracking Obj births and deaths
C++ extends C in many ways, including allowing object oriented programming. In particular, for any classs you define, you may define constructors and destructors. Constructors in C++ look pretty much the way they do in Java, except that you don't include a public/private declaration. Just like Java, you may have many constructors, all differing in the number and type of parameters. Destructors are the mirror image of constructors: they get called when an object is destroyed. If your class is named Foo, you define the destructor like this:
~Foo() { whatever-you-want-to-do }
We can use this, for example, to simply keep track of the number of Obj's created and destroyed in our interpreter. The first thing I want you to do is to modify your interpreter by adding global variables for counting the number of Obj's created and destroyed, adding constructors and destructors to the definition of Obj to increment the counters approproiately and, immediately before returning from main(), to print out the number of Obj's created and destroyed. Here's how you should do the first two of these:
// Structure for representing SPL objects by a type/value pair
int objccount = 0, objdcount = 0;
class Obj
{
public:
  int otype; union { int ival; bool bval; Closure *fval; };
  Obj() { ++objccount; }
  ~Obj() { ++objdcount; }
};
You might need to add something like this to the loop inside main, in order to get it to stop at the end of input:
if (tree == 0)
   break;
Run this code on the following example (press Ctrl-D after pasting it, to end the input):
new fib := lambda n { if (n < 2) { res := 1; } else { res := fib(n-1) + fib(n-2); } };
write fib(20);

Part 2: Fixing Part 1
What you ought to have noticed is that your Part 1 indicated that you destroyed more Obj's than were created. This should cause some consternation. What's going on is something that's specific to C++. Since C++ uses the value model for variables and has pass-by-value as its default for passing function parameters, calling a function with an Obj as an argument creates a copy of the Obj, and that copy is made by a special constructor called the copy constructor, not by the default constructor from our class definition. If you don't provide a copy constructor, the compiler provides one automatically, and its automatic copy constructor isn't going to increment objccount. So ... we need to define the copy constructor for ourselves. Here it is (obviously it needs to be added to the class definition):
Obj(const Obj &X) { ++objccount; *this = X; }
This constructor takes an Obj X and makes a copy of it. You better be able to answer the following question: Why does X need to be passed by reference rather than by value?

Now what happens if you run your program?

Part 3: Now do Frames and Closures all by yourself!
You really ought to be struck by how many objects are created that never get destroyed. Where are they all hiding?!?!?! If you try fib(30) it'll be in the millions. Why is this happening? To investigate, let's consider the two other classes we've created as part of our implementation of eval: Frame and Closure. I want you to instrument your interpreter so that it reports the number of Frame's and (separately) the number of Closure's created and destroyed as well as Obj's. Since we never pass Frames and Closures as function arguments, just Frame*'s and Closure*'s, you don't need to worry about copy constructors.

Run your following input:

new fib := lambda n { if (n < 2) { res := 1; } else { res := fib(n-1) + fib(n-2); } };
write fib(20);
... and see how many Obj's, Frames's and Closure's are created and destroyed. Create a file called part3.txt in your lab13 directory and copy and paste the interpreter session including creation/deletion results into the file.

Part 4: Garbage Collection
What do the results of the previous section tell us? They say that Frames and Closures never get destroyed. Since Frames are symbol tables containing Obj's, that accounts for the Obj's that don't get destroyed. Make sure you understand that: the "missing" objects are hiding in Frames, and our interpreter never destroys any frames. Why doesn't our interpreter destroy frames? The answer is simple: we don't know whether or not we still need them! Consider:
new f := lambda x { res := x*x; };
new g := lambda y { res := lambda z { res := y*y + z*z; }; };
write f(5);
25
new h := g(7);
# At this point the frame generated from calling f can be thrown away.
# The frame generated from calling g cannot, because we'll need it if
# h ever gets called.
It's similarly tricky to determine whether a Closure can be destroyed, and of course Closures refer to Frames so the problems are tied together.

What we need is garbage collection, an automatic method for determining which objects are still needed and which can be reclaimed. We are going to build a garbage collector for Frames and Closures. It's going to employ a technique known as "mark and sweep". After each top-level evaluation, i.e. when control returns to main, we'll call the garbage collector. It'll start with the global frame and loop through every Obj in the Frame, any Closures there get marked as "live", and the marking process continues recursively for every closures "env" Frame, as well as the parent frame. When the marking process completes, the sweep phase begins: all unmarked closures and Frames get destroyed (i.e. we call delete on a pointer to them).

To make this work, we need to

  1. add a boolean "mark" field to the Frame and Closure classes, and we must ensure that marks are initialized to false. You should know how to do this!
  2. add a vector<Frame*> containing pointers to all the Frames that have been created but not deleted. You should use constructors to make sure a pointer to each new Frame gets put in this vector (there is a "this" pointer in C++, just like Java, which you will need), and the garbage collector will worry about removing pointers from this vector when Frames get destroyed. Go ahead and make this vector a global variable. Isn't that liberating!
  3. add a vector<Closure*> containing pointers to all the Closures that have been created but not deleted. You should use constructors to make sure a pointer to each new Closure gets put in this vector (there is a "this" pointer in C++, just like Java, which you will need), and the garbage collector will worry about removing pointers from this vector when Closures get destroyed. Go ahead and make this vector a global variable as well. It's like drugs, it doesn't take much to get hooked.

You should be able to test whether or not you were successful by printing out the lengths of these two vectors at the end of main. They ought to agree with your count of the number of Frames and Closures created.

Part 5: Mark
Define a function void mark(Frame* env); that
  1. simply returns if the mark on the frame env points to is true, otherwise ...
  2. sets the mark on the frame env points to to true,
  3. calls mark recursively on the parent of the frame env points to (unless it's null), and
  4. loops through each Obj in the Frame env points to (see below) and for each Obj that's a Closure
    1. sets the mark for that closure to true, and
    2. calls mark recursively on the Frame that closure points to

So how do we "loops through each Obj in the Frame env points to"?

  for(map<string,Obj>::iterator itr = env->begin(); itr != env->end(); ++itr)
  {
  
	    in here "itr->second" is the Obj.
	    You see *itr is a pair<string,Obj>, i.e. a
	    string-Obj pair from the map. ".second" gives the second
	    element of that pair --- the Obj.
  }

Convince yourself that you set "mark" to true for every Frame and Closure that's still "live", meaning it might be accessed by subsequent computation. Please: THINK ABOUT IT!

Part 6: Sweep
Define a function void sweep(); that goes through the closure-vector calling delete on each pointer in the vector that points to an object whose mark is false and also removing that entry from the vector. (Hint: use some kind of swap, then pop_back to remove elements efficiently. But be very careful about how you handle iterating through a vector where elements are being removed -- what is the big challenge here?) Do the same for the frame-vector. Then, go through all the frames and closures that are still left in your two vectors and set the marks back to false. This is needed so that the next call to mark() has a clean slate to start with.

Finally: in main(), after you call eval call mark(global); sweep();. Test this on some examples to make sure that a) your interpreter still works, and b) that you're successfully reclaiming Closures, Frames, and Obj's. You can expect that when the program is over at least one frame (the global frame) is still around. Depending on what variables you have, you might have a few others left. Think about whether what you have is correct.

Limitations
There is a serious limitation to our implementation of mark & sweep - we can only do garbage collection outside of a call to eval. In the real world, this would be unacceptable, because a single long computation may require garbage collections before it is finished. What makes it difficult to implement garbarge collection during a call to eval is that the (potentially many recursive ) call(s) to eval my have local Obj, Frame and Closure objects that the garbage collector would have to take into account. With a virtual machine architecture (see upcoming labs) this difficulty doesn't arise, since there is no recursion and, essentially, no local Obj/Frame/Closures to worry about.

Submission
Submit via the usual submit script:
  1. your working program
  2. A README file that includes:
    • your names
    • a summary of the state of your code (either "it all works", or say what doesn't)
  3. the part3.txt file


Christopher W Brown
Last modified: Tue Nov 17 14:30:19 EST 2009