Chapter 6: Trees

What is the difference between an Abstract Data Type and a Data Structure?
1. An ADT is a mathematical model of information stored in a computer.
2. A Data Structure is a particular arrangement of data in a computer.
3. An ADT consists of specification of behavior by providing a list of operations that can be performed on it. It does not say how this gets done in the computer.
4. A data structure describes how an ADT gets implemented.
5. In java an Interface is like an ADT- says what needs to be done, with no instructions of how.
6. When you implement the interface, you write the data structure.
7. Which of the following is an ADT/DS:
  1. Linked List
  2. Stack
  3. Array
  4. Queue
  5. List
Our first ADT in this class will be the Tree.
When we say the word "tree" in Computer Science, what we mean is a tree like a family tree, or a military command structure. We can draw one like: (insert academy hierarchy)
What trees represent is hierarchy. This is different from lists in that lists represent linear structure.
Other things organized as trees: folders/directories, book chapters, decision trees.
In a tree, everything must be connected up to a single root.
Definitions:
1. Node. element of the tree.
2. Root. Node at the top.
3. Parent. Of 2 nodes, A and B, A is the parent of B is A is directly above B. Nodes can each have only one parent.
4. Child. Reverse of parent.
5. Sibling. 2 nodes are siblings if they share the same parent.
6. Ancestor. Parent, or parent's parent, or... etc.
7. External. Of a node, has no children.
8. Internal. Of a node, not external.
9. Subtree. A portion of a tree that is still a tree.
In a filesystem tree, what kind of nodes are the files? what kind are the directories?
Tree methods:
1. element(v) // returns the content of node v
2. root() // returns the root node of the tree.
3. parent(v) //returns the prent of node v
4. children(v) //returns an iterable container holding the children of node v
5. isInternal(v) // returns true if node v is internal
6. isExternal(v) // returns true if node v is external
7. isRoot(v) // returns true if node v is the root
8. size() // returns the number of node in the tree
9. isEmpty() // returns true if the tree is empty
10. positions() // returns an iterable container of all nodes in the tree.
11. elements() // returns an iterable container of the content of all node in the tree.
12. replace(v,e) // replace the contents of node v with e. return the old contents of v.
All of these take O(1) time, except:
1. children(v), which takes O(C_v)
2. elements(), positions(), which take O(n).
Lets define the depth of a node to be the number of ancestors of that node. The depth of the root is 0.

We can come up with a quick algorihtm for computing the depth of a node:

int depth(v)
 d = 0
 while (!isRoot(v)) 
  v = parent(v)
  d++
 return d

But we could also do that recursively:

int depth(v)
 if (isRoot(v))
  return 0
 else
  return 1 + depth(parent(v))

Either way, the run time is O(d), the depth of the node.
depth is easy, but lets say we want to know the height of the tree, defined as the depth of the deepest node. We know a basic way to do this:
```
int height()
 allnodes = positions()
 max = 0
 for each v in all nodes
  h = depth(v)
  if (h > max)
   max = h
 return max
 
```
How long does this take? First, how many times is depth(v) called? Ok, how many steps does each call to depth take? It depends on the depth of each node, and the depth depends on the shape of the tree. What shape tree maximizes the number of steps in depth?
Exactly. A List shaped one. In this case, the first call to depth takes 0 passes through the loop, the second takes 1 pass, the third 2 passes, up to (n-1) passes, giving us 1+2+3+ ... + (n-1) or O(n²).

But we can do better:

int height(v)
 if (isExternal(v))
  return 0
 else
  h = 0
  for each w in children(v)
   n = height(w)
   if (n > h) h = n
  return h + 1

OK, what is the run time of that? Well, analyzing recursive algorithms is a little bit harder, of course.
One way to do it is to count the number times the function gets called, times the work in the function (not counting the recursive call).
How many times is height called? Once per each node node.
How much work does the function do? In each call, we go through the loop C_v times, giving us O(nC_v). This doesn't help us too much, because we don't know how big C_v is. There are 2 ways to fix this:
1. Break the multiplication into repeated addition. instead of nC_v, we have C_v1 + C_v2 + C_v3 + ... + C_v(n-1) + C_vn
  this of course equals n-1, because there are n-1 children in the tree. Thus height is O(n).
2. Use the average of all the C_v. Since the average of the C_v does not grow with the size of the tree, then it is a constant. Thus nC_v is O(n).
What is the lesson here? The main one is that good recursive algorithms on trees are often much better than iterative ones.
Traversals. Another common set of algorithms on trees are traversals. Traversals are a way of visiting all the nodes of a tree. A visit is defined as performing an operation on the node. The simplest visit is printing out the element. There are two basic traversals: Preorder and Postorder.
1. The preorder visits a node, and then performs a pre-order traversal on its children.
2. Lets see...
3. First we visit A, then we we do a pre-order on the children.
4. So we visit B. There are no children, so we go back to A.
5. We go down to C, which we visit. again no children.
6. We go to D. We visit that, then where? H, then I. etc.
7. The Visit Order is: A B C D H I E J K L F M N G O.
```
  preorder(T,v)
    vist(v)
    for each w in children(v)
      preorder(T,w)
  
```
8. Pre order of a book section tree visits nodes in the order read.
Post Order
1. Post order visits a node AFTER doing a post-order of all the children:
2. B C H I D J K L E M N F O G A
```
  postorder(T,v)
    for each w in children(v)
      postorder(T,w)
    visit(v)
  
```
3. Post order is useful for things where we want to visit the parent last (duh). For example, lets say the the tree is a directory tree, and leaf nodes are all files. We want to print out the size of the director- the sum of the size of all files in the directory (or in some descendant directory).
```
  diskspace(T,v)
    sz = 0;
    if (isExternal(v))
      sz = size(v)
    for each w in children(v)
      sz = sz + diskspace(T,w)
    if (isInternal(v))
      print(name(v) + ":" + sz)
    return sz;
  
```
4. Try it out on the tree. Give some numbers to the leaves and go.
What about other kinds of traversals? Can we visit in between traversals of various children? Well that's hard to define when the number of children can vary. But if the number of children is always fixed, then maybe...