fishy

Trees

  1. Our next ADT will be the Tree.
  2. When we say the word "tree" in Computer Science, what we mean is a tree like a family tree, or a military command structure. We can draw one like: (insert academy hierarchy)
  3. What trees represent is hierarchy. This is different from lists in that lists represent linear structure.
  4. Other things organized as trees: folders/directories, book chapters, decision trees.


  5. In a tree, everything must be connected up to a single root.
  6. Definitions:
    1. Node. element of the tree.
    2. Root. Node at the top.
    3. Parent. Of 2 nodes, A and B, A is the parent of B is A is directly above B. Nodes can each have only one parent.
    4. Child. Reverse of parent.
    5. Sibling. 2 nodes are siblings if they share the same parent.
    6. Ancestor. Parent, or parent`s parent, or... etc.
    7. External. Of a node, has no children.
    8. Internal. Of a node, not external.
    9. Subtree. A portion of a tree that is still a tree.
  7. In a filesystem tree, what kind of nodes are the files? what kind are the directories?
  8. Tree methods:
    1. element(v) // returns the content of node v
    2. root() // returns the root node of the tree.
    3. parent(v) //returns the parent of node v
    4. children(v) //returns an iterable container holding the children of node v
    5. isInternal(v) // returns true if node v is internal
    6. isExternal(v) // returns true if node v is external
    7. isRoot(v) // returns true if node v is the root
    8. size() // returns the number of node in the tree
    9. isEmpty() // returns true if the tree is empty
    10. positions() // returns an iterable container of all nodes in the tree.
    11. elements() // returns an iterable container of the content of all node in the tree.
    12. replace(v,e) // replace the contents of node v with e. return the old contents of v.
    13. insert()/delete() // There is usually some form of insert or delete, but they aren't standardized because it depends on the application of the tree.
  9. All of these take O(1) time, except:
    1. children(v), which takes O(Cv)
    2. elements(), positions(), which take O(n). Well, not really, it might only take O(1) to return the iterator, but what we'll really care about is how long it takes to iterate through the items, which will be O(n).
  10. Lets define the depth of a node to be the number of ancestors of that node. The depth of the root is 0.
  11. We can come up with a quick algorithm for computing the depth of a node:
    int depth(v)
     d = 0
     while (!isRoot(v)) 
      v = parent(v)
      d++
     return d
    
  12. But we could also do that recursively:
    int depth(v)
     if (isRoot(v))
      return 0
     else
      return 1 + depth(parent(v))
    
  13. Either way, the run time is O(d), the depth of the node.
  14. depth is easy, but lets say we want to know the height of the tree, defined as the depth of the deepest node. We know a basic way to do this:
    int height()
     allnodes = positions()
     max = 0
     for each v in all nodes
      h = depth(v)
      if (h > max)
       max = h
     return max
     
  15. How long does this take? First, how many times is depth(v) called? Ok, how many steps does each call to depth take? It depends on the depth of each node, and the depth depends on the shape of the tree. What shape tree maximizes the number of steps in depth?
  16. Exactly. A List shaped one. In this case, the first call to depth takes 0 passes through the loop, the second takes 1 pass, the third 2 passes, up to (n-1) passes, giving us 1+2+3+ ... + (n-1) or O(n2).
  17. But we can do better:
    int height(v)
     if (isExternal(v))
      return 0
     else
      h = 0
      for each w in children(v)
       n = height(w)
       if (n > h) h = n
      return h + 1
     
  18. OK, what is the run time of that? Well, analyzing recursive algorithms is a little bit harder, of course.
  19. One way to do it is to count the number times the function gets called, times the work in the function (not counting the recursive call).
  20. How many times is height called? Once per each node node.
  21. How much work does the function do? In each call, we go through the loop Cv times, giving us O(nCv). This doesn't help us too much, because we don't know how big Cv is. There are 2 ways to fix this:
    1. Break the multiplication into repeated addition. instead of nCv, we have Cv1 + Cv2 + Cv3 + ... + Cv(n-1) + Cvn
      this of course equals n-1, because there are n-1 children in the tree. Thus height is O(n).
    2. Use the average of all the Cv. Since the average of the Cv does not grow with the size of the tree, then it is a constant. Thus nCv is O(n).
  22. What is the lesson here? The main one is that good recursive algorithms on trees are often much better than iterative ones.
  23. Traversals. Another common set of algorithms on trees are traversals. Traversals are a way of visiting all the nodes of a tree. A visit is defined as performing an operation on the node. The simplest visit is printing out the element. There are two basic traversals: Preorder and Postorder.
    1. The preorder visits a node, and then performs a pre-order traversal on its children.
    2. Lets see...

    3. First we visit A, then we we do a pre-order on the children.
    4. So we visit B. There are no children, so we go back to A.
    5. We go down to C, which we visit. again no children.
    6. We go to D. We visit that, then where? H, then I. etc.
    7. The Visit Order is: A B C D H I E J K L F M N G O.
        preorder(T,v)
          visit(v)
          for each w in children(v)
            preorder(T,w)
        
    8. Pre order of a book section tree visits nodes in the order read.
  24. Post Order
    1. Post order visits a node AFTER doing a post-order of all the children:
    2. B C H I D J K L E M N F O G A
        postorder(T,v)
          for each w in children(v)
            postorder(T,w)
          visit(v)
        
    3. Post order is useful for things where we want to visit the parent last (duh). For example, lets say the the tree is a directory tree, and leaf nodes are all files. We want to print out the size of the director- the sum of the size of all files in the directory (or in some descendant directory).
        diskspace(T,v)
          sz = 0;
          if (isExternal(v))
            sz = size(v)
          for each w in children(v)
            sz = sz + diskspace(T,w)
          if (isInternal(v))
            print(name(v) + ":" + sz)
          return sz;
        
    4. Try it out on the tree. Give some numbers to the leaves and go.
  25. What about other kinds of traversals? Can we visit in between traversals of various children? Well that`s hard to define when the number of children can vary. But if the number of children is always fixed, then maybe...

Special Case: Binary Trees

  1. If we fix the number of children a node can have at two, then we get what is known as a binary tree.
  2. the binary tree has two additional methods: left(v) and right(v) which return the left child or the right child.
  3. A proper binary tree is one where all internal nades have exactly two children.
  4. A complete binary tree is a proper binary tree where all leaves have the same depth.
  5. A full binary tree is a complete binary tree with height h, such that it is complete to depth h-1, all internal nodes at depth h-1 occur to the left of external nodes at that depth, at most 1 internal node at depth h-1 has one child, and that node will be to the right of all other internal nodes, and the child of that node will be a left child.
  6. Properties of a binary tree:
    1. in a complete binary tree, the number of nodes at depth d is 2d.
      Proof:
      there are 20 nodes at depth 0.
      if there are 2d nodes at depth d, then there are 2d+1 nodes at depth d+1. Each node at depth d has 2 children, so there are 2*2d nodes at d+1, and 2*2d= 2d+1.
    2. in a complete binary tree the height of the tree is log(n+1)-1.
      1. Lemma: the sequence 2h+2h-1+2h-2+...+21+20 = 2h+1 -1
        Base Case: 20 = 21 -1
        Inductive Step: Assume
        2h+2h-1+2h-2+...+21+20 = 2h+1 -1, show 2h+1+ 2h+2h-1+2h-2+...+21 = 2h+2 -1
        2h+1+ 2h+2h-1+2h-2+...+21+20 ?=? 2h+2 -1 Question
        2h+1 +2h+1 -1 ?=? 2h+2 -1 Substitution
        2⋅2h+1 -1 ?=? 2h+2 -1 Algebra
        2h+2 -1 = 2h+2 -1 Q.E.D.
      2. Proof:
        The number of nodes in the tree is the sum of the number of nodes at each level:
        n = 2h+2h-1+2h-2+...+21+20 = 2h+1 - 1
        n = 2h+1 - 1
        n+1 = 2h+1
        log(n+1) = log(2h+1)
        log(n+1) = h+1
        h = log(n+1) - 1.
    3. There are some other properties in your book. You should read them.
  7. Traversals of binary trees.
    1. We can do the usual traversals: preorder or postorder.
    2. we can also do the insanely fun in-order traversal.
         inOrder(v)
          if (hasLeft(v))
            inOrder(left(v))
          visit(v)
          if (hasRight(v))
            inOrder(right(v))
        
  8. We can use in-order traversal to print a binary tree on its side.
    1. If we look at the following tree:


      The in-order traversal is: H D L I M B E A J F K C G

    2. If we list the nodes from left to right as they appear, we get: H D L I M B E A J F K C G. Coincidence or nefarious plot?
    3. Either way, we can use this to print out a tree:
        binaryTreePrint()
          h = height(root())
          bTreePrint(root(),h)
      
        bTreePrint(v,h)
          if (hasLeft(v))
            bTreePrint(left(v),h)
          for (i = 0; i < (h - depth(v)); i++) print("   ")
          printline(v)
          if (hasRight(v))
            bTreePrint(right(v),h)
        
      This prints out the tree, sideways:
          H
            D
        L
          I
        M
              B
            E
                A
          J
            F
          K
              C
            G
        
  9. Lets talk about binary tree implementation.
    1. We could implements it just like a linked list or a general tree, with pointers:
        class BinaryTreeNode {
          Object data;
          BinaryTreeNode leftChild;
          BinaryTreeNode righChild;
        }
        
    2. And that's fine. You've done the general tree, so this should be easy, but there's another way to represent a binary tree that uses no pointers! It takes more space, but is sometimes more convenient.
    3. To understand the basic idea, lets look at the following tree:

    4. Look at the numbers of the nodes. What is the relationship between the number at a node and the number at its children? Right, the left child is 2n and the right is 2n+1.
    5. So what we'll do is store this tree in an array where each node is located at the index of the array according to its number. Draw picture.
    6. Now a "node" is just represented as an index number, and the element is the item in the array. How do we find the left child of some node x? 2x. and the right child? 2x+1.
    7. How do we find the parent of node x? floor(x/2).
    8. What happens if the tree is not complete? The array has empty parts.
    9. Let`s look at the run time of basic binary tree operations in the array vs. linked tree:
      Operation Array Linked
      size, isEmpty O(1) O(1)
      elements,positions O(n) O(n)
      replace O(1) O(1)
      root, parent, children, left, right O(1) O(1)
      hasLeft/Right, isInternal, isRoot O(1) O(1)
      insert O(n) O(1)