Ordered Maps: Binary Search Trees

  1. What we want in an ordered Map is the same properties as a regular Map, plus the operations: next(k) and previous(k). The book calls these closestElementAfter(k) and closestElementBefore(k), but that's too wordy.
  2. Because the are using an ordered map, presumably we want next() and previous() to be fast, and thus, if we use and array or linked list, we'll want it to be sorted.
  3. But we already know the problems with that:
    1. An array can take log(n) to find, but n to insert or delete.
    2. A linked list is easy to insert or delete, once we find the item, but it takes n to find it.
  4. Hash tables won't work at all because its too hard to find the next.
  5. What we want is a linked structure (for easy inserts/deletes) but that has log(n) finds.
  6. Linked structure.... log(n)....we want trees! Binary Search Trees, to be precise.
  7. A BST is a binary tree where an item (and key) is stored at each node. it has the following 2 properties:
    1. ∀n,m LeftDescendant(m,n) → key(m) < key(n)
    2. ∀n,m rightDescendant(m,n) → key(m) > key(n)
  8. Lets try one with the following set of numbers:
    51 19 99 96 13 15 21 88 71 54 60 55 83 29 72
  9. Is this tree proper? No. Why should we care? Sometimes we'll want to do things with a node's parent. If in some algorithm, we run off the end of the tree, then v would be null, and we would have no access to the parent. We'll see an example shortly.
  10. Lets make it proper. What we'll do is insert "dummy leaf nodes" through the whole tree. No item is actually stored at a leaf node. No internal node is a "dummy". Adding these nodes makes the tree proper.
  11. OK, what do we do with this? Well, we want to find things. how does that work?
    BSTSearch(k,v)
      if(isExternal(v)) return null;
      if(key(v)=k) return v;
      if(k < key(v))
        return BSTSearch(k,left(v))
      if(k > key(v))
        return BSTSearch(k,right(v))
    
  12. How long does that take (in terms of the height of the tree)? O(h). More about this later.
  13. OK, what about insert? We find the leaf where it is supposed to go, and then we put it there:
    BSTInsert(k,item,v)
      if(isExternal(v)
        v.data ⇐ item
        v.left ⇐ new dummy node
        v.right ⇐ new dummy node
      if(key(v)=k) v.data⇐item //this is a map, remember?
      if(k < key(v))
        return BSTInsert(k,item,left(v))
      if(k > key(v))
        return BSTInsert(k,item,right(v))
    
  14. This also takes O(h).
  15. Deleting is trickier. Our cases are:
    1. We're deleting something with only dummy children. That's easy- just get rid of it.
    2. We're deleting something with only one real child, such as 88. Also easy. get rid of it and replace it with its one real child.
    3. We're deleting something with two children, such as 51. We want to alter the tree as little as possible, right? Could we find a node to swap with 51 that would require no tree restructuring? (E.G. swapping with 71 would not work) Either 29 or 54.
    4. What if we were deleting 71? Either 60 or 72.
    5. What do these nodes have in common? They are either the left-most right descendant or the right-most left descendant. We can pick either of these numbers, I'll always use left-most right descendant.
    6. Why does that work? well, all right descendants are greater than k, but the left-most has nothing less than it in its part of the tree. It is the next key up. Thus swapping with it shouldn't change the tree.
  16. So what do we do?
    BSTDelete(k,v)
      d ⇐ BSTSearch(k,v)
      if (isExternal(right(v)) 
        if(right(parent(v))=v) 
          right(parent(v)) ⇐ left(v)
        else
          left(parent(v)) ⇐ left(v)
      else
        s ⇐ leftMostRight(d)
        swap(s,d)
        left(parent(s)) ⇐ right(s)
    
  17. whew.
  18. Oh, wait, what is leftMostRight()?
    leftMostRight(v)
      leftMost(right(v))
    
    leftMost(v)
      if(isExternal(left(v))) return v
      leftMost(left(v))
    
  19. OK, how long does this take? O(h)!
  20. Why does leftMostRight find the right thing to swap in? Because it finds the next highest key, and thus, by putting the next highest key in, we don't need to restructure the tree.
  21. Why does it find the next largest than v?
    1. Everything to the right of v is larger than v.
    2. if v is a left child of parent(v) then key(parent(v)) > key(v). But if v has a right child then key(right(v)) < key(parent(v), so the next largest must be in that subtree.
    3. The smallest node in a subtree is the leftmost node.
    4. And by the way, the leftmost node can have no left child, so it can be deleted easily.
  22. Given what we know now, finding next(k) shouldn't be too bad:
    1. first we need to find(k).
    2. if that node has a right child, then next is just the leftMostRight descendant.
    3. But what if there is no right child?
    4. Then the next highest is the first ancestor that is to the right:
      next()  
        v ⇐ BSTSearch(k)
        if(isInternal(right(v)) return leftMostRight(v)
        else
          p ⇐ parent(v)
          while (p ≠ null && right(p) = v)
            v ⇐ p
            p ⇐ parent(v)
          return p
      
  23. The only thing left is previous, which is just the opposite of next():
    previous()  
      v ⇐ BSTSearch(k)
      if(isInternal(left(v)) return rightMostLeft(v)
      else
        p ⇐ parent(v)
        while (p ≠ null && left(p) = v)
          v ⇐ p
          p ⇐ parent(v)
        return p
    
  24. And thus next takes O(h).