2-3-4 Trees
Our next self-balancing tree is called a 2-3-4 tree, so called because nodes can have anywhere from two to four children. 2-3-4 trees are guaranteed to be complete, that is, all leaves have equal depth.
Nodes in a 2-3-4 tree can contain multiple pieces of data, but the rules stay similar to a binary search tree; things to the right are bigger than things to the left. The nodes look like this:
In the above image, in the 2-node, the red subtree consists of data smaller than 5, and the purple data consists of data larger than 5. In the 3-node, red data is smaller than 3, blue is between 3 and 7, and purple is larger than 7. In the 4-node, red is smaller than 3, yellow is between 3 and 5, blue is between 5 and 7, and purple is larger than 7.
Adding to a 2-3-4 Tree
The algorithm for insertion is pretty simple. First, navigate to the appropriate leaf. Does the leaf have room for another datum? If yes, add it. In the following image, we add 24 (the squares are null pointers).
But what do we do if there isn't room? Promote one of the elements to the level above, and split the node:
What if we promote, but there's no room above? Promote from there:
So how long does an insert take? Well, in the best case, we go all the way down to a leaf, find that there is room, and add it. That's O(log n). In the worst case, we do that log(n) work, and discover there isn't room. So, we have to do a promotion. The maximum number of promotions we'll need to do is log(n). So, the big-O of an insert with a 2-3-4 tree is O(log n).
Removing from a 2-3-4 Tree
2-3-4 Tree Removal is a little bit more complicated.
If the data we're looking for is NOT at a leaf, replace it with the smallest data in the right subtree or the largest data in the left subtree (as with a BST). Then we need to call remove on that subtree, looking for that data.
At this point, we can now assume the data we're looking to remove is in a leaf (see why?). The following three blocks of pseudocode, interrupted with pictures, is the algorithm, in which we work from the point where the original data was, downwards to the bottom of the tree:
If we are at the root:
If the child we need to move to and its sibling are 2-Nodes, combine them
and an element from the root into a 4-node, and move to that node
Else, if the child we need to move to is a 2-Node, perform a rotation to
make it a 3-Node. Move to that node.
Here's a picture of such a rotation, in which we make the node containing a 2 into a 3-Node. Note that the red part consists of data between 2 and 3, and the purple part contains data between 3 and 4. Both remain consistent.
For the rest of the algorithm, we can assume that we are NOT at a 2-node. If that is about to not be true, we'll make it so that it is.
While not at a leaf:
If the appropriate child is not a 2-Node, move to it.
Else If its sibling is not a 2-Node, perform a rotation
Else merge the child, its sibling (which is a 2-Node), and the data in the
parent that splits them, into a single 4-Node child. Move to that child.
In the below picture, if we're trying to move to the 2 or 4 nodes, the merge in the else statement looks like this:
We now know that we are at a leaf, which is not a 2-Node. Remove the offending data.
What is the runtime of remove on a 2-3-4 tree? In the worst case, we're removing the root, and we have to first, find the data to replace it with (O(log n) to get to the leaf), then remove that piece of data (O(log n) to get back to the leaf). That means the runtime is O(log n).
Implementation
So how do we implement this? Well... you don't. There's another balanced tree called a Red-Black Tree which is used for this (red-black trees are also well explained in your textbook). Every 2-3-4 tree has an associated Red-Black tree, and there are some easily implemented rules to cause the Red-Black tree to mimic the behavior of a 2-3-4 tree (in the linked-to Wikipedia article, check out the section called "Analogy to B-trees of order 4").
The fact that 2-3-4 trees are difficult to implement in your particular language of choice in no way delegitimizes 2-3-4 trees as a data structure.
If you're implementing your own balanced tree, you might well choose to make it a Red-Black tree, as they're really, really easy to implement. However, you don't have to know anything about them for this course.