2-3-4 Trees
Our next self-balancing tree is called a 2-3-4 tree (or a B-tree of order 4), so called because nodes can have anywhere from two to four children. A 2-3-4 tree is balanced when:
- All leaves have equal depth.
- If a node has ANY children, it has ALL possible children.
Nodes in a 2-3-4 tree can contain multiple pieces of data, but the rules stay similar to a binary search tree; things to the right are bigger than things to the left. The nodes look like this:
In the above image, in the 2-node, the red subtree consists of data smaller than 5, and the purple data consists of data larger than 5. In the 3-node, red data is smaller than 3, blue is between 3 and 7, and purple is larger than 7. In the 4-node, red is smaller than 3, yellow is between 3 and 5, blue is between 5 and 7, and purple is larger than 7.
Adding to a 2-3-4 Tree
The algorithm for insertion is pretty simple. First, navigate to the appropriate leaf. Does the leaf have room for another datum? If yes, add it. In the following image, we add 24 (the squares are null pointers).
But what do we do if there isn't room? Promote one of the elements to the level above, and split the node:
What if we promote, but there's no room above? Promote from there:
So how long does an insert take? Well, in the best case, we go all the way down to a leaf, find that there is room, and add it. That's O(log n). In the worst case, we do that log(n) work, and discover there isn't room. So, we have to do a promotion. The maximum number of promotions we'll need to do is log(n). So, the big-O of an insert with a 2-3-4 tree is O(log n).
Implementation
So how do we implement this? Well... you don't. There's another balanced tree called a Red-Black Tree which is used for this (red-black trees are also well explained in your textbook). Every 2-3-4 tree has an associated Red-Black tree, and there are some easily implemented rules to cause the Red-Black tree to mimic the behavior of a 2-3-4 tree (in the linked-to Wikipedia article, check out the section called "Analogy to B-trees of order 4").
The fact that 2-3-4 trees are difficult to implement in your particular language of choice in no way delegitimizes 2-3-4 trees as a data structure.
If you're implementing your own balanced tree, you might well choose to make it a Red-Black tree, as they're really, really easy to implement. However, you don't have to know anything about them for this course.
Comparison to AVL Tree
The tallest 2-3-4 tree is made entirely of 2-nodes, making it a complete binary tree. This should make it clear that it's of logarithmic height, comparable to AVL trees. But, which is better?
As is often the case, the answer is "it depends." 2-3-4 trees have balanced height, but can have extremely unbalanced weight. This makes inserts easier; there is less restructuring to do on your average insert. However, it also makes get() take longer, since your tree is taller. So: faster at insert(), slower at get().