Priority Queues

We've finished Maps! It's time for something new: Priority Queues! Priority Queues are a bit like Maps, but instead of every element having a key, it has a priority (usually these are numbers, but they don't have to be). Once we enter some elements into a PQ, the only element we can pull back out is the one with the highest priority. Priority Queues must have the following methods:

  1. insert: add an element, along with its priority.
  2. getMax: remove and return the element with the highest priority.

Let's look at some dumb ways to do this, first!

Unsorted Array/Linked List: Yeah, we could do this. Insert would be constant time! getMax(), though, would be O(n), as you'd have to iterate along every element in the PQ to find the maximum, and then remove it.

Sorted Array/Linked List: This maybe feels a little less ugly. Insert now takes O(n), as you have to find the spot for your new element and shift everything over to make room for it. getMax(), though, is now O(1).
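The sorted version is small enough to sketch in Python (a hypothetical `SortedListPQ`, storing bare priorities for simplicity rather than element/priority pairs):

```python
import bisect

class SortedListPQ:
    """A (deliberately naive) PQ backed by a sorted Python list.
    insert is O(n); get_max is O(1)."""

    def __init__(self):
        self._items = []  # kept in ascending order

    def insert(self, priority):
        # Binary search finds the spot in O(log n), but shifting
        # everything over to make room is what costs O(n).
        bisect.insort(self._items, priority)

    def get_max(self):
        # The largest priority is always at the end of the list.
        return self._items.pop()
```

For example, after inserting 5, 9, and 2, `get_max()` returns 9, then 5, then 2.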

Could we do this with binary search trees? I guess we could. I'd rather not, though... it's ugly. BSTs are designed to make it easy to find anything in the tree, and we don't care about that this time. We only care about finding one particular element (the one with the highest priority), so maybe we can organize things differently in a tree to make that one operation easier, with different priorities leading to a different structure.

Heaps

You would be right! We're going to use something called a Binary Heap (this is usually the type of heap people mean when they just say "heap"). Just as a BST is a type of binary tree with the data organized in a certain way, a Heap is a type of binary tree with the data organized in a different way.

Heaps are really good at being PQs, to the point that using much else would be a little weird. As a result, heaps and Priority Queues are often mixed up by people who haven't learned the difference between an ADT and a data structure. PQ is the ADT. Heap is the data structure.

Heaps have two rules:

  1. The priority of a parent is greater than the priority of its children. We know nothing about the comparative priorities of the left and right children. We only know that the parent is of higher priority than the children.
  2. The tree is complete. Remember, complete means the tree is as short as possible: every level is full, except possibly the deepest. If the deepest level is unfinished, its leaves must be all the way over to the left.

Here's an example of a heap, which follows all the rules above:

Insert

Inserting is pretty simple. Place the new data in the first available spot in the tree (in the above heap, this is the left child of 8). Now, we perform a "bubble up," by running the following loop:

While the parent is of smaller priority than this data,
  swap the node and its parent.

Let's take the above tree, and add an element with priority 90. We would go through the following steps:

Add the data to the first open slot in the heap.

90>8, so swap.

90>10, so swap.

90<100, so stop.

How long does this take? Well, in the worst case, we perform one swap for every level of the tree, and this is a complete binary tree, so O(log n).
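The insert-and-bubble-up loop can be sketched in Python, storing the heap in a plain list with index 0 unused so that the parent of index i sits at i // 2 (we'll justify this layout under Implementation below):

```python
def bubble_up(heap, i):
    """Swap heap[i] with its parent while the parent has smaller priority."""
    while i > 1 and heap[i // 2] < heap[i]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i = i // 2

def insert(heap, priority):
    heap.append(priority)           # first open slot in the tree
    bubble_up(heap, len(heap) - 1)  # at most one swap per level: O(log n)
```

For instance, on a heap whose path from the root to the insertion point runs 100, 10, 8 (the other values here are assumed, just for illustration), inserting 90 swaps past the 8 and the 10, then stops under the 100, exactly like the walkthrough above.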

Get Max

It's pretty clear that if we only had to return the max element, without changing the tree, this would be O(1), as we only need to return what is at the root. However, we DO need to change the tree. Insertion involved "bubbling up," so you might (correctly) guess we'll be bubbling down this time. Take the last element in the tree (in our example, after the insertion of 90, this is 8), and put it at the root. Then,

While the data is smaller than at least one of the two children,
  Swap with the largest child

Here's the process. First, we put the 8 at the root.

Now, 8 < 90, so perform that swap.

Now, 8 < 9 AND 10, so perform a swap with the larger of the two.

And we're finished!

How long did that take? Again, that took O(log n) time.
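getMax with bubble-down can be sketched the same way, using a plain list with index 0 unused, so the children of index i sit at 2*i and 2*i + 1:

```python
def bubble_down(heap, i):
    """Sink heap[i] until it is no smaller than either of its children."""
    last = len(heap) - 1
    while True:
        largest = i
        for child in (2 * i, 2 * i + 1):
            if child <= last and heap[child] > heap[largest]:
                largest = child
        if largest == i:
            return                  # heap rule restored
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def get_max(heap):
    top = heap[1]                   # highest priority is always at the root
    heap[1] = heap[-1]              # move the last element to the root...
    heap.pop()
    if len(heap) > 1:
        bubble_down(heap, 1)        # ...then sink it: O(log n)
    return top
```

Note the swap is always with the larger of the two children; swapping with the smaller one would put a too-small element above its sibling and break the heap rule.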

Heap Fill

Now, suppose we have an array full of elements and their priorities, and we want to create a brand new heap, filled with all these elements (note that we don't always have this array ahead of time, so what we're about to talk about won't always be appropriate). How long does this take? O(n log(n)) time, right? It turns out we can do better!

Let's use the following list of numbers: 16, 15, 4, 12, 6, 7, 23, 20, 25, 5, 11, 27, 9, 8, 14

Now, because we know how many elements we're going to heapify (yes, that's a word), we already know what the heap will be shaped like. So, we'll take the first \(\frac{n+1}{2}\) elements in our array, and fill in the bottom layer.

This involves doing approximately \(\frac{n}{2}\) operations to place those elements in the heap.

Next, we'll take the next \(\frac{n+1}{4}\) elements, and fill in the second-from-the-bottom layer. This gives us \(\frac{n+1}{4}\) little heaps, each of height two. If any bubbling down needs to happen, we do that.

This involves \(\frac{n}{4}\) operations to add the elements, and then at most \(1*\frac{n}{4}\) bubble-down operations to fix the heaps.

Next, we take the next \(\frac{n+1}{8}\) elements, and fill in the next layer, giving us \(\frac{n+1}{8}\) heaps. Again, if any bubbling down needs to happen, we do that.

This gives us about \(\frac{n}{8}\) operations to add the elements, and at most \(2*\frac{n}{8}\) operations to fix the heaps (max of 2 bubble-down operations per heap, \(\frac{n}{8}\) heaps).

And we continue until everything has been heapified.

This last step takes 1 operation to add the element, and at most \(\log(n)\) bubble-down operations.

And we're done! Now, how long did that take? Well, we have \(\frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \dots + \frac{n}{n}\) operations to add the elements to the heaps, and \(0*\frac{n}{2}+1*\frac{n}{4}+2*\frac{n}{8}+\dots+\log(n)*\frac{n}{n}\) bubble-down operations.

The first of those summations approaches n. To see why, imagine a status bar of length n. First fill in n/2. Then fill in n/4. Then fill in n/8. If we continue doing this, we'll never quite reach n, but we'll come arbitrarily close.

The second summation could be written as \(\sum_{i=0}^{\log(n)} i*\frac{n}{2^{i+1}} = n*\sum_{i=0}^{\log(n)}i*\left(\frac{1}{2}\right)^{i+1}\).

\(\begin{align*}n*\sum_{i=0}^{\log(n)}i*\left(\frac{1}{2}\right)^{i+1}\leq& n*\sum_{i=0}^{\log(n)}i*\left(\frac{1}{2}\right)^{i}\\ \leq&n*\sum_{i=0}^{\infty}i*\left(\frac{1}{2}\right)^{i}\\ =&2*n\\ \in&O(n) \end{align*}\)

The last step is because that infinite series converges to 2. So, we have O(n) + O(n) = O(n)! We've gone from O(n log n) to O(n)!
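The whole layer-by-layer fill is the classic bottom-up heap construction (often credited to Floyd). A sketch, again storing the heap in a 1-indexed list:

```python
def bubble_down(heap, i):
    """Sink heap[i] until it is no smaller than either of its children."""
    last = len(heap) - 1
    while True:
        largest = i
        for child in (2 * i, 2 * i + 1):
            if child <= last and heap[child] > heap[largest]:
                largest = child
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def heapify(values):
    """Build a max-heap in O(n): drop everything into the array, then
    fix internal nodes from the deepest layer of parents up to the root."""
    heap = [None] + list(values)
    for i in range(len(values) // 2, 0, -1):  # leaves need no fixing
        bubble_down(heap, i)
    return heap
```

Running this on the 15-element list from above leaves 27 (the maximum) at the root.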

Implementation

This doesn't seem so hard to implement. We just take our familiar TreeNodes, write some methods bubbleUp and bubbleDown, and be done with it! And, that would work great. Wonderful.

It turns out, though, that when working with a complete tree, it is unnecessary to actually make the TreeNode class and all that. Instead, we can just use an array. Consider the complete tree below. In this case, the numbers are NOT the data within the nodes; instead, they are the indices of the corresponding array. We keep nothing in index 0 (for convenience), the root in index 1, its left child in index 2, its right child in index 3, and so on. This is convenient, because now if a node is stored in index i, its left child is in 2i, and its right child is in 2i+1. Its parent, due to the magic of integer math, is in i/2.
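That index arithmetic is short enough to write out directly:

```python
def parent(i):
    return i // 2   # integer division drops the remainder

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1
```

So node 3's children sit at indices 6 and 7, and both of them agree their parent is back at index 3, since 6 // 2 and 7 // 2 are both 3.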

Courtesy xkcd.