SI335: Analyzing Prim's MST Algorithm

Reading

These notes and closely review Unit 5 Section 4.1.

Homework

Stay out of trouble! ... and have a great Spring Break.

Review stuff ...

In class we went over the previous homework, and we went over some key points from Problemset 2. We went over this n=6 tree for the tug-of-war problem. In the earlier sections, there was a bug in the tree, which is fixed in the version you see here. [Interestingly, the cause of the bug was ":-2" rather than ":-1" in a Python array slice operation. Darn off-by-one errors!] This tree has depth 4, so it shows that it is possible to do better than $n-1$ contests, which is what all your algorithms produced.

Completing Prim's algorithm by figuring out how to make the greedy choice quickly

Last class we described Prim's Algorithm as a greedy algorithm, we defined its "greedy choice", and we even proved that it always produces minimum weight spanning trees. However, while we know what the greedy choice will be, we haven't explained how we are going to make it. What data structures are we going to employ and how are we going to use them? Without that info, we don't have a compelte algorithm that we can actually do time or space analysis on.

The greedy choice is this: choose the minimum weight edge from a vertex in the current tree to a vertex not in the current tree.

When we have a collection of objects and we want to be able to quickly choose the minimum, we usually think about a heap. After all, we are essentially asking for a priority queue, and we implement priority queues with heaps. However, what do we store in our heap? The obvious answer is to store all edges with one vertex in the current tree and one vertex not in the current tree. But if you think a bit more, you will see that this is wastefull. If vertex $w$ is not in the current tree, and $(u,w)$ and $(v,w)$ are both edges from vertices in the tree to $w$, if the weight of $(u,w)$ is greater than the weight of $(v,w)$ we will never choose edge $(u,w)$! In fact, for any vertex $w$ not in the current tree, the only edge we need to consider is the smallest weight edge from a vertex in the current tree to $w$. So we can view what we are doing not as choosing an edge, but rather choosing a vertex:

choose vertex $w$, not in the current tree, whose minimum weight edge back to a vertex in the current tree is smallest amongst all vertices not in the current tree

So we will keep $H$, a min-heap of vertices not in the current tree, where the "key" for a given vertex $w$ in the heap is the minimum weight of any edge from $w$ back to a vertex in the tree. To maintain $H$ and use it to find our minimum-weight-edge, however, we will need to keep track, for each vertex $w$ in $H$, of the weight of the minimum weight edge connecting $w$ to the current tree, and of the vertex from the current tree that is on the other side of that edge. So we will add two length-$n$ arrays, $W$ and $P'$. For vertex $x$ not in the current tree, $P'[x]$ is a vertex in the current tree such that $(P'[x],x)$ is the minimum weight edge from any vertex in the current tree to $x$, and $W[x]$ is the weight of that edge. Note: if there is no edge from the current tree to $x$, $W[x] = ∞$ and $P'[x]=-1$.

With head $H$, array $P'$ and array $W$, we can do the following:

make the greedy choice: We remove the top element, call it $x$, choose edge $(P'[x],x)$ of weight $W[x]$, and restore the min-heap property in $H$ using $W[y]$ as the weight of vertex $y$.
update $P$ from Prim's algorithm: We simply set $P[x] = P'[x]$ to record that edge $(P'[x],x)$ has been added to the current tree

There's a problem, though. Having added a new vertex to the current tree, there are potentially new edges connecting the current tree to other vertices not in the current tree - edges through vertex $x$. Specifically, if $y$ is not in the current tree, and edge $(x,y)$ exists and has smaller weight than $W[y]$, then that weight needs to be the new "$W[y]$" values, and $P'[y]$ will need to change too. This is easy [Note: for undirected graphs we normally say "neighbors" rather than "children".]

for each neighbor y of x do
  if y not in the current tree and weight of (x,y) < W[y]
     W[y] = weight of (x,y)
     P'[y] = x

Finally, if $y$ is in the heap, we have to add it, which is easy, but otherwise we may need to reduce its key, which means its position in the heap may change. The min-heap data structure efficiently supports a "reduce key" operation. However, we have to know the location of the node whose key we want to reduce. And this requires ... yet another array! But this is the last one, I promise! $Hpos[y]$ contains a pointer to the heap location for vertex $y$ if $y$ is in the heap, and is null otherwise. When we add a new element to the heap, $Hpos$ will be updated to reflect its location in the heap. When we remove a vertex from the heap, its $Hpos$ value will be set to nil. And if we do a "reduce key" operation for a vertex, its $Hpos$ will have to be updated.

To put this all together, we have:

$H$ : a min-heap containing vertices not in the current tree
$W$ : an array of size $n$
$P'$ : an array of size $n$
$Hpos$ : array of size $n$

To initialize these structures takes $O(n)$ time. To choose the vertex $x$ such that edge $(P'[x],x)$ is the Prim's algorithm greedy choice takes time $O(\lg n)$, since it's just a removal from a heap of vertices. To (potentialy) update weights of the neighbors of $x$ takes $n_x \lg n$, where $n_x$ is the number of neighbors of $x$, since for each neighbor $y$ we may have to do a "reduce-key" operation. [Of course, $W$, $P'$ and $Hpos$ may all be changed for each neighbor, but those changes take constant time since they are just modifications of a single array locations. Over the course of adding all $n$ vertices to the tree, this gives $O(n \lg n + e \lg n)$. And that is our worst-case running time for Prim's algorithm!