Completing Prim's algorithm by figuring out how to make the greedy choice quickly
Last class we described Prim's Algorithm as a greedy algorithm,
we defined its "greedy choice", and we even proved that it
always produces minimum weight spanning trees. However, while
we know
what the greedy choice will be, we
haven't explained
how we are going to make it. What
data structures are we going to employ and how are we going to
use them? Without that info, we don't have a compelte algorithm
that we can actually do time or space analysis on.
The greedy choice is this: choose the minimum weight edge from
a vertex in the current tree to a vertex not in the current tree.
When we have a collection of objects and we want to be able to
quickly choose the minimum, we usually think about a heap.
After all, we are essentially asking for a priority queue, and
we implement priority queues with heaps.
However, what do we store in our heap? The obvious answer is to
store all edges with one vertex in the current tree and one
vertex not in the current tree. But if you think a bit more,
you will see that this is wastefull. If vertex $w$ is not in
the current tree, and $(u,w)$ and $(v,w)$ are both edges from
vertices in the tree to $w$, if the weight of $(u,w)$ is greater
than the weight of $(v,w)$ we will
never choose edge $(u,w)$!
In fact, for any vertex $w$ not in the current tree, the only
edge we need to consider is the smallest weight edge from a
vertex in the current tree to $w$. So we can view what we are
doing not as choosing an edge, but rather choosing a vertex:
-
choose vertex $w$, not in the current tree, whose minimum
weight edge back to a vertex in the current tree is smallest
amongst all vertices not in the current tree
So we will keep $H$, a min-heap of vertices not in the current tree,
where the "key" for a given vertex $w$ in the heap is the
minimum weight of any edge from $w$ back to a vertex in the
tree.
To maintain $H$ and use it to find our minimum-weight-edge,
however, we will need to keep track, for each vertex $w$ in $H$,
of the weight of the minimum weight edge connecting $w$ to the
current tree, and of the vertex from the current tree that is on
the other side of that edge. So we will add two length-$n$
arrays, $W$ and $P'$. For vertex $x$ not in the current tree,
$P'[x]$ is a vertex in the current tree such that $(P'[x],x)$ is the minimum weight
edge from any vertex in the current tree to $x$, and $W[x]$ is
the weight of that edge. Note: if there is no edge from the
current tree to $x$, $W[x] = ∞$ and $P'[x]=-1$.
With head $H$, array $P'$ and array $W$, we can do the
following:
- make the greedy choice: We remove the top element, call
it $x$, choose edge $(P'[x],x)$ of weight $W[x]$, and
restore the min-heap property in $H$ using $W[y]$ as the weight
of vertex $y$.
- update $P$ from Prim's algorithm: We simply set $P[x] =
P'[x]$ to record that edge $(P'[x],x)$ has been added to
the current tree
There's a problem, though. Having added a new vertex to the
current tree, there are potentially new edges connecting the
current tree to other
vertices not in the current tree - edges through vertex $x$.
Specifically, if $y$ is not in the current tree, and edge
$(x,y)$ exists and has smaller weight than $W[y]$, then that
weight needs to be the new "$W[y]$" values, and $P'[y]$ will
need to change too. This is easy
[
Note: for undirected graphs we normally
say "neighbors" rather than "children".]
for each neighbor y of x do
if y not in the current tree and weight of (x,y) < W[y]
W[y] = weight of (x,y)
P'[y] = x
Finally, if $y$ is in the heap, we have to add it, which is
easy, but otherwise we may need to reduce its key, which means
its position in the heap may change. The min-heap data
structure efficiently supports a "reduce key" operation.
However, we have to know the location of the node whose key we
want to reduce. And this requires ... yet another array! But
this is the last one, I promise! $Hpos[y]$ contains a
pointer to the heap location for vertex $y$ if $y$ is in the
heap, and is null otherwise. When we add a new element to the
heap, $Hpos$ will be updated to reflect its location in the
heap. When we remove a vertex from the heap, its $Hpos$ value
will be set to nil. And if we do a "reduce key" operation for
a vertex, its $Hpos$ will have to be updated.
To put this all together, we have:
- $H$ : a min-heap containing vertices not in the current tree
- $W$ : an array of size $n$
- $P'$ : an array of size $n$
- $Hpos$ : array of size $n$
To initialize these structures takes $O(n)$ time. To choose
the vertex $x$ such that edge $(P'[x],x)$ is the Prim's
algorithm greedy choice takes time $O(\lg n)$, since it's just
a removal from a heap of vertices. To (potentialy) update
weights of the neighbors of $x$ takes $n_x \lg n$, where $n_x$
is the number of neighbors of $x$, since for each neighbor $y$
we may have to do a "reduce-key" operation. [Of course, $W$,
$P'$ and $Hpos$ may all be changed for each neighbor, but
those changes take constant time since they are just
modifications of a single array locations.
Over the course of adding all $n$ vertices to the tree, this
gives $O(n \lg n + e \lg n)$. And that is our worst-case
running time for Prim's algorithm!