One problem you are already familiar with is finding the shortest path in a graph between two vertices. If $u$ and $v$ are two vertices in a graph $G$, the problem "determine whether there is a path between $u$ and $v$ in $G$" is not an optimization problem. (In fact it is what's called a decision problem!) Why? Because there is only one solution, and there's no concept of it being more or less good than any other solution. On the other hand, "find the/a shortest path between $u$ and $v$ in graph $G$" is an optimization problem. There are typically many paths between $u$ and $v$, each of which has a cost associated with it. We want to find a path with the smallest possible cost. Similarly, in a weighted graph, we have the problem of finding the path in which the sums of the edge weights along the path is minimized.
Whenever you see "smallest", "shortest" or "minimize", or "largest", "longest" or "maximize", those are cues that we are talking about optimization.
The "greedy approach" is best seen with a concrete example.
We need to phrase the solving of this problem as a series of choices to be made. In this case, we'll simply "choose" an activity to add to our schedule, which will then wipe out it and any activity overlapping it in our list of activities. According to our above description of greedy algorithms, this produces:
In short, the greedy paradigm says: "Take the activity that looks best (and doesn't conflict with anything already in the schedule) and add it to your schedule." But how do we decide which activity "looks best"? What makes a good "Greedy Choice"? Here are three plausible and obvious ideas:
A1 = [1,5] A2 = [6,10] A3 = [4,7]If we choose the shortest activity first, we will choose A3. But A3 overlaps both A1 and A2, so we won't be able to add anything to that schedule. Meanwhile, A1,A2 is a better schedule. So we failed to find an optimal solution. Here is a counterexample for 2:
A1 = [0,10] A2 = [1,3] A3 = [4,6]If we choose the earliest starting time first, we'll take A1, which overlaps both A2 and A3. Thus, we find a schedule of length 1. Meanwhile, A2, A3 is a schedule of two activities, so we failed to find an optimal solution.
A1 = [6,11], A2 = [4,7], A3 = [1,5], A4 = [12,17], A5 = [10,13], A6 = [0,10]If we keep choosing the activity with the earliest finish time and which doesn't overlap with what we've scheduled thus far, we'll end up with:
A3, A1, A4which is indeed an optimal solution. So we arrive at the following algorithm:
Algorithm: Activity Selection
Input: a list A of activities
Output: a list activities from A comprising an optimal schedule
- Let S = { }
- if A = { } return S
- choose the activity Ai from A that has the earliest ending time and add it to S
- remove each element of A that overlaps Ai
- goto Step 2
However, having an example for which our greedy choice produces an optimal solution is a far cry from showing that it always produces an optimal solution. We need to prove that this is the case. That takes two steps: First we show that our greedy choice, i.e. taking the activity with the earliest finish time, is always a part of an optimal schedule. Second we need to show that an optimum schedule always looks like an activity Ai followed by an optimum solution to the subproblem of scheduling all activities A1 ... An that start after Ai finishes.
Proposition: If A1 ... An is a list of activities, then
there is an optimal schedule that starts with Ai, where
Ai has the earliest finishing time.
Proof: Let
Ar1, Ar2, ..., Arkbe an optimal schedule, sorted in increasing ending times, that does not include Ai. Then Ai finishes before or at the same time as A_r1. Therefore,
Ai, Ar2, ..., Arkis also an optimal schedule. Thus there is always an optimal schedule starting with Ai.
Proposition: If A1 ... An is a list of activities, and if
Ar1, Ar2, ..., Arkis an optimal schedule for those activities, with Ar1 the earliest ending activity in the schedule, then
Ar2, ..., Arkis an optimal schedule for the subproblem consisting of all activities in A1 ... An that start after Ar1 finishes.
Ar2, ..., Ark is not an
optimal schedule. Then there is some schedule for the
subproblem consisting of all activities in A1 ... An that
start after Ar1 finishes that's longer. Call that schedule
As1, As2, ... , Asm where m > k. Then
Ar1, As1, As2, ..., Asmis a longer schedule than
Ar1, Ar2, ..., Ark,
which is a contradiction, since Ar1, Ar2, ...,
Ark is optimal.