Reading

These notes and closely review Unit 5 Section 3 intro and 3.1.

Quiz questions

Optimization Problems

Many important problems are optimization problems, meaning problems for which there are many possible solutions, from amongst which you want to find "the best". What makes one solution "better" than another? That depends on the problem.

One problem you are already familiar with is finding the shortest path in a graph between two vertices. If $u$ and $v$ are two vertices in a graph $G$, the problem "determine whether there is a path between $u$ and $v$ in $G$" is not an optimization problem. (In fact it is what's called a decision problem!) Why? Because there is only one solution, and there's no concept of it being more or less good than any other solution. On the other hand, "find the/a shortest path between $u$ and $v$ in graph $G$" is an optimization problem. There are typically many paths between $u$ and $v$, each of which has a cost associated with it. We want to find a path with the smallest possible cost. Similarly, in a weighted graph, we have the problem of finding the path in which the sums of the edge weights along the path is minimized.

Whenever you see "smallest", "shortest" or "minimize", or "largest", "longest" or "maximize", those are cues that we are talking about optimization.

Greedy Algorithms

One classic algorithmic paradigm for approaching optimization problems is the greedy algorithm. Greedy algorithms follow this basic structure: First, we view the solving of the problem as making a sequence of "moves" such that every time we make a "moves" we end up with a smaller version of the same basic problem. Second, we follow an approach of always taking whichever "move" looks best at the moment, and we never back up and change the "move".
  1. choose a move if one exists, otherwise exit
  2. update the problem according to the move you've made (which gives a smaller problem of the same type)
  3. go back to step 1

The "greedy approach" is best seen with a concrete example.

Activity Selection

You have "activities" A1, ... ,An that require a resource. (Perhaps the "activities" are classes and the "resource" is a classroom.) Each activity has a starting and ending time. Produce a schedule that allows the most activities to take place.

We need to phrase the solving of this problem as a series of choices to be made. In this case, we'll simply "choose" an activity to add to our schedule, which will then wipe out it and any activity overlapping it in our list of activities. According to our above description of greedy algorithms, this produces:

  1. choose an activity from the list of activities, or exit if none are left
  2. add chosen activity to "the schedule" and remove it and all activities with times overlapping it from the list of activities
  3. go back to step 1

In short, the greedy paradigm says: "Take the activity that looks best (and doesn't conflict with anything already in the schedule) and add it to your schedule." But how do we decide which activity "looks best"? What makes a good "Greedy Choice"? Here are three plausible and obvious ideas:

  1. the shortest activity
  2. the activity with the earliest starting time
  3. the activity with the earliest ending time

Counter-examples - when greedy algorithms fail

It turns out that 1 and 2 do not always produce optimal solutions. Proving that a particular "greedy choice" doesn't work is actually quite easy. All we have to do is produce a counter example, i.e. one concrete example of an input for which the greedy choice fails to find an optimal solution.
Hey! Wake up! This is important! We prove that a particular greedy choice fails by simply giving one concrete counter example!
Here is a counter example for 1:
A1 = [1,5]  A2 = [6,10]  A3 = [4,7]
If we choose the shortest activity first, we will choose A3. But A3 overlaps both A1 and A2, so we won't be able to add anything to that schedule. Meanwhile, A1,A2 is a better schedule. So we failed to find an optimal solution. Here is a counterexample for 2:
A1 = [0,10]  A2 = [1,3]  A3 = [4,6]
If we choose the earliest starting time first, we'll take A1, which overlaps both A2 and A3. Thus, we find a schedule of length 1. Meanwhile, A2, A3 is a schedule of two activities, so we failed to find an optimal solution.

Proofs of optimality - when greedy algorithms succeed

Choosing the activity with earliest ending time seems to yield optimal solutions. Consider the following problem:
A1 = [6,11], A2 = [4,7], A3 = [1,5], A4 = [12,17], A5 = [10,13], A6 = [0,10]
If we keep choosing the activity with the earliest finish time and which doesn't overlap with what we've scheduled thus far, we'll end up with:
A3, A1, A4
which is indeed an optimal solution. So we arrive at the following algorithm:
Algorithm: Activity Selection
Input: a list A of activities
Output: a list activities from A comprising an optimal schedule
  1. Let S = { }
  2. if A = { } return S
  3. choose the activity Ai from A that has the earliest ending time and add it to S
  4. remove each element of A that overlaps Ai
  5. goto Step 2

However, having an example for which our greedy choice produces an optimal solution is a far cry from showing that it always produces an optimal solution. We need to prove that this is the case. That takes two steps: First we show that our greedy choice, i.e. taking the activity with the earliest finish time, is always a part of an optimal schedule. Second we need to show that an optimum schedule always looks like an activity Ai followed by an optimum solution to the subproblem of scheduling all activities A1 ... An that start after Ai finishes.

Proposition: If A1 ... An is a list of activities, then there is an optimal schedule that starts with Ai, where Ai has the earliest finishing time.
Proof: Let

Ar1, Ar2, ..., Ark
be an optimal schedule, sorted in increasing ending times, that does not include Ai. Then Ai finishes before or at the same time as A_r1. Therefore,
Ai, Ar2, ..., Ark
is also an optimal schedule. Thus there is always an optimal schedule starting with Ai.

Proposition: If A1 ... An is a list of activities, and if

Ar1, Ar2, ..., Ark
is an optimal schedule for those activities, with Ar1 the earliest ending activity in the schedule, then
Ar2, ..., Ark
is an optimal schedule for the subproblem consisting of all activities in A1 ... An that start after Ar1 finishes.
proof: Suppose Ar2, ..., Ark is not an optimal schedule. Then there is some schedule for the subproblem consisting of all activities in A1 ... An that start after Ar1 finishes that's longer. Call that schedule As1, As2, ... , Asm where m > k. Then
Ar1, As1, As2, ..., Asm
is a longer schedule than Ar1, Ar2, ..., Ark, which is a contradiction, since Ar1, Ar2, ..., Ark is optimal.

Hey! Wake up! This is important! We prove that a particular greedy choice does produce optimal solutions by showing two things: 1. that the greedy choice is always part of an optimum solution, and 2. that our greedy choice plus an optimum solution to the smaller subproblem that's left over after we make our greedy choice is indeed an optimum solution to the original problem.