Greedy Algorithms

  1. The next section of the course will look at algorithms. We've already looked at plenty of algorithms that perform operations within our data structure, but their job was to make the DS work, not to directly solve out problem.
  2. We will break our algorithms into categories where there is a familiar theme to all the algorithms in that category.
  3. The first category we will look at are greedy algorithms. A greedy algorithm solves a problem by picking the best sub-solution at that point in time regardless of whether it results in the best possible solution. We call this making "locally optimal" choices to potentially discover a "globally optimal" solution. They do not re-examine previous answers (unlike our dynamic programming algorithms) and do not look forward either. A few examples we'll look at:

    Activity Selection Problem

  4. Assume you've been given a set of activities (with start and end times) and your goal is to pick the schedule that maximizes the number of activities you do.
  5. It may not be immediately apparent which activity we should start with. You may think "well, if we just start with the activity that starts the earliest, we'll get more things done!" Well, believe it or not, that actually won't work! Imagine signing up for the 0400 activity (doing something undesirable, like babysitting plebes or something) only to find out it's for the next 12 hours and you're going to miss all the cool stuff going on at more sane times of the day. Believe it or not, a greedy solution where we always pick the activity that ends first initially will result in the globally optimal solution. Let's take a look:
    def MaxActivities(activity_list): # Each activity is a tuple (start, end)
        activity_list.sort(key=lambda x: x[1]) # Sort activities by end times
        i = activity_list[0]          # Current task, always perform the first task
        res = [i]                     # Result
        al = activity_list[1:]        # Skip first
        for j in al:                  # Loop across rest
            if j[0] >= i[1]:          # If j starts after i ends
                res.append(j)         # add j to task list
                i =j                  # j new current task
        return res
    
    activity_list = [(1,2),(3,4),(0,6),(5,7),(8,9),(5,9)]
    print(MaxActivities(activity_list))
    
    You can see from the example above, that had we sorted by start time, we wouldve picked the (0,6) event, leaving only the (8,9) event left after that for a maximum number of 2 events total. Nowhere near optimal! Let's talk runtime a bit - If our set of activities isn't sorted, we will need to do that for \(O(n\log n)\) time initially. But what about actually picking events in the set? Think about our "worst" case here. We'd have a list like [(1,2).(2,3),(3,4),(4,5),...] and so on. We would eventually pick every event in the set! But even so, that's stil O(n) time! Pretty quick... and a lot faster than just generating all the possible combinations of events, figuring out if its even valid, then figuring out if it's optimal from there.

    Hopefully it's pretty clear why this works - by picking the earliest ending tasks, we interfere with the fewest of the remaining tasks. In class excercise: Run the greedy algorithm by hand on the following set of activities:

    \([(3, 9), (8, 12), (5, 7), (0, 6), (8, 11), (6, 10), (1, 4), (3, 5), (2, 14), (5, 9)]\)

    Fractional Knapsack

    In the fractional knapsack problem, you're trying to pack a knapsack with a weight capacity \(W\) with stuff. All of the stuff that we want to divisible, we can take none, some, or all of it. Things like water, flour, sugar, salt, pepper (ground). Each kind of stuff has a value, and we want to maximize the value of stuff in the knapsack. For example:

    Suppose you have a knapsack with a weight capacity of 7 kg and the following items available:

    Item Weight (kg) Value (Utility Score) Value-to-Weight Ratio
    Water 5.0 30 6.0
    Flour 2.0 10 5.0
    Sugar 1.5 8 5.33
    Salt 0.5 4 8.0
    Ground Pepper 0.1 2 20.0
    Notice that using the value and the weight, we generate the value to weight ratio, or the value per kg.

    The greedy algorithm is to simply to pack as much as you can of the remaining stuff that has the highest value to weight ratio:

    1. Start out with the best stuff, the pepper. We pack all of that, so we have value of 2, and a weight remaining of 6.9
    2. Of the remaining stuff, salt is next. We pack all of that, so we have a value of \(4+2=6\) and a weight remaining of 6.4
    3. Next up is water, so we pack all of that. We get a value of 36, and the weight remaining is 1.4
    4. Next is sugar. We'd like to pack all of it, but we only have 1.4 kg left, so we pack 1.4 kg of sugar for a final value of \(36+1.4\times 8 = 47.2\)
    5. and we have no room for any flour

    Thus, the maximum achievable utility in the knapsack with these items is 47.2.

    There are many examples of greedy algorithms. All greedy algorithms have the same structure. The greedy choice must be part of the optimal solution, and the optimal solution of the remaining part of the problem is also part of the optimal solution. Not all problems have a greedy solution....

    0-1 Knapsack Problem

    1. Let's look at a different version of the knapsack problem. This is called the 0-1 knapsack problem, where it is just like the fractional knapsack problem but all of the items you have are discrete. That is, you can only add the item or not add the item.
    2. You might think we could just adapt the greedy algorithm from the fractional version, but it doesn't work! Take the following set up with a weight capacity of 10:
      Item Weight (wi) Value (vi) Value-to-Weight Ratio (vi/wi)
      1 7 15 2.14
      2 5 10 2.0
      3 4 8 2.0
    3. with a value to weight ratio of 2.14, item 1 gets chosen first. Unfortunately, adding items 2 or 3 would exceed the limit, so we end up with a value of 15. If we had selected items 2 and 3, we would but under the limit and have a value of 18.
    4. Greedy algorithms are great when they work, but frequently lead to non-optimal answers.