So, we want the algorithm that takes less time. As soon as we talk about "more" or "less", we need to count something. What should we count? Seconds (or minutes) seem an obvious choice, but that's problematic: real time varies with conditions (like the computer, the specific implementation, or what else the computer is working on).
For example, suppose it's very important you compute the solution to the following equation:
\(f(n,x) = \sum_{k=1}^{n}\frac{x^8}{k!}\)
Here are two entirely reasonable ways of programming this same thing. The first:
def f1(x, n):
    # Precompute x^8 by repeated multiplication
    x8 = 1.0
    for i in range(8):
        x8 = x8 * x
    total = 0
    for k in range(1, n + 1):            # for each value of k, compute k! from scratch
        factorial = 1.0
        for i in range(1, k + 1):
            factorial = factorial * i
        total = total + x8 / factorial   # add the next term to the total sum
    return total
def f2(x, n):
    # Compute x^8 with three squarings
    x8 = x * x
    x8 = x8 * x8
    x8 = x8 * x8
    total = 0
    factorial = 1.0
    for k in range(1, n + 1):
        factorial = factorial * k        # update k! from the previous (k-1)!
        total = total + x8 / factorial   # add the next term
    return total
We can use time to see how long each function takes. Here are the running times for various values of n:
| n | f1 | f2 |
|---|---|---|
| 0 | 0 | 0 |
| 1000 | 92 | 3 |
| 2000 | 353 | 6 |
| 3000 | 788 | 7 |
| 4000 | 1394 | 10 |
| 5000 | 2172 | 12 |
Both get slower as n increases, but algorithm 1 gets worse faster. Why does algorithm 1 suck?
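If you want to collect numbers like these yourself, a minimal timing harness might look like the sketch below. It assumes the f1 and f2 above; the choice of x = 0.5 is arbitrary, and the absolute numbers will depend entirely on your machine.

import time

def time_ms(f, x, n):
    # Time a single call to f(x, n) and return the elapsed time in milliseconds
    start = time.perf_counter()
    f(x, n)
    return (time.perf_counter() - start) * 1000

for n in [0, 1000, 2000, 3000, 4000, 5000]:
    print(n, round(time_ms(f1, 0.5, n)), round(time_ms(f2, 0.5, n)))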
What happens if we run one of them on a different computer? Will we get the same times? Probably not: processor speed, operating system, memory, and a host of other things impact how much time it takes to execute a task.
What we need, then, is a machine- and scenario-independent way to describe how much work it takes to run an algorithm. One thing that doesn't change from computer to computer is the number of basic steps in the algorithm (this is a big handwave, but generally true), so let's measure steps.
We are going to vaguely describe a "step" as a primitive operation, such as:

- assigning a value to a variable
- performing a single arithmetic operation (one addition, one multiplication, ...)
- comparing two values
- indexing into an array
- calling a function or returning from one
It is important to note that this has a great degree of flexibility: one person's basic step is another person's multiple steps. To an electrical engineer, taking the NOT of a boolean is one step because it involves one bit, while adding two 32-bit numbers is 32 (or more!) steps. Fortunately, we're not going to be counting quite like this for long, so we can whistle past these kinds of details.
So, when we determine how long an algorithm takes, we count the number of steps. More steps == more work == slower algorithm, regardless of the machine.
What does the following code do, and how many steps does it take?
max = a[0]          # 2 steps: array index and assignment
i = 1               # 1 step: assignment
while i < n:        # n comparisons, plus (n-1) times through the loop
    if a[i] > max:  # 2: array index and comparison
        max = a[i]  # 2, but it doesn't always happen!
    i = i + 1       # 2: addition and assignment
return max          # 1: we'll count calls and returns as 1.
This gives us: \(2 + 1 + n + (n - 1)\cdot(4 \text{ or } 6) + 1\)
Now, we're not quite done. In the code above, there are circumstances under which it will run faster than others. For example, if the array is in descending order, the code within the if statement is never run at all. If it is in ascending order, it will be run every time through the loop. We call these "best case" and "worst case" run times. We can also consider and talk about the "average" run time.
The best case is usually way too optimistic (most algorithms have best case times that are far away from average). Average case is often hard, because we need probabilistic analysis (what's the probability of needing the if statement?). In many cases, a worst case bound is a good estimate. That way we can always say, "no matter what, this program will take no more than this many steps." In the formula above, this means the (4 or 6) part just becomes 6.
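To make that concrete, here is the worst-case count for the max-finding code above, where the "(4 or 6)" becomes 6:

\[ 2 + 1 + n + 6(n-1) + 1 = 7n - 2 \]

So the worst-case number of steps grows linearly with the size of the array.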
In the program above, the number of steps it takes depends on the size of the array, n. What we really want is a function that takes the size of the array as input and returns the number of steps as the output. Note that this is what we developed above.
So when we're comparing algorithms, we don't compare numbers, but functions. But how do we compare functions? Which function below is "bigger"?
Well, sometimes one, and sometimes the other. So what we'll do is talk about one function dominating the other. We'll say that one function dominates another if it is bigger than the other for every x value (for example \(f(x) = x^2\) dominates \(f(x)=-x^2\)). Neither of the functions in the graph above dominates the other. One function can also dominate another in a particular range: \(f(x)=1\) dominates \(f(x) = x^2\) in the range \([0,1)\). Now, when input sizes are small, everything is fast. Differences in run time are most relevant when there is a lot of data to process. So, we colloquially ask "When n is large, does one dominate the other?" Mathematically, we ask if there exists an \(n_0\geq 1\) such that for all \(n\ge n_0, f(n) \gt g(n)\) (or \(g(n) \gt f(n)\)).
In this graph, \(f(x)=3x^2-8\) dominates \(f(x)=x^2\) everywhere to the right of \(x=2\). So \(n_0=2\); I like to call this the "crossover point". For each pair of functions below, which one eventually dominates the other, and where is the crossover point?

| \(f(n)\) | \(g(n)\) |
|---|---|
| \(n^2\) | \(n\) |
| \(n^2-n\) | \(5n\) |
| \(\frac{1}{10^{10}}n^2-10^{10}n-10^{30}\) | \(10^{10} n + 10^{300}\) |
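If you don't feel like doing the algebra, you can hunt for the crossover points numerically. The sketch below uses a hypothetical helper, crossover(), that doubles n until \(f(n)\) beats \(g(n)\) and then bisects to find the first such n; it assumes the predicate \(f(n) > g(n)\) stays true once it becomes true, which holds for these pairs (integer // stands in for the \(\frac{1}{10^{10}}\) coefficient).

def crossover(f, g):
    # Double hi until f(hi) finally beats g(hi)...
    hi = 1
    while f(hi) <= g(hi):
        hi *= 2
    # ...then bisect to find the smallest n with f(n) > g(n).
    lo = hi // 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid) > g(mid):
            hi = mid
        else:
            lo = mid
    return hi

print(crossover(lambda n: n**2, lambda n: n))                    # 2
print(crossover(lambda n: n**2 - n, lambda n: 5 * n))            # 7
print(crossover(lambda n: n**2 // 10**10 - 10**10 * n - 10**30,
                lambda n: 10**10 * n + 10**300))                 # roughly 10^155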
We talk about the runtime of algorithms by categorizing them into sets of functions, based entirely on their largest growth rate.
The mathematical definition: if two functions \(f(n)\) and \(f'(n)\) are both eventually dominated by the same function \(c\cdot g(n)\), where \(c\) is any positive constant coefficient, then \(f(n)\) and \(f'(n)\) are in the same set. We call this set \(O(g(n))\) (pronounced "Big-Oh of \(g(n)\)"), and we write \(f(n) \in O(g(n))\) and \(f'(n) \in O(g(n))\). Since constant coefficients don't matter, we say that \(g(n)\) eventually dominates \(f(n)\) if there is ANY constant we can multiply \(g(n)\) by so that it eventually dominates \(f(n)\). For example, \(f(n)=5n+3\) is obviously \(O(n^2)\). It is also \(O(n)\), since \(f(n)\) is dominated by \(10n\), and 10 is a constant.

The formal definition: for a function \(f(n)\), if there exist \(c\) and \(n_0\) such that \(c \gt 0\), \(n_0 \geq 1\), and for all \(n \ge n_0\), \(f(n) \leq c \cdot g(n)\), then \(f(n) \in O(g(n))\).

While \(f(n)=5n+3 \in O(n^2)\), it is also in \(O(5n)\), and \(O(5n)\) is preferred over \(O(n^2)\) because it is closer to \(f(n)\). But \(f(n)\) is ALSO in \(O(n)\), and \(O(n)\) is preferred over \(O(5n)\) because constant coefficients don't matter, so we leave them off.
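For example, here is the formal definition at work on \(f(n)=5n+3\) and \(g(n)=n\), using the witnesses \(c=10\) and \(n_0=1\) mentioned above:

\[ 5n + 3 \;\le\; 5n + 3n \;=\; 8n \;\le\; 10n \;=\; c\cdot g(n) \quad \text{for all } n \ge 1, \]

so \(f(n)=5n+3 \in O(n)\).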
Let's put this to work on two ways of searching a list for a value:

def find1(L, toFind):
    for i in range(0, len(L)):
        if L[i] == toFind:
            return i
    return -1   # flag for "didn't find it"
def find2(L, toFind):
    # Assumes L is sorted in ascending order
    left = 0
    right = len(L)
    while right - left > 1:
        mid = (left + right) // 2
        if toFind < L[mid]:
            right = mid
        else:
            left = mid
    if L[left] == toFind:
        return left
    return -1   # flag for "didn't find it"
The question is, which is faster? You might say that find1() should be faster because it has less code. Or you might say we can only know by running the routines; but we now know a better way: big-O.
The first thing we need to recognize is that we are concerned with worst-case performance, which is what Big-O measures. If the item is in the array, we might find it really quickly; for example, it could be the first thing we look at. So the worst case is when the item is not in the array at all, and we have to do the most searching before giving up.
In the worst-case analysis, at least for find1(), let's be verbose at first and identify the total number of steps involved. They would be:

- setting i: one step
- then, len(L) times:
  - the loop comparison of i against len(L): one step
  - the array index L[i]: one step
  - the comparison of L[i] and toFind: one step
- the return: one step

If we say the length of the array is n, then we have 1+3n+1 total steps, and since we don't care about constants, that's O(n). You may notice that the driving force in the analysis is the total number of comparisons that need to be performed. When we say O(n) for the find routine, we are really saying it may take at most n comparisons to find what is being searched for.
Turning to find2(), let's now reason about the total number of comparisons this routine makes. The procedure is different here because instead of searching from front to back we are leveraging the fact that the array is sorted and making our first comparison in the middle. In the worst case, we don't find it, so we then ask: Is the item being searched for greater or less than what was just checked? If it is greater, we then search in the middle of the right half, and if it is less than, we search in the middle of the left half. This repeats until there are only 1 or 2 items left to check.
How many comparisons is that? Well, let's think of it this way: the range of items we are checking shrinks by a factor of 2 each time. As an example, if we have 100 items, we check once in the middle, then in the middle of 50 items, then in the middle of 25 items, then in the middle of 12 items (with rounding), then 6, then 3, then 1 item. The total number of comparisons is the same as the number of times we can divide 100 by 2, because we make one comparison each time we cut the list in half.
Generalizing the problem a bit, if we have n items in the array, the total number of comparisons is the same as the number of times 2 divides into n. Put another way, it is the same as solving for x in the following formula:
$$
2^x \ge n
$$
Or, x is the number of times we can multiply 2 together until we get over n. To solve for x, we can take the \(\log_2\) of both sides to get:
$$
\log_2(2^x) \ge \log_2(n)\\
x \ge \log_2(n)
$$

So in the worst case, find2() makes about \(\log_2(n)\) comparisons, which puts it in \(O(\log n)\).
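To see both bounds in action, here is a sketch that copies the two routines with comparison counters added (hypothetical instrumented versions, for illustration) and runs them in their worst case, searching a sorted list for something that isn't there:

import math

def find1_count(L, toFind):
    # find1 from above, counting element comparisons
    comparisons = 0
    for i in range(0, len(L)):
        comparisons += 1
        if L[i] == toFind:
            return comparisons
    return comparisons

def find2_count(L, toFind):
    # find2 from above, counting element comparisons
    comparisons = 0
    left, right = 0, len(L)
    while right - left > 1:
        mid = (left + right) // 2
        comparisons += 1
        if toFind < L[mid]:
            right = mid
        else:
            left = mid
    return comparisons + 1   # plus the final equality check

for n in [100, 1000, 10000, 100000]:
    L = list(range(n))
    print(n, find1_count(L, -1), find2_count(L, -1), round(math.log2(n), 1))

The first count grows like n; the second tracks \(\log_2(n)\) closely.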
As we just saw, big-O is extremely useful because it allows us to put algorithms into buckets, where everything in that bucket requires about the same amount of work. It also allows us to arrange these buckets in terms of performance.
First, though, it's important that we use the correct terminology for these buckets, so we know what we're talking about. In increasing order of work:

- constant time: \(O(1)\)
- logarithmic time: \(O(\log n)\)
- linear time: \(O(n)\)
- log-linear time: \(O(n \log n)\)
- quadratic time: \(O(n^2)\)
- cubic time: \(O(n^3)\)
- polynomial time: \(O(n^k)\) for some constant \(k\)
- exponential time: \(O(2^n)\)
Fuzzily speaking, things that are polynomial time or faster are called "fast," while things that are exponential are "slow." There is a class of problems believed to be slow called the NP-complete problems. If you succeed in either (a) finding a fast algorithm for an NP-complete problem, or (b) proving that no such algorithm exists, congratulations! You've just solved a Millennium Prize Problem, you get a million dollars, and you're super-famous. If you take the algorithms class, you'll learn more about these problems.
The following table compares each of these buckets with each other, for increasing values of n.
| \(n\) | \(\log n\) | \(n\) | \(n\log n\) | \(n^2\) | \(n^3\) | \(2^n\) |
|---|---|---|---|---|---|---|
| 4 | 2 | 4 | 8 | 16 | 64 | 16 |
| 8 | 3 | 8 | 24 | 64 | 512 | 256 |
| 16 | 4 | 16 | 64 | 256 | 4096 | 65536 |
| 32 | 5 | 32 | 160 | 1024 | 32768 | 4294967296 |
| 64 | 6 | 64 | 384 | 4096 | 262144 | 1.8x10^19 |
| 128 | 7 | 128 | 896 | 16384 | 2097152 | 3.4x10^38 |
| 256 | 8 | 256 | 2048 | 65536 | 16777216 | 1.2x10^77 |
| 512 | 9 | 512 | 4608 | 262144 | 134217728 | 1.3x10^154 |
| 1024 | 10 | 1024 | 10240 | 1048576 | 1073741824 | 1.8x10^308 |
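Incidentally, the table is easy to regenerate (or extend) with a few lines of Python; note that Python will happily print \(2^{1024}\) exactly, all 309 digits of it.

import math

# One row per value of n: log n, n, n log n, n^2, n^3, 2^n
for n in [4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    print(n, int(math.log2(n)), int(n * math.log2(n)), n**2, n**3, 2**n)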
To understand just how painful exponential-time algorithms are, note that the number of seconds in the age of the universe has about 18 digits, and the number of atoms in the universe has about 81 digits. That last number in the table has 309 digits. If we could somehow turn every atom in the universe into a little computer, and run all those computers in parallel from the beginning of time, an algorithm that takes \(2^n\) steps would still not complete a problem of size n=1000.
Let's consider one more routine for analysis: selection sort, something you should remember from your intro to computing class.
def selection_sort(A):
    # Iterate through the elements of the list
    for i in range(len(A)):
        # Find the minimum element in the unsorted part of the list
        min_index = i
        for j in range(i + 1, len(A)):
            if A[j] < A[min_index]:
                min_index = j
        # Swap the minimum element with the first unsorted element
        A[i], A[min_index] = A[min_index], A[i]
Starting with the worst-case analysis, using our big-O analysis tools, we need to identify the dominant step, which in this case is the comparison: how many comparisons must be made to ensure that all the right elements are swapped into a sorted ordering? To make the math easier, without loss of generality, let's say that the array has \(n\) elements in it. Then the number of comparisons we must make is
$$
\sum^{n-1}_{i=0} (n-i)
$$
That is, the first time through the inner loop we do n comparisons, then n-1, then n-2, and so on, until we do 2 and finally 1. If we reverse the order of the summation, adding 1, 2, ..., n-1, n instead, then we can write it more simply like so:
$$
\sum^{n}_{i=1} i = \frac{n(n+1)}{2}
$$
The solution to the sum of the numbers between 1 and n is known to be \(\frac{n(n+1)}{2}\). To quickly give you the intuition why this is so, let's write out the whole summation:
\[n + (n-1) + (n-2) + ... + 3 + 2 + 1\]Clearly, there are n of these. Let's just rearrange them, pairing the first summand with the last, the second summand with the second-to-last, etc:
\[(n+1) + (n-1+2) + (n-2+3) + ... + (n/2+n/2+1)\]Each of these is equal to n+1. And, because there were n of them and we paired them up, there are now n/2 of them. So the total sum is \((n/2)(n+1)\), making this \(O(n^2)\).
Let's consider now a best-case analysis of selection sort. How fast could selection sort possibly run? It turns out it is still \(O(n^2)\): the routine can't tell whether the array is already sorted, so the inner loop always scans the entire unsorted portion and makes the same number of comparisons no matter what the input looks like.
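To sanity-check both the worst-case and best-case claims, here is a sketch of the routine above with a comparison counter added (a hypothetical instrumented copy). It reports the same count no matter how the input is arranged. The exact count is \(n(n-1)/2\), slightly below the \(n + (n-1) + \dots + 1\) bound we used, because the inner loop starts at i + 1; the difference is only a lower-order term, so the conclusion \(O(n^2)\) is unchanged.

import random

def selection_sort_count(A):
    # selection_sort from above, counting element comparisons
    comparisons = 0
    for i in range(len(A)):
        min_index = i
        for j in range(i + 1, len(A)):
            comparisons += 1
            if A[j] < A[min_index]:
                min_index = j
        A[i], A[min_index] = A[min_index], A[i]
    return comparisons

n = 1000
for name, data in [("sorted", list(range(n))),
                   ("reversed", list(range(n, 0, -1))),
                   ("random", random.sample(range(n), n))]:
    print(name, selection_sort_count(data), n * (n - 1) // 2)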
Next, we will do the same thing with Insertion Sort.
First, let us look at the algorithm itself. The idea behind insertion sort is that, like selection sort, it divides the array into two parts. One part is sorted, and the other part is not yet sorted. In each step, we take the next number from the unsorted part and insert it into the correct place in the sorted part... hence the name.
def InsertionSort(A):
    for i in range(1, len(A)):        # loop through the array
        key = A[i]                    # grab the next element from the unsorted portion
        j = i - 1                     # counter for the insertion
        while j >= 0 and A[j] > key:  # loop through the sorted part finding the right spot
            A[j + 1] = A[j]           # shift elements to the right
            j = j - 1
        A[j + 1] = key                # found the spot, put it in
The algorithm iterates through the array A starting from the second element (index 1). For each element, it compares it with the elements in the sorted subarray to its left and inserts it at the appropriate position.
To analyze the time complexity of Insertion Sort, we consider the number of comparisons and swaps performed by the algorithm. That is, the number of times we go through the while loop will determine the runtime of the algorithm. In the worst case scenario, where the input array is in descending order, the algorithm will perform the maximum number of comparisons and swaps.
The outer loop of the Insertion Sort algorithm runs for n-1 iterations, where n is the length of the array. This is because the first element, all by itself, is already considered sorted, so the loop starts at index 1.
In each iteration of the outer loop, the algorithm performs comparisons and swaps in the inner loop. In the worst case scenario, where the element being inserted is smaller than all the elements in the sorted subarray, the inner loop will run until the beginning of the array.
Let's denote the number of comparisons and swaps in the worst case as C(n) and S(n), respectively. The time complexity of Insertion Sort can be expressed as the sum of the number of comparisons and swaps:
T(n) = C(n) + S(n)
In each iteration of the outer loop, the algorithm performs comparisons in the inner loop until it finds the correct position for the current element or reaches the beginning of the array. That is 1 comparison the first time, 2 the second, and so on, all the way up to \(n-1\) the last time:
\[ C(n) = 1 + 2 + 3 + \dots + (n-1) = \frac{(n-1)n}{2} \in O(n^2) \]In the worst case, where the input array is in descending order, the inner loop will swap the current element with each element in the sorted subarray until it finds the correct position. The number of swaps can be calculated as follows:
\[ S(n) = 1 + 2 + 3 + \dots + (n-1) = \frac{(n-1)n}{2} \in O(n^2) \]And we know that \(O(n^2) + O(n^2) \in O(n^2)\), so Insertion Sort's worst-case running time is \(O(n^2)\).
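As with selection sort, you can check the worst-case analysis empirically. The sketch below is a hypothetical instrumented copy of the InsertionSort above; fed a descending array, its comparison and shift counts both match \(\frac{(n-1)n}{2}\).

def insertion_sort_count(A):
    # InsertionSort from above, counting comparisons and shifts
    comparisons = shifts = 0
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:
            comparisons += 1
            A[j + 1] = A[j]
            shifts += 1
            j = j - 1
        if j >= 0:
            comparisons += 1   # the comparison that ended the scan early
        A[j + 1] = key
    return comparisons, shifts

for n in [10, 100, 1000]:
    c, s = insertion_sort_count(list(range(n, 0, -1)))   # descending order = worst case
    print(n, c, s, (n - 1) * n // 2)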