SI335: Finishing selection & insertion sort; Divide & Conquer

sectionSort and insertionSort are generic (Compartison-Based Sorting)

If you look at the sectionSort and insertionSort algorithms (and your loop invariants) you should see that the only assumption we make of them elements of the arrays we sort is that we have a valid "<" defined for them. What does that mean? Well, "<" must be a function that takes any two object of the type stored in $A$, and returns a true/false value. But not just any old boolean-valued function will do. It must be:

transitive — i.e. we need $a \lt b$ and $b \lt c$ implies $a \lt c$
anti-reflexive — i.e. $a \lt a$ must be false

You might be concerned that your invariants used $\leq$ rather than $\lt$. Fear not! Since $a \leq b \Leftrightarrow \neg(b \lt a)$ we can always recast the $\leq$'s in terms of $\lt$'s.

As long as the elements of $A$ have a valid $\lt$ defined, these algorithms work. Algoruithms like this are said to be generic.

Both are $\Theta(n^2)$ worst case, does that mean they take the same amount of time in the worst case?

Both sectionSort and insertionSort are $\Theta(n^2)$, does that mean they run at the exact same speed? No! We talked about how, when the asymptotic analysis yields the same results, we have to look at hidden constants to determine which will actually run faster. With smaller sized inputs, we might also have to look at lower-order terms and the thresholds implicit in our asymptotic notation.

Generalizing: the "Pick and Place" strategy

We talked about how sectionSort and insertionSort both follow a "pick and place" strategy. For insertionSort, the "pick" is trivial, and the "place" is complicated. For selectionSort, the "pick" is complicated, but the "place" is trivial. We could imagine creating new sorting algorithms by trying out new "pick" ideas and "place" ideas.

A new strategy: Divide & Conquer

One of the most generally useful strategies (or "paradigms") for efficient algorithm design is Divide & Conquer. We looked at the divide and conquer strategy in the problem of simple search in an unsorted array range. Here's what we came up with:

Algorithm: search(A,i,j,x) ← search for index in [i,...,j] at which x appears in A	

if i == j
  return A[i] == x ? i : NOTFOUND

m = (i + j)/2             ← divide

r1 = search(A,i,m,x)      ← conquer
r2 = search(A,m+1,j,x)

if r1 ≠ NOTFOUND          ← combine
  return r1
else
  return r2

To analyze the worst case running time of this algorithm, we have to express an upper bound on $T_w(n)$, where $n$ is the size of the input range, as a recurrence: \[ T_w\left(n\right) \leq T\left(\lceil n/2 \rceil\right) + T_w\left(\lfloor n/2 \rfloor\right) + O\left(1\right) \leq 2T_w\left(\left(n+1\right)/2\right) + c, \text{ for some constant c} \] A function that satisfies this recurrence with equality is an upper bound for $T_w(n)$ (in the big-O sense). $f(n) = 2f((n+1)/2) + c$ defines a function that is $O(n)$, so $T_w(n) \in O(n)$.

Note: if the array is sorted, we can always limit our recursive call to search to either just the left half or just the right half. And this, of course, gives us binary search. So Binary Search is just the application of divide and conquer to the problem of search in a sorted array.

Applying Divide & Conquer to sorting

Next we decided to try applying Divide and Conquer to the problem of sorting. This is where we got to:

Algorithm: sort(A,i,j)

if i == j
  return

m = (i + j)/2          ← divide

r1 = sort(A,i,m)       ← conquer
r2 = sort(A,m+1,j)

?????                  ← combine

So the question is, how do we efficiently combine these two sorted subarrays to form the sorted version of the original array?

Reading

Homework

sectionSort and insertionSort are generic (Compartison-Based Sorting)

Both are $\Theta(n^2)$ worst case, does that mean they take the same amount of time in the worst case?

Generalizing: the "Pick and Place" strategy

A new strategy: Divide & Conquer

Applying Divide & Conquer to sorting