Reading
These notes and
closely review
Unit 2 Section 2.
sectionSort and insertionSort are generic (Compartison-Based Sorting)
If you look at the sectionSort and insertionSort algorithms
(and your loop invariants) you should see that the only
assumption we make of them elements of the arrays we sort is
that we have a valid "<" defined for them. What does that
mean? Well, "<" must be a function that takes any two object of
the type stored in $A$, and returns a true/false value. But not
just any old boolean-valued function will do. It must be:
- transitive — i.e. we need $a \lt b$ and $b \lt c$
implies $a \lt c$
- anti-reflexive — i.e. $a \lt a$ must be false
You might be concerned that your invariants used $\leq$ rather
than $\lt$. Fear not! Since $a \leq b \Leftrightarrow \neg(b \lt a)$
we can always recast the $\leq$'s in terms of $\lt$'s.
As long as the elements of $A$ have a valid $\lt$ defined, these
algorithms work. Algoruithms like this are said to
be
generic.
Both are $\Theta(n^2)$ worst case, does that mean they take
the same amount of time in the worst case?
Both sectionSort and insertionSort are $\Theta(n^2)$, does that
mean they run at the exact same speed? No! We talked about
how, when the asymptotic analysis yields the same results, we
have to look at hidden constants to determine which will
actually run faster. With smaller sized inputs, we might also
have to look at lower-order terms and the thresholds implicit in
our asymptotic notation.
Generalizing: the "Pick and Place" strategy
We talked about how sectionSort and insertionSort both follow a
"pick and place" strategy. For insertionSort, the "pick" is
trivial, and the "place" is complicated. For selectionSort, the
"pick" is complicated, but the "place" is trivial. We could
imagine creating new sorting algorithms by trying out new
"pick" ideas and "place" ideas.
A new strategy: Divide & Conquer
One of the most generally useful strategies (or "paradigms") for
efficient algorithm design is
Divide & Conquer. We
looked at the divide and conquer strategy in the problem of
simple search in an unsorted array range. Here's what we came
up with:
Algorithm: search(A,i,j,x) ← search for index in [i,...,j] at which x appears in A
if i == j
return A[i] == x ? i : NOTFOUND
m = (i + j)/2 ← divide
r1 = search(A,i,m,x) ← conquer
r2 = search(A,m+1,j,x)
if r1 ≠ NOTFOUND ← combine
return r1
else
return r2
To analyze the worst case running time of this algorithm, we
have to express an upper bound on $T_w(n)$, where $n$ is the
size of the input range, as a recurrence:
\[
T_w\left(n\right) \leq T\left(\lceil n/2 \rceil\right) + T_w\left(\lfloor n/2 \rfloor\right) + O\left(1\right)
\leq 2T_w\left(\left(n+1\right)/2\right) + c, \text{ for some constant c}
\]
A function that satisfies this recurrence with equality is an
upper bound for $T_w(n)$ (in the big-O sense).
$f(n) = 2f((n+1)/2) + c$ defines a function that is $O(n)$, so
$T_w(n) \in O(n)$.
Note: if the array is sorted, we can always limit our
recursive call to search to either just the left half or just
the right half. And this, of course, gives us binary search.
So Binary Search is just the application of divide and conquer
to the problem of search in a sorted array.
Applying Divide & Conquer to sorting
Next we decided to try applying Divide and Conquer to the
problem of sorting. This is where we got to:
Algorithm: sort(A,i,j)
if i == j
return
m = (i + j)/2 ← divide
r1 = sort(A,i,m) ← conquer
r2 = sort(A,m+1,j)
????? ← combine
So the question is, how do we efficiently combine these two
sorted subarrays to form the sorted version of the original array?