SI335: Basic iterative sorting: insertionSort & selectionSort

selectionSort

SelectionSort is one of the classic sorting algorithms. Here's a nice simple iterative version of it:

selectionSort(A,n)
input:  A an array of n objects for which "<" is defined
        n the size of A
Output: side effect of leaving A in sorted order according to "<"

for i from 0 to n-2 do
  m = i
  for j from i+1 to n-1 do
    if A[j] < A[m]
      m = j
  swap(A[i],A[m])

In class we showed that its worst-case running time is $O(n^2)$, which ... well, duh. More interestingly, we showed that its worst-case running time is $\Omega(n^2)$. Put the two together, and we arrive at the fact that the worst-case running time for selectionSort is $\Theta(n^2)$. What about the best case? Well you may notice that selectionSort has no way to exit either loop early. So, in fact, there is no real distinction between the best case or the worst case in terms of the number of times the innermost loop body is executed. I.e., $T_{B,selectionSort}(n) \in \Theta(n^2)$.

Note, however, that this doesn't mean that the actual amount of time a given implementation takes to sort $n$ numbers is the same for all inputs of size $n$. Because the innermost loop body takes an extra step when $A[j] \lt A[m]$, the actual time varies, but of course those variations result in at most a constant factor difference between the least and most time taken. So the growth rate is the same.

Correctness: To prove that selectionSort is correct, we want to find a good invariant for the outer loop.

Invariant for outer loop: $A[0] \leq \cdots \leq A[i]$ and $\forall a,b, 0 \leq a \lt i \leq b \lt n \Rightarrow A[a] \leq A[b]$

In class, we proved this invariant and used it to prove the algorithm correct.

insertionSort

SelectionSort is another one of the classic sorting algorithms. Here's a nice simple iterative version of it:

insertionSort(A,n)
input:  A an array of n objects for which "<" is defined
        n the size of A
Output: side effect of leaving A in sorted order according to "<"

for i from 0 to n-2 do
  j = i + 1
  while j > 0 and A[j] < A[j-1] do
    swap(A[j],A[j-1])
    j = j - 1

In class we showed that its worst-case running time is $O(n^2)$, which ... well, duh. More interestingly, we showed that its worst-case running time is $\Omega(n^2)$. This was more involved than for selectionSort. Why? Because selectionSort's inner loop always executes $n - i - 1$ times, no matter what the input looks like. With insertionSort, however, the inner loop makes anywhere from $0$ to $i$ iterations, depending on the particulars of the input, because the "A[j] < A[j-1]" condition allows us to exit the inner loop early. To do this analyis, we had to find concrete inputs for which insertionSort is slow. Someone noticed that for reverse-sorted input, the inner loop always iterates the maximum number of times (which is $i$). So we derived $\Omega(n^2)$. Once again, put the two facts together, and we arrive at the fact that the worst-case running time for insertionSort is $\Theta(n^2)$.

What about best case for insertion sort? From the structure of the algorithm, we see that the outer loop must run $n-1$ times, and the body of that loop takes $\Omega(1)$ time (i.e. at least a constant, maybe more). This means $T_{B,\text{insertionSort}}(n) \in \Omega(n)$. As always, to get an upper bound on the best case we need to analyze an example family of input. For insertion sort, life is particularly good when the input is already sorted. In that case, we exit the inner while loop immediately (i.e. after zero iterations) every time. This means we do $\Theta(n)$ work for a pre-sorted array, which gives us $T_{B,\text{insertionSort}}(n) \in O(n)$. Putting the two bounds together gives $T_{B,\text{insertionSort}}(n) \in \Theta(n)$.

A sorting algorithm is said to be "stable" if the relative order of "equal" elements after the sort is the same as before the sort. So, for example, if I was sorting by x-coordinate the list of points \[ (5,2),(7,3),(2,9),(7,0),(4,4),(2,0),(3,8),(2,5) \] a stable sorting algorithm would have to keep the points with x-corrdinate 2 in the original order $(2,9),(2,0),(2,5)$.

Insertion Sort is generally the best of the simple (but slow) sorting algorithms. Its "hidden constant" is small, compared to other simple algorithms like Selection Sort and Bubble Sort. It has other nice properties, like being stable (see side note). Other properties we might care about is that it is "in place", meaning it only requires a constant amount of memory in addition to its input; and "adaptive", meaning that it takes advantage of structure in the input (for insertion sort this means when no element is very far from its proper place in sorted order, insertion sort is fast).

Correctness: To prove that insertionSort is correct, we want to find a good invariant for the outer loop.

Invariant for outer loop: $A[0] \leq \cdots \leq A[i]$

In class, we proved this invariant and used it to prove the algorithm correct.

Reading

Homework

selectionSort

insertionSort