SI335: Sorting Revisited

Reading

These notes and Unit 7 section 7.

Quiz questions

P3 from Week 14 quiz questions (done in clas for those that were there, otherwise do at home)

Sorting without comparisons: Counting Sort

Suppose I have an array of $n$ numbers, each of which is in the range $0,\ldots,k-1$. For this discussion, we'll take $k$ to be less than $n$, though that's not strictly necessary, and we'll think of these $k$ values as categories.

Algorithm: Counting Sort(A,n,k)
Inputs: A,n,k where A is an array of n values in the range 0..k-1
Output: B an array containing the elements of A in sorted order (least to greatest)

// count the number of occurences of each value, i.e. C[i] = the number of i's in A
C = an array of n zeros
for i from 0 to n-1 do
  C[A[i]]++

// set elements of C s.t. C[i] is the number of values in A that are ≤ i
for i from 1 to k-1 do
  C[i] += C[i-1]

// place each element of A into B in its sorted-order position
B = an empty array of size n
for i from n-1 down to 0 do
  C[i]--
  B[C[i]] = A[i]

return B

This clearly runs in linear time! So have we proved that our "proof" of the $\Omega(n \lg n)$ worst-case bound on sorting was a lie?!?!?! No! This sorting algorithm is not based on comparisons, so that result doesn't apply.

Counting Sort does apply more generally than just to integers. Any categorical data that we can give integer labels to (e.g. characters) works. Moreover, often we are sorting large objects, but using some integer value as a "key" for sorting purposes (e.g. sorting mids by alpha code). In project 2, we saw a nice example of this. There we objects that consisted of row,col positions. We (or at least I) wanted to sort these objects by column value. Counting Sort allows us to do this sorting in time linear in $\max(n,k)$, which for this discussion is just $\Theta(n)$.