Sorting and Searching I

In this lecture, we consider sorting numbers in an array.

Selection Sort

One natural way of sorting numbers is as follows:

The first iteration (with i==0) puts the smallest number at index 0 in the array
The second iteration (with i==1) puts the next-smallest number at index 1 in the array
The third iteration (with i==2) puts the next-smallest number at index 2 in the array
… and on until the entire thing is sorted!

This sorting algorithm is known as Selection Sort, because it selects the first of the remaining elements and puts it in its proper spot in the sorted array.

I hope the following picture clarifies the idea behind Selection Sort algorithm.

data = 7, 4, 1, 3
iteration 0

sorted up to index 0
iteration 1

sorted up to index 1
iteration 2

sorted up to index 2
(actually, all sorted)

 #        
 #
 #        put 
 # #      the smallest
 # #   #  at index 0
 # #   #   
 # # # #  (swap 2 -> 0)
---------
 0 1 2 3

     #    
     #
     #    put 
   # #    the next smallest
   # # #  at index 1  
   # # #
 # # # # (swap 3 -> 1)
---------
 0 1 2 3

     #    
     #   
     #    put 
     # #  the next smallest
   # # #  at index 2
   # # #
 # # # # (swap 3 -> 2)
---------
 0 1 2 3

       #
       #
       #
     # #
   # # #
   # # #
 # # # #
---------
 0 1 2 3

Let's write code for Selection Sort using step-wise refinement:

Function prototype

The input should an array of numbers (along with its size). As for output, the algorithm sorts the numbers inside the given array by using swaps, and so the function should output nothing.


// prototype
void selection_sort(int* A, int n);

How many iterations?

We need n-1 iterations. As you see in the picture above, after n-1 iteration, Selection Sort assures that n-1 smallest numbers are placed in order, which actually means that all numbers are in order.

What to do for each iteration?


void selection_sort(int* A, int n) {
  for (int i = 0; i < n-1; ++i) {
    // find the index of the next smallest element
    // have variable nexti contain that index
    
    // put A[nexti] at index i
  } 
}

Find the index of the next smallest element, which we call nexti
Put A[nexti] at index i.

Putting `A[nexti]` at index `i`

Let's first take care of this easier task. It is sufficient to simply swap A[nexti] and A[i].


// put A[nexti] at index i
// that is, swap A[nexti] and A[i]
int temp = A[i];
A[i] = A[nexti];
A[nexti] = temp;

Finding `nexti`


// A[nexti]: the smallest of A[i],...,A[n-1]
int nexti = i;
for (int j = i + 1; j < n; ++j) 
{
  if (before(A[j], A[nexti])) 
    nexti = j;
}

We observe the following:

At iteration i, A[nexti] is the smallest of A[i], ..., A[n-1].

Check the picture above so the observation should make sense to you.

Function `before()?`

In the above, we introduce a new function called before(). In fact, following chunks of code are equivalent in our context.

direct	`// A[nexti] should be the smallest if ( A[j] < A[nexti] ) nexti = j;`
using before()	`// A[nexti] should be the smallest if (before(A[j], A[nexti])) nexti = j;`

That is, before(a, b) is a boolean function that returns true if a should be smaller than b.


bool before(int a, int b){
  return a < b;
}

Why before() function, though? At this moment, it may not be clear to you why we bother to make a function for a single line of code. Things will be clear soon. Stay tuned!

Copy and paste this code for your HW and Lab

This is the version we recommend you copy and use in your own programs as a starting point.


void selection_sort(int* A, int n) {
  for (int i = 0; i < n - 1; ++i) {
    // find nexti, the index of the next element
    int nexti = i;
    for (int j = i + 1; j < n; ++j) {
      if (before(A[j], A[nexti])) {
        nexti = j;
      }
    }
    // swap A[i] and A[nexti]
    int temp = A[i];
    A[i] = A[nexti];
    A[nexti] = temp;
  }
}

Writing a simple before() function

Not recommended	Recommended
`bool before(int a, int b) { if( a < b ) return true; else return false; }`	`bool before(int a, int b) { return a < b; }`

Note the two versions are equivalent.

If a is smaller than b, then expression a < b evaluates true, so return a < b will return true.
If a is larger than or equal to b, then expression a < b evaluates false, so return a < b will return false.

It is always better to write simple code, and we strongly recommend to use the simpler version (on the right).

Sorting numbers in different orders

Sometimes, we may want to sort the data in a different way.

descending order

If before is defined like this, we'll get largest-to-smallest sorted order:


bool before(int a, int b)
{
  return a > b;
}

Think of before as a comes before predicate. I use before to tell selection sort how to determine if one value should come before another element.

From a bigger designing perspective, by having a dedicated function (i.e., before()) that controls the sorting order, your code gains more flexibility to deal (with much less errors) with a variety of sorting situations.

Another example

Perhaps I want to sort numbers by smallest-to-largest absolute values, with ties putting the negatives coming first. For example,

45  32  -12  -32 0 -18 6

would be "sorted" using this before function as:

0  6  -12  -18 -32 32 45

(-32 and 32 were ties, so the negative -32 should be come first)

I define before like this:


bool before(int a, int b)
{
  if ( abs(a) != abs(b) )     // if a and b are not ties (having different absolute values)
    return abs(a) < abs(b);   //   then compare them by the absolute value
  else                        // else (i.e., if they are ties)
    return a < b;             //   then the smaller (negative) should be before the larger (positive)
}

This is an important concept. The sorted order produced by a sorting algorithm like Selection Sort depends on the definition of your before function - the same algorithm works for any comparison function!

The Same Sorting Algorithm Works on Arrays of Any Type!

We might decide that we want to sort arrays of objects of type string rather than int. What has to change in our selection sort algorithm? Not much!


void selection_sort(string* A, int n) {  // *** change here: int to string
  for (int i = 0; i < n - 1; ++i) {
    // find nexti, the index of the next element
    int nexti = i;
    for (int j = i + 1; j < n; ++j) {
      if (before(A[j], A[nexti])) {
        nexti = j;
      }
    }
    // swap A[i] and A[nexti]
    string temp = A[i];                    // *** change here: int to string
    A[i] = A[nexti];
    A[nexti] = temp;
  }
}

Here's a run of my program:

before() function	sample run
`bool before(string a, string b) { return a < b; }`	Enter number of strings: 4 Enter strings: Xena adroit lefty xylaphone Xena adroit lefty xylaphone

Note that the sorting has been performed in the case-sensitive manner. That is, Xena is before adroit, since the ASCII code for 'X' is smaller than the ASCII code for 'a'.

Q: How can I change the program to sort strings in the case-insensitive manner?

Hopefully you see by now that all this requires is changing the function before.

before() function() sample run


string cap(string s)
{
  for(int i=0; i < s.length(); i++)
    if (s[i] >= 'a' && s[i] <= 'z')
      s[i] = s[i] -'a'+'A';
  return s;
}

bool before(string a, string b)
{
  return cap(a) < cap(b);
}

Enter number of strings: 4
Enter strings: Xena adroit lefty xylaphone
adroit
lefty
Xena
xylaphone

Search

Searching for values in arrays is another fundamental operation. The basic format is as follows:

int i = search(A,N,x);

That is,

Input: We search for value x in the array A of N elements.
Output: The function returns the index of an element of A that matches the value x. If no such element is found, an index of N may be returned and, since it is not a valid index, the caller of the function can determine that no match was found.

For example, to search in an array of objects of type string, I'd define the following function:


int search(string* A, int N, string x)
{
  for(int i=0; i < N; i++)
  {
    if( A[i] == x )
      return i;
  }
  return N;
}

As with before in sorting, we may imagine different ways in which elements are considered to match. The above search, for instance, does not consider two strings equal if they differ if one is in capitals and the other in lower case. We can address in the same way as before, by using a function match instead of == to determine whether two objects match:


int search(string* A, int N, string x)
{
  for(int i=0; i < N; i++)
  {
    if( match(A[i], x) )
      return i;
  }
  return N;
}

... and then changing the match function to correspond to different ideas of what it means for two objects to match. Meanwhile, the search function is always the same!

Tip: It may not seem natural or obvious to return an int equal to the size of the array when a search fails.

Why not -1? Why not size-of-the-array + 1?

Certainly both of those work, as do many other options.

However, experience has shown that returning the array size on fail is a nice way to do it and this approach has become pretty standard. The fact that it is pretty standard is really the most compelling argument of all. In the absence of compelling reasons to do otherwise, shouldn't we write our code to look like what people expect and are used to seeing?

Practice Problems

Receive 7 integers from the terminal and print them with positive numbers first (in ascending order). For simplicity, assume that the user never gives 0 as input. Below is a sample run.

~$ ./a.out
1 -2 -5 3 7 -4 5
1 3 5 7 -5 -4 -2

Answer:

bool before(int a, int b)
{
  if( a > 0 && b < 0 )         // positve(a) should be before negative(b)
    return true; 
  else if ( a < 0 && b > 0 )   // negative(a) should not be before positive(b)
    return false;
  else                         // same sign: just compare the values
    return a < b;           
}

Write a program that reads in a list of non-negative integers from the user and sorts them by their last digit. Numbers with the same last digit should appear smallest to largest. So, 678 32 67 102 7 18 would appear as 32 102 7 67 18 678.