SI204 Class 28: Sorting and Searching II

Reading

None.

 

Lecture

Here is an interesting link to help you visualize what happens during different sorting algorithms (sorting algorithms)

Comparison Functions and Sorting (selection sort code)

It is important to note that there is no one God-given "sorted order". We get different orders depending on how we compare values. To make this a little more explicit, let's imagine our maxindex and selectionsort functions this way:

int maxindex(int *A, int N)
{
  int imax = 0, i;
  for(i = 1; i < N; i++)
    if (before(A[imax],A[i]))
      imax = i;
  return imax;
}
void selectionsort(int *A, int N)
{
  for(int length = N; length > 1; length--)
    swap(A[maxindex(A,length)],A[length-1]);
}

... where the comparison is now done by a function before rather than A[imax] < A[i]. Now, if before is defined like this:

bool before(int a, int b)
{
  return a < b;
}

it tells you that you want a to come before b if a is smaller, i.e. we've got our same old smallest-to-largest sorted order. If, on the other hand, before is defined like this:

bool before(int a, int b)
{
  return a > b;
}

the we'll get largest-to-smallest sorted order. Think of before as a comes before predicate. I use before to tell selection sort how to determine if one value should come before another element. Perhaps I want to sort numbers by increasing absolute value, with the negative number coming first if we have opposite pairs in the array. I define before like this:

bool before(int a, int b)
{
  if (a != -b)
    return abs(a) < abs(b);
  else if (a <= 0)
    return true;
  else
    return false;
}

Now we get an ordering by smallest-to-largest absolute values, with ties (i.e. elements with equal absolute value and opposite signs) putting the positives to the back. For example,

45  32  -12  -32 0 -18 6    

would be "sorted" using this before function as:

0  6  -12  -18 -32 32 45

This is an important concept. The sorted order produced by a sorting algorithm like Selection Sort depends on the definition of your before function - the same algorithm works for any comparison function! In fact, when we get more advanced, we'll even pass the before function as a parameter to selection sort! Of course, we don't know how to do that yet!

Note: Most places you see selection sort they don't break maxindex and swap out into their own functions. Instead, they incorporate the whole thing inside the loop, like this:

void selectionsort(int *A, int N)
{
  for(int length = N; length > 1; length--)
  {
    // Find imax, the index of the largest
    int imax = 0, i;
    for(i = 1; i < length; i++)
      if (before(A[imax],A[i]))
  imax = i;
    // Swap A[imax] & the last element
    int temp = A[imax];
    A[imax] = A[length - 1];
    A[length - 1] = temp;
  }
}

 

The Same Sorting Algorithm Works on Arrays of Any Type!

We might decide that we want to sort arrays of objects of type string rather than int. What has to change in our selection sort algorithm? Not much! As you can see from either this program (separate functions) or this program (all in one), nothing really changes except the names of the types we're working with!

Now, by way of exercise, consider this problem: When I input strings with capital letters, capitals are counted as coming before lower-case. For example, here's a run of my program:

Enter number of strings: 4
Enter strings: Xena adroit lefty xylaphone
Xena
adroit
lefty
xylaphone

Quite possibly, this is not what I want. How can I change the program so that capitalization is not taken into account when sorting, although when I print the strings they should have their original capitalization? To make it simple, let's assume that the only possible capitals would be the first letter. Hopefully you see by now that all this requires is changing the function before.

 

old version of before

new version of before

bool before(string a, string b)
{
  return a < b;
}
bool before(string a, string b)
{
  // Remember we pass by value!
  if ('A' <= a[0] && a[0] <= 'Z')
    a[0] = a[0] + ('a' - 'A');
  if ('A' <= b[0] && b[0] <= 'Z')
    b[0] = b[0] + ('a' - 'A');
  return a < b;
}

What we did here was to simply make the 1st letter lower-case if it was capital for strings a and b. Note that this is okay because a and b are passed by value. If they were passed by reference, we wouldn't be able to print out with capitalization. Using this version of the program, we get:

Enter number of strings: 4
Enter strings: Xena adroit lefty xylaphone
adroit
lefty
Xena
xylaphone

 

Sorting Arrays of Arrays, and what "swap" does in this case!

Since we can sort arrays of any type of object, how about sorting arrays of arrays? For example, let's say that we have a data file like data.txt, where each row consists of three numbers defining a 3-dimensional vector. I want to read in the values in a such a data file and print out the vectors in increasing order of magnitude (i.e. sqrt(x^2 + y^2 + z^2)). Hopefully you see that this doesn't require too many changes to the above program. One key thing to see however, is what swap(a,b) does when a and b are pointers. Hopefully this picture will give you an idea:

The pointers change, but the blocks of memory that really constitute the arrays remain unchanged!

 

Searching using match

As with before in sorting, we may imagine different ways in which elements are considered to match. The above search, for instance, does not consider two strings equal if they differ because one is in capitals and the other in lower case. We can address this in the same way as before, by using a function match instead of == to determine whether two objects match:

int search(string *A, int N, string x)
{
  int i = 0;
  while(i < N && !match(A[i],x))
    i++;
  return i;
}
    

... and then changing the match function to correspond to different ideas of what it means for two objects to match. Meanwhile, the search function is always the same! [Note how short-circuit evaluation is an integral part of search. Without it, the function would not work!]

 

Problems

1.      Write a program that reads in a list of 10 names (first name followed by last name) and prints them out in the usual order - i.e. alphabetically by last name, using first names to break ties. Use a before function.


Assoc Prof Christopher Brown

Last modified by LT M. Johnson 10/23/2007 08:59:49 AM