Another look at sorting

We will use the same data file namedgrades.txt. Now let's sort out student records and print them out in sorted order. In particular:

Recall the struct Student we saw from last lecture.


struct Student
{
  string name;
  int* hw;
};

Now, if we're going to sort this array of Student objects, it's just like sorting anything else! We'll have our array stu of objects of type student, and we'll use the same old selection sort routine we've always been using.

void selection_sort(Student* A, int n) 
{
  for (int i = 0; i < n - 1; ++i) {
    // find nexti, the index of the next element
    int nexti = i;
    for (int j = i + 1; j < n; ++j) {
      if (before(A[j], A[nexti])) {
        nexti = j;
      }
    }
    // swap A[i] and A[nexti]
    Student temp = A[i];
    A[i] = A[nexti];
    A[nexti] = temp;
  }
}
   So, the only thing that remains is to produce a before function that will take two Student objects and decide whether the first needs to come before the second. Remember we want our Student objects in order from lowest to highest score on homework #4. So, for before(a,b), I need to determine whether the homework #4 score for a is less than the homework #4 score for b.

bool before(Student a, Student b)
{
  return a.hw[4] < b.hw[4];
}
With this, it's easy to write a program that prints out students & their HW#4 scores ordered from lowest to highest HW#4 score. Here's a complete program.

Now, at some point in selection sort, we'll swap two elements of stu and I want to take a brief moment to look at what that means.

The picture on the left shows how swap(stu[i],stu[j]) affects things. Hopefully you see, once again, that we're not moving the actual arrays of grades at all, but we're simply moving the pointers to those arrays.

Letting the user choose which grade to sort

Now, a better version of the above program would allow the user to choose which homework assignment we sorted on. So, if x is the assignment number, we'd have to modify before to be:

bool before(Student a, Student b)
{
  return a.hw[x] < b.hw[x]; // we should sort based on grade x
}

Q. There's a problem in the above code. What is it?

A: Where does the function get x from?  
The only way it can get x is if we pass it in as a parameter. So the function must look like:

bool before(Student a, Student b, int x)   // int x is added to fix the problem
{
  return a.hw[x] < b.hw[x];
}
Now, this fixes before, but it necessitates a change in selection_sort, which is the function that calls before.

Q. There's one more problem to fix in selection_sort. What is it?

A: Again, where does the function get x from?  
It seems that another parameter is required for selection_sort as well.

void selection_sort(Student* A, int n, int x) // *** change here 
{
  for (int i = 0; i < n - 1; ++i) {
    // find nexti, the index of the next element
    int nexti = i;
    for (int j = i + 1; j < n; ++j) {
      if (before(A[j], A[nexti], x)) {    // *** change here
        nexti = j;
      }
    }
    // swap A[i] and A[nexti]
    Student temp = A[i];
    A[i] = A[nexti];
    A[nexti] = temp;
  }
}
Putting it all together, we get this final program. Pretty nice, eh?

Struct Quadrilateral

I want to think about creating a struct to represent quadrilaterals, i.e. polygons with four sides. This should tie together much of what we've already done.

One thing we're going to do is reuse code. This idea is very dear to those of us who are lazy .... A quadrilateral is defined by its four vertices, which are of course points, and we've done quite a lot with points already. As a matter of fact, we used most of these definitions for dealing with points. And we could round this out to include subtraction of points, or I/O, or whatever else we wanted. Instead of reinventing the wheel, let's reuse this code.

Given this, we'll represent a quadrilateral by the four points that are its vertices, and in the context I have in mind, we'll also give each quadrilateral a character that acts as a label. The struct for quadrilateral, which we'll call Quad, has the following simple definition:


struct Quad
{
  char label;
  point* vert;
};
Here's a simple main() using the struct. All it does is read in a Quad and print it back out again. Notice how we create an object of type Quad. It takes two steps, one actually creates the Quad object, and one allocates the array of four points that stores the vertices.

int main()
{
  // Create quadrilateral S
  Quad S;
  S.vert = new point[4];

  // Read label and vertices
  cout << "Enter label and 4 vertices: ";
  cin >> S.label;
  for(int j = 0; j < 4; j++)
    S.vert[j] = readpoint(&cin);

  // Write label and vertices
  cout << "Your quadrilateral is " << S.label << " ";
  for(int j = 0; j < 4; j++)
  {
    writepoint(S.vert[j], &cout);
    cout << ' ';
  }
  cout << endl;

  return 0;
}
This gives us a pretty good look at how we've done things so far, and you can look at the complete program to make sure you see how all these pieces fit together.

Breaking programs into pieces

Looking at the program we just wrote, you should see that it breaks up into two obvious pieces - the old code, which is all the point stuff, and the new code, which is the Quad definition and the main function. Furthermore, the point stuff falls into two obvious pieces - the struct definitions & function prototypes, and the function definitions.
point.h -- Prototypes for point stuff point.cpp -- Definitions for point-related functions

#include <iostream>
#include <fstream>
using namespace std;

/*********************************************
* * PROTOTYPES & STRUCT DEFINITIONS
* ********************************************/
struct point
{
  double x, y;
};
point readpoint(istream* pIN);
void writepoint(point p, ostream* pOUT);
point add(point a, point b);
point div(point P, double z);

#include "point.h"

/*********************************************
 ** FUNCTION DEFINITIONS
 *********************************************/
point readpoint(istream* pIN)
{
  point p;
  char c;
  (*pIN) >> c >> p.x >> c >> p.y >> c;
  return p;
}

void writepoint(point p, ostream* pOUT)
{
  (*pOUT) << '(' << p.x << ',' << p.y << ')';
}

point add(point a, point b)
{
  point S;
  S.x = a.x + b.x;
  S.y = a.y + b.y;
  return S;
}

point div(point P, double z)
{
  point Q;
  Q.x = P.x / z;
  Q.y = P.y / z;
  return Q;
}

main.cpp -- New code with definition for Quad and main

#include "point.h"

struct Quad
{
  char label;
  point* vert;
};

int main()
{
  // Create quadrilateral S
  Quad S;
  S.vert = new point[4];

  // Read label and vertices
  cout << "Enter label and 4 vertices: ";
  cin >> S.label;
  for(int j = 0; j < 4; j++)
    S.vert[j] = readpoint(&cin);

  // Write label and vertices
  cout << "Your quadrilateral is " << S.label << " ";
  for(int j = 0; j < 4; j++)
  {
    writepoint(S.vert[j], &cout);
    cout << ' ';
  }
  cout << endl;

  return 0;
}
It's fairly natural, then, to split our original program up into three files: point.h, main.cpp, and point.cpp.

#include "point.h" in main.cpp

Now, to use the struct point and all of its related functions, we really only need the function prototypes and struct definition - the definitions of the functions don't really concern us as users of point. We only care about what's available for us to use, not how it works. So, the file main.cpp begins with the line:
#include "point.h"
which means literally pretend that the entire contents of point.h was typed in starting at this point. Thus, as far as the compiler is concerned, it's like that struct definition and all those prototypes were right there.

#include "point.h" in point.cpp

You'll notice that point.cpp also begins with #include "point.h", since those function definitions won't make sense to the compiler without the definition of the struct point and the prototypes.

Benefit of breaking a program into separate files

Breaking programs up into separate files this way doesn't really change what you can or can't do, but: In fact, your start thinking of big programs as different modules to write: each module containing a .h file (or header file) that gives the outside world all the information it needs to use the module, and a .cpp file that contains the source code that implements the module (i.e. that tells the compiler how all these functions really work). What you saw up above was a very simple point module.

One key thing that's missing from my simple point module, however, is documentation. The header file needs a lot of documentation so that the user of the module understands how to use the struct and the various functions the module offers. They don't need to know how it works, but they do need to know what it does!

The most important note you'll ever read: Programming nirvana - the most enlightened state of programming - is reached when you have perfectly achieved two goals:
  1. separation of interface from implementation, i.e. what is needed in order to use a thing is completely walled off from how the thing actually works, and
  2. code reuse, i.e. code you write for one project can be easily reused in all sorts of other projects, or reused within the same project (never duplicate code!).
These two goals are independent of what language you program in ... this is what everyone wants. In the paradigm (or style) of programming we cover in this course, Procedural Programming, the primary language construct provided to support these two goals is ... functions! The prototype vs definition split, and the scoping rules for local variables within functions provide separation (perhaps imperfectly) of interface and implementation. Viewing programs as collections of functions provides code reuse, as well-designed functions from one project can be taken piecemeal and reused in other projects.

Compiling the source code files

Easy way (recommended) Use this method when you have many cpp files
You can simply compile the source files as follows:
g++ main.cpp point.cpp -o quad
The above command will create an executable program quad.
Your compiler views your program as consisting of two files, main.cpp and point.cpp, both of which use point.h.
  1. The compiler is perfectly willing to compile point.cpp, saying "I believe there is a main function somewhere that's going to use these function definitions.
    g++ -c point.cpp
    
    The above command will create an object file point.o .
  2. And the compiler is perfectly willing to compile main.cpp saying "I believe there are some .cpp files somewhere that define these functions that main uses and whose prototypes are given in the included file point.h.
    g++ -c main.cpp
    
    The above command will create an object file main.o .
  3. Linking two object files as follows:
    g++ point.o main.o -o quad
    
    Then, when the program is linked, these two compiled pieces (i.e., object files) are brought together (along with any standard library code we need) to form a complete program.

Practice Problems

  1. Look at this program from last lecture. The program uses a point struct and a struct hhmmss. Break the program up into separate point and hhmmss modules, and of course a main file. We have the following pieces: Of course, you split off a third module for the struct datum, but it seems less likely to be reused, so there's a less compelling reason for doing it.

Do you know...?

  1. Do you know how swapping pointers works?
  2. Do you know when and why you have to add a parameter to before and indexOfMax functions?
  3. What kind of things should point.h contain?
    A: struct definition of point, function prototypes, documentation of what functions do
  4. What should point.cpp start with?
    A: #include "point.h"
  5. The following code has an error. Fix it.
    #include <point.h>
    Fix:
    #include "point.h"
  6. What is the g++ command to compile the source files point.h, point.cpp, main.cpp?
    A: g++ main.cpp point.cpp (you don't need to include point.h)