Structs 3: Sorting

Recall the struct Student we saw from last lecture.

 `````` struct Student { string name; int* hw; }; ``````
Now let's sort out student records and print them out in sorted order. In particular, let's sort them by their grades on homework assignment #4, so that the student with lowest grade on assignment #4 comes first, and the student with highest grade on assignment #4 comes last.

We'll have our array `stu` of objects of type student, and we'll use the same old selection sort routine we've always been using. Now, at some point in selection sort, we'll swap two elements of `stu` and I want to take a brief moment to look at what that means. The following picture shows how `swap(stu[i],stu[j])` affects things. Hopefully you see, once again, that we're not moving the actual arrays of grades at all, we're simply moving the pointers to those arrays.

 Now, if we're going to sort this array of `Student` objects, it's just like sorting anything else! `````` void selectionsort(Student* A, int N) { for( int length = N; length > 1; length-- ) swap(A[indexOfMax(A,length)],A[length-1]); } int indexOfMax(Student* A, int N) { int imax = 0; for( int i = 1; i < N; i++ ) if( isbefore(A[imax],A[i]) ) imax = i; return imax; } void swap(Student &a, Student &b) { Student temp = a; a = b; b = temp; }`````` So, the only thing that remains is to produce a `isbefore` function that will take two `Student` objects and decide whether the first needs to come before the second. Remember we want our `Student` objects in order from lowest to highest score on homework #4. So, for `isbefore(a,b)`, I need to determine whether the homework #4 score for `a` is less than the homework #4 score for `b`. `````` bool isbefore(Student a, Student b) { if (a.hw[4] < b.hw[4]) return true; else return false; }`````` With this, it's easy to write a program that prints out students & their HW#4 scores ordered from lowest to highest HW#4 score. Here's a complete program.

Letting the user choose which grade to sort on breaks our sort scheme!

Now, a better version of the above program would allow the user to choose which homework assignment we sorted on. So, if `x` is the assignment number, we'd have to modify `isbefore` to be:
``````
bool isbefore(Student a, Student b) {
if (a.hw[x] < b.hw[x])
return true;
else
return false;
}``````
However, there's a problem! Where does the `isbefore` function get `x` from? The only way it can get `x` is if we pass it in as a parameter. So the function must look like:
``````
bool isbefore(Student a, Student b, int x) {
if (a.hw[x] < b.hw[x])
return true;
else
return false;
}``````
Now, this fixes `isbefore`, but it necessitates a change in `indexOfMax`, which is the function that calls `isbefore`.
``````
int indexOfMax(Student *A, int N) {
int imax = 0, i;
for(i = 1; i < N; i++)
if (isbefore(A[imax],A[i],x))
imax = i;
return imax;
}``````
Now `indexOfMax` has the same question --- where is it going to get `x` from? It seems that another parameter is required for `indexOfMax` as well.
``````
int indexOfMax(Student *A, int N, int x) {
int imax = 0, i;
for(i = 1; i < N; i++)
if (isbefore(A[imax],A[i],x))
imax = i;
return imax;
}``````
Similarly, `sort` is going to require an extra parameter to pass along to `indexOfMax`. Putting it all together, we get this final program. Pretty nice, eh?

I want to think about creating a struct to represent quadrilaterals, i.e. polygons with four sides. This should tie together much of what we've already done.

One thing we're going to do is reuse code. This idea is very dear to those of us who are lazy ... which definitely includes me. A quadrilateral is defined by its four vertices, which are of course points, and we've done points to death already. As a matter of fact, going through the lecture notes, I find at least all these definitions for dealing with points. And we could round this out to include subtraction of points, or I/O, or whatever else we wanted. Instead of reinventing the wheel, let's reuse this code.

Given this, I'll represent a quadrilateral by the four `point`s that are its vertices, and in the context I have in mind, I'll also give each quadrilateral a character that acts as a label. The struct for quadrilateral, which I'll call `Quad`, has the following simple definition:

``````
char label;
point *vert;
};
``````
Here's a simple `main()` using the struct. All it does is read in a `Quad` and print it back out again. Notice how I create an object of type `Quad`. It takes two steps, one actually creates the `Quad` object, and one allocates the array of four points that stores the vertices.
``````
int main() {
S.vert = new point[4];

cout << "Enter label and 4 vertices: ";
cin >> S.label;
for( int i = 0; i < 4; i++ )

// Write label and vertices
for( int j = 0; j < 4; j++ ) {
write(S.vert[j], cout);
cout << ' ';
}
cout << endl;

return 0;
}
``````
This gives us a pretty good look at how we've done things so far, and you can look at the complete program to make sure you see how all these pieces fit together.

Breaking programs into pieces

Looking at the program we just wrote, you should see that it breaks up into two obvious pieces - the old code, which is all the `point` stuff, and the new code, which is the `Quad` definition and the `main` function. Furthermore, the `point` stuff falls into two obvious pieces - the struct definitions & function prototypes, and the function definitions.
 Prototypes for `point` stuff Definitions for `point`-related functions `````` #include #include using namespace std; /********************************************* ** PROTOTYPES & STRUCT DEFINITIONS *********************************************/ struct point { double x, y; }; void read(point &p, istream &IN); void write(point p, ostream &OUT); point add(point a, point b); point div(point P, double z); `````` `````` /********************************************* ** FUNCTION DEFINITIONS *********************************************/ void read(point &p, istream &IN) { char c; IN >> c >> p.x >> c >> p.y >> c; } void write(point p, ostream &OUT) { OUT << '(' << p.x << ',' << p.y << ')'; } point add(point a, point b) { point S; S.x = a.x + b.x; S.y = a.y + b.y; return S; } point div(point P, double z) { point Q; Q.x = P.x / z; Q.y = P.y / z; return Q; } `````` New code - definition for `Quad` and `main` `````` struct Quad { char label; point *vert; }; int main() { // Create quadrilateral S Quad S; S.vert = new point[4]; // Read label and vertices cout << "Enter label and 4 vertices: "; cin >> S.label; for( int i = 0; i < 4; i++ ) read(S.vert[i], cin); // Write label and vertices cout << "Your quadrilateral is " << S.label << " "; for( int j = 0; j < 4; j++ ) { write(S.vert[j], cout); cout << ' '; } cout << endl; return 0; } ``````

It's fairly natural, then, to split our original program up into three files: `point.h`, `main.cpp`, and `point.cpp`. Now, to use the struct `point` and all of its related functions, we really only need the function prototypes and struct definition - the definitions of the functions don't really concern us as users of `point`. We only care about what's available for us to use, not how it works. So, the file `main.cpp` begins with the line:

``#include "point.h"``
which means literally pretend that the entire contents of `point.h` was typed in starting at this point. Thus, as far as the compiler is concerned, it's like that struct definition and all those prototypes were right there.

You'll notice that `point.cpp` also begins with `#include "point.h"`, since those function definitions won't make sense to the compiler without the definition of the struct `point` and the prototypes. So, your compiler views your program as consisting of two files, `main.cpp` and `point.cpp`, both of which use `point.h`. It's perfectly willing to compile `point.cpp`, saying "I believe there is a `main` function somewhere that's going to use these function definitions. And the compiler is perfectly willing to compile `main.cpp` saying "I believe there are some .cpp files somewhere that define these functions that `main` uses and whose prototypes are given in the included file `point.h`. Then, when the program is linked, these two compiled pieces are brought together (along with any standard library code we need) to form a complete program.

Breaking programs up into separate files this way doesn't really change what you can or can't do, but it makes it much easier to organize your code, and it makes it much easier to reuse structs and functions from previous programs. In fact, your start thinking of big programs as different modules to write: each module containing a `.h` file (or header file) that gives the outside world all the information it needs to use the module, and a `.cpp` file that contains the source code that implements the module (i.e. that tells the compiler how all these functions really work). What you saw up above was a very simple `point` module. One key thing that's missing from my simple point module, however, is documentation. The header file needs a lot of documentation so that the user of the module understands how to use the struct and the various functions the module offers. They don't need to know how it works, but they do need to know what it does!

You can compile the source files as follows:
```g++ main.cpp point.cpp
```
The above command will create an executable program `a.out`.

Problems

1. Look at this program from last lecture. The program uses a `point` struct and a struct `hhmmss`. Break the program up into separate `point` and `hhmmss` modules, and of course a main file. We have the following pieces:
Of course, you split off a third module for the struct `datum`, but it seems less likely to be reused, so there's a less compelling reason for doing it.