Stepwise Refinement From the Inside out

When you're writing a larger program, it's nice to be able to compile and run something during the intermediate stages, just to make sure what you're doing is all right.

One way to do this is to continually refine your program so that it solves more and more of your initial problem. In other words:

  1. You start with a dumbed down version of your original problem and a working program that solves that dumbed down version. Make sure your program compiles and runs for this easy problem.
  2. Then you add a little more of the complexity of the original problem and you refine your program to a working program for this slightly less dumbed down problem. Make sure your program compiles and runs at this stage.
  3. Repeat step 2 until your program becomes a solution to the full problem.

Example: Printing all the towns in Census data

Consider the following problem:
Write a program that works as follows:
For example, take a look at Maryland's geographic census data.

Difficulty: This is not a completely easy problem, as we'll see, in part because towns' names are not always a single word, take "East New Market", for example.

Basic Structure of the program

So, the question is this: how do I go about designing a program that solves a non-trivial problem like this?

Well, first of all, the basic structure of the program should be clear:


int main()
{
  // Open census data file
  
  // Read in (and ignore!) header information

  // Loop over each row
  {    
    // Process row
  }

  return 0;
}
We'll use this as a starting point.

Working with a dumbed down version

Dumb down: Let's work with a fixed file

For the moment, let's make a file tempdata that contains our data. Eventually our program should allow the user to specify the data file, but we'll worry about that later. Let's start off easy. With this assumption, our program becomes:

int main()
{
  // Open census data file
  ifstream fin("tempdata");
  
  // Read in (and ignore!) header information

  // Loop over each row
  {    
    // Process row
  }
  return 0;
}

Dumb down: tempdata has a single row with no header

Now to get us a little further, let's try to make the problem simple by considering simple data! Let's assume that our data file has no header information, and only a single row. We'll worry about dealing with headers and multiple rows later.

For example, our file tempdata looks as follows:

24   04625          530      0.6      0.2    870.3   2254.0 Barton town
Note that there are 7 pieces of information to be read in (and ignored) before we come to the town name, "Barton". So our program becomes

int main()
{
  // Open census data file
  ifstream fin("tempdata");
  
  // Read in (and ignore!) header information

  // Loop over each row
  {    
    /***** Process row **************/
    // Read (and ignore) 7 entries
    // Read name of city/town/CDP/village
    // Print name if town
  }
  return 0;
}
Finally, we arrive at this, which is a complete working program that we can compile and test on our simplified input. It seems to work.

Add complexity: What about other towns?

We can test it on other rows like:
24   04000      736,014    209.3     80.8   3516.6   9108.0 Baltimore city
and it works like a champ. However, it fails for something like this:
24   05550        8,860      6.7      2.6   1329.1   3442.5 Bel Air town
Why does it fail? It fails because our program tacitly assumed that a town's name would be a single word, and that's not the case here.

So we need to keep reading in words, stopping when we read in town or city or CDP or village. So we might modify that piece of our code to look like this:


...
    // Read name of city/town/CDP/village
    string t;
    fin >> t;
    while(t != "town" && t != "city" && t != "CDP" && t != "village")
    {
      fin >> t;
    }

   // Print name if town
    if (t == "town")
      cout << ??? << endl;

Remember the name!

This is fine, except that when it comes time to print out the town name, we don't know what to do. We threw away the town name in the process of looking for city/town/CDP.
We can remember the town name as follows:
So now we have this, which is a complete working program that we can compile and test on our simplified input ... this time with town names of multiple words.

Add complexity: Add more rows

All right. Next let's remove our simplifying assumption that says tempdata only contains one row. In other words, lets deal with several rows of data.

The key here is going to be determining that there is no more data to read. There's a trick to this: If you read in from an ifstream object and you come to the end of the file, the ifstream object goes into an error state.

Recall
If an istream is cast to a bool, it goes to false if it's in an error state, and true otherwise. So all we have to do is place the
fin >> s
that reads in the first string of a line in the while loop test-condition, and we'll fall out of the loop when the file ends and there is no "next line" to read.
Here's a solution along these lines.

Add complexity: Add headers

Next let's lift the assumption that all this header information has been stripped out of the file. How can we read through that garbage to get to the stuff we care about? Well, notice that the last word in the header is "NAME", and that it only occurs in that one place. We can keep reading in strings until we read the string "NAME"! This is pretty simple, it looks something like this:

...
  // Read in (and ignore!) header information
  string kill;
  while( (fin >> kill) && (kill != "NAME") )
  {
     // do nothing: ignore header info
  }

Add complexity: User can specify a data file

Lastly, we need to modify our program so that the file name is input by the user, as opposed to being hard-coded into the program. This is done, for example, as follows:

...
  // User enters file name
  string file;
  cout << "Enter name of input file: ";
  cin >> file;

  // Open census data file
  ifstream fin(file.c_str());
  if (!fin)
  {
    cout << "Error opening file!" << endl;
    return 0;
  }

An finally, here is a complete program!

Problems

  1. Census Statistics - The census keeps tables of populations and population densities for all of our states. Each state has its own file giving the names of all cities, towns, and CDP's ("census designated place" - this appears to be census-eese for "other") in that state. For example, take a look at Maryland's geographic census data. Getting information from these files presents us with problems that are very representative of what a lot of programming is about.
  2. Suppose your given a list of student names and alpha codes followed by a list of grades. Read this information in, and print out the name of the student with the highest average for the given grades. The format of your input is demonstrated in this example:
    John Jeffery Jones m040101 85, 95, 22, 54;
    Sandra Smith m040201 99, 98;
    Xavier Xenon Junior m050400 78, 80, 82, 84, 86, 88, 90;
    For this input, your program would print out
    Sandra Smith had the highest average.
    A good way to approach this problem would be to write three successive programs:
    1. A program that reads in students whose name is just one word, and whose list of grades always consists of a single score. For example, input for this might look like:
      Jones m040101 85
      Smith m040201 99
      Xenon m050400 78
    2. A program that reads in students whose names may consist of multiple words, but whose list of grades always consists of a single score. For example, input for this might look like:
      John Jeffery Jones m040101 85
      Sandra Smith m040201 99
      Xavier Xenon m050400 78
      Hint: Remember that if S is an object of type string and c is an object of type char, then c + S is the string object you get by tacking the character c onto the beginning of S.
    3. A program to solve the full problem.