SI204

Class 14: Nested Loops & Stepwise Refinement II

Reading

None

 

Lecture

Stepwise Refinement From the Inside out

When you're writing a larger program, it's nice to be able to compile and run something during the intermediate stages, just to make sure what you're doing is all right. One way to do this is to continually refine your program so that it solves more and more of your initial problem. In other words, you start with a dumbed down version of your original problem and a working program that solves that dumbed down version. Then you add a little more of the complexity of the original problem and you refine your program to a working program for this slightly less dumbed down problem. This way, you gradually work towards a solution to the full problem.

Consider the following problem: I want to write a program that will read in a file from the Census tables of populations and population densities and simply print out a list of all the towns in the state. (For example, take a look at Maryland's geographic census data.) This is not a completely easy problem, as we'll see, in part because towns' names are not always a single word, take "East New Market", for example. So, the question is this: how do I go about designing a program that solves a non-trivial problem like this? Well, first of all, the basic structure of the program should be clear:

int main()
{
  // Open census data file
  
  // Read in (and ignore!) header information
 
  // Loop over each row
  {    
    // Process row
  }
 
  return 0;
}

We'll use this as a starting point. For the moment, let's make a file "tempdata" that contains our data. Eventually our program should allow the user to specify the data file, but we'll worry about that later. Let's start off easy. With this assumption, our program becomes:

int main()
{
  // Open census data file
  ifstream fin("tempdata");
  
  // Read in (and ignore!) header information
 
  // Loop over each row
  {    
    // Process row
  }
 
  return 0;
}

Now to get us a little further, let's try to make the problem simple by considering simple data! Let's assume that our data file has no header information, and only a single row. We'll worry about dealing with headers and multiple rows later. Looking at a single row of data like

24   04625          530      0.6      0.2    870.3   2254.0 Barton town

we see that there are 7 pieces of information to be read in (and ignored) before we come to the town name, "Barton". So our program becomes

int main()
{
  // Open census data file
  ifstream fin("tempdata");
  
  // Read in (and ignore!) header information
 
  // Loop over each row
  {    
    /***** Process row **************/
    // Read (and ignore) 7 entries
    // Read name of city/town/CDP/village
    // Print name if town
  }
 
  return 0;
}

Finally, we arrive at this, which is a complete working program that we can compile and test on our simplified input. It seems to work. We can test it on other rows like:

24   04000      736,014    209.3     80.8   3516.6   9108.0 Baltimore city

and it works like a champ. However, it fails for something like this:

24   05550        8,860      6.7      2.6   1329.1   3442.5 Bel Air town

Why does it fail? It fails because our program tacitly assumed that a town's name would be a single word, and that's not the case here. So we need to keep reading in words, stopping when we read in town or city or CDP or village. So we might modify that piece of our code to look like this:

// Read name of city/town/CDP/village
string t;
fin >> t;
while(t != "town" && t != "city" && t != "CDP" && t != "village")
{
  fin >> t;
}
 
// Print name if town
if (t == "town")
  cout << ??? << endl;

This is fine, except that when it comes time to print out the town name, we don't know what to do. We threw away the town name in the process of looking for city/town/CDP. What we can do is store the town name in a new string, using the fact that for strings, the "+" operator sticks strings together, i.e. concatenates strings. So now we have this, which is a complete working program that we can compile and test on our simplified input ... this time with town names of multiple words.

All right. Next let's remove our simplifying assumption that says tempdata only contains one row. In other words, lets deal with several rows of data. The key here is going to be determining that there is no more data to read. There's a trick to this: If you read in from an ifstream object and you come to the end of the file, the ifstream object goes into an error state. And remember, if an istream is cast to a bool, it goes to false if it's in an error state, and true otherwise. So all we have to do is place the fin >> s that reads in the first string of a line in the while loop test-condition, and we'll fall out of the loop when the file ends and there is no "next line" to read. Here's a solution along these lines.

Next let's lift the assumption that all this header information has been stripped out of the file. How can we read through that garbage to get to the stuff we care about? Well, notice that the last word in the header is "NAME", and that it only occurs in that one place. We can keep reading in strings until we read the string "NAME"! This is pretty simple, it looks something like this:

  // Read in (and ignore!) header information
  string kill;
  do {
    fin >> kill;
  } while(kill != "NAME");
Lastly,
we need to modify our program so that the file name is input by the user, as
opposed to being hard-coded into the program. This is done, for example, as
follows: 
 
  // User enters file name
  string file;
  cout << "Enter name of input file: ";
  cin >> file;
 
  // Open census data file
  ifstream fin(file.c_str());
  if (!fin)
  {
    cout << "Error opening file!" << endl;
    return 0;
  }
    

And finally, here is a complete program!

 

Problems

1.      Census Statistics - The census keeps tables of populations and population densities for all of our states. Each state has its own file giving the names of all cities, towns, and CDP's ("census designated place" - this appears to be census-eese for "other") in that state. For example, take a look at Maryland's geographic census data. Getting information from these files presents us with problems that are very representative of what a lot of programming is about.  

o        Write a program that reads this data for a given state and prints out the number of cites, then the number of towns, then the number of CDP's.

o        Write a program that reads this data for a given state and lists each town/city/CDP along with it's area.

o        Write a program that reads this data for a given state and prints out the average population density for a city, then the average population density for a town, then the average population density for a CDP.

2.      Suppose your given a list of student names and alpha codes followed by a list of grades. Read this information in, and print out the name of the student with the highest average for the given grades. The format of your input is demonstrated in this example:

3.                 John Jeffery Jones m040101 85, 95, 22, 54;
4.                 Sandra Smith m040201 99, 98;
5.                 Xavier Xenon Junior m050400 78, 80, 82, 84, 86, 88, 90;

For this input, your program would print out

Sandra Smith had the highest average.

A good way to approach this problem would be to write three successive programs:

A program that reads in students whose name is just one word, and whose list of grades always consists of a single score. For example, input for this might look like:

                        Jones m040101 85
                        Smith m040201 99
                        Xenon m050400 78
 
 A
program that reads in students whose names may consist of multiple words, but
whose list of grades always consists of a single score. For example, input for
this might look like: 
                        John Jeffery Jones m040101 85
                        Sandra Smith m040201 99
                        Xavier Xenon m050400 78
         

Hint: Remember that if S is an object of type string and c is an object of type char, then c + S is the string object you get by tacking the character c onto the beginning of S.

9.    A program to solve the full problem using this input file.


Assoc Prof Christopher Brown

Last modified by LT M. Johnson 08/15/2007 05:50 PM