Lab 04: Text Analysis

In this lab, you'll do some basic file analysis on some famous books. Download this tarball to a lab04 directory, and untar it with the command tar xzf books.tgz.

Step 1: Word Count and Average Word Length

Write a program in a file called part1.cpp which prompts the user for a filename of a text file, and prints out the total number of words in the file, as well as the average word length.

The length of a string s (its number of characters) is given by the expression s.length(). For example, to read in a string from a user and print out its length, you would write:

string s;
cin >> s;                       // reads one whitespace-delimited word
int lengthOfWord = s.length();  // number of characters in that word
cout << lengthOfWord << endl;

An example run of the program is shown here:

~$ ./part1
Enter a filename: uncleTomsCabin.txt
Word count: 183643
Average word length: 4.55154
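
The counting logic can be sketched as a small helper that works on any input stream. This is only one possible approach, not the required solution: the names WordStats and countWords are made up for illustration, and it assumes a "word" is whatever >> reads between whitespace, so attached punctuation counts toward a word's length.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper type (not part of the lab handout).
struct WordStats {
  int wordCount;
  double averageLength;
};

// Reads whitespace-separated words from `in` until the input runs out,
// counting the words and summing their lengths.
WordStats countWords(std::istream& in) {
  std::string word;
  int words = 0;
  long totalChars = 0;
  while (in >> word) {
    ++words;
    totalChars += word.length();
  }
  WordStats stats;
  stats.wordCount = words;
  // Cast before dividing so the average is not truncated to an integer;
  // an empty input yields 0 rather than dividing by zero.
  stats.averageLength =
      (words == 0) ? 0.0 : static_cast<double>(totalChars) / words;
  return stats;
}
```

Because the helper takes a std::istream&, it accepts an ifstream opened on the user's filename as well as a std::istringstream, which is handy for trying it on a short sentence first.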

Show your professor your progress, and upload with the command submit lab04 part1.cpp

Step 2: Sentence Length

We'll assume that every sentence ends with one of the characters . ! or ? and that those characters appear only at the ends of sentences (this may not be entirely accurate, but is good enough for our purposes). In a file called part2.cpp, augment your solution to part 1 to additionally output the average sentence length, measured in words.

You can find out whether a string s contains a substring using s.find(substring). It returns an unsigned index (of type string::size_type, not int) giving the position, counted from 0 at the left, where the substring first begins; if the substring doesn't appear at all, it returns the special value string::npos, which is a very large number. For example:

string s;
cin >> s;
if (s.find(".") != string::npos)
  cout << "Found!" << endl;
else
  cout << "Not found..." << endl;

Here is an example run:

~$ ./part2
Enter a filename: uncleTomsCabin.txt
Word count: 183643
Average word length: 4.55154
Average sentence length: 16.5191
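
One way to get this number, sketched under assumptions: count a sentence each time a word contains ., ! or ?, then divide the word count by the sentence count. The helper name averageSentenceLength is hypothetical, and the sketch uses the standard find_first_of member in place of three separate find calls; either works. The sample output for shortTest.txt in Step 3 (20 words, 6.66667) appears consistent with this definition, since 20 / 3 is about 6.66667.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the average number of words per sentence
// for the text in `in`, or 0 if no sentence-ending character is seen.
double averageSentenceLength(std::istream& in) {
  std::string word;
  int words = 0;
  int sentences = 0;
  while (in >> word) {
    ++words;
    // find_first_of returns string::npos when none of . ! ? is in the word.
    // A word such as "Wait..." still counts as only one sentence end.
    if (word.find_first_of(".!?") != std::string::npos)
      ++sentences;
  }
  return (sentences == 0) ? 0.0 : static_cast<double>(words) / sentences;
}
```

For the text "Hello there. How are you? Fine!" this gives 6 words over 3 sentences, so 2.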

Show your professor, and upload with the command submit lab04 part1.cpp part2.cpp

Step 3: Multiple Files

Finally, in a file called part3.cpp, augment your part 2 solution so that it repeatedly prompts for a filename and prints the statistics for that file, stopping when the filename entered is "quit". For example:

~$ ./part3
Enter a filename: uncleTomsCabin.txt
Word count: 183643
Average word length: 4.55154
Average sentence length: 16.5191

Enter a filename: shortTest.txt
Word count: 20
Average word length: 4.7
Average sentence length: 6.66667

Enter a filename: quit
~$
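
The repeat-until-quit structure might look like the sketch below. It is an illustration rather than the required design: promptLoop is a made-up name, only the word count is printed to keep it short, and the "Could not open" message is an added assumption, since the lab does not say what to do with a bad filename.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical driver: prompts on `out`, reads filenames from `in`,
// and stops when the filename is "quit" or the input runs out.
void promptLoop(std::istream& in, std::ostream& out) {
  std::string filename;
  out << "Enter a filename: ";
  while (in >> filename && filename != "quit") {
    // A fresh ifstream per file, so each file starts from a clean state.
    std::ifstream file(filename.c_str());
    if (!file) {
      out << "Could not open " << filename << std::endl;
    } else {
      std::string word;
      int words = 0;  // counters are reset for every file
      while (file >> word)
        ++words;
      out << "Word count: " << words << std::endl;
      // The other part 2 statistics would be printed here.
    }
    // Blank line, then the next prompt, as in the example run.
    out << std::endl << "Enter a filename: ";
  }
}
```

Calling promptLoop(std::cin, std::cout) from main gives the interactive behavior. The detail to watch is that the counters and the file stream are created inside the loop, so totals from one book do not leak into the next.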

Show your professor, and submit with the command submit lab04 part1.cpp part2.cpp part3.cpp