Text Analysis!

In this lab, you'll do some basic analysis of the text of a few famous books. Download this tarball into a lab04 directory, and extract it with the command

tar xzf books.tgz

Step 1: Word Count and Average Word Length

Write a program in a file called part1.cpp that prompts the user for the name of a text file and prints the total number of words in the file, as well as the average word length. The length of a string s is given by s.length(). For example, to read a string from the user and print its length, you would write:
string s;
cin >> s;
int lengthOfWord = s.length();
cout << lengthOfWord << endl;
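
One detail that is easy to miss when computing the average is integer division: dividing one int by another throws away the fractional part. The sketch below is only an illustration, not the required solution. The helper name averageWordLength is made up for this example, and it reads words from a string (through an istringstream) instead of a file so that it can be tried on its own.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper (not part of the lab): average length of the
// whitespace-separated words in `text`, or 0 if there are no words.
double averageWordLength(const string& text)
{
  istringstream in(text);    // stands in for a file stream
  string word;
  int wordCount = 0;
  int totalLetters = 0;
  while (in >> word) {       // the loop ends when a read fails
    wordCount++;
    totalLetters += static_cast<int>(word.length());
  }
  if (wordCount == 0)
    return 0.0;              // avoid dividing by zero
  // Convert to double first: int / int would drop the fraction.
  return static_cast<double>(totalLetters) / wordCount;
}
```

For instance, averageWordLength("Hello World!") gives 5.5, since punctuation attached to a word is counted as part of that word here.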

An example run of the program is shown here:

~$ ./part1
Enter a filename: shortExample.txt
Word count: 12
Average word length: 4.3333

~$ ./part1
Enter a filename: uncleTomsCabin.txt
Word count: 183643
Average word length: 4.55154

Warning:

If your program reports a word count of 13 for shortExample.txt, you are counting the last word twice! Go back to the Class 09 notes (see the section "when a file ends"), figure out why, and fix your code!
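
One common way to end up with the extra word is sketched below. This is an assumption about the likely cause, not a quotation from the class notes. Both helper names are made up for this example, and an istringstream stands in for the file. The sketch also assumes the text ends with a newline, as most text files do.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: checks eof() BEFORE each read. After the last
// word is read, the trailing newline is still unread, so eof() is
// false, the loop runs once more, the read fails, and the count is
// one too high.
int countWordsBuggy(const string& text)
{
  istringstream in(text);
  string word;
  int count = 0;
  while (!in.eof()) {
    in >> word;              // may fail, leaving `word` unchanged
    count++;                 // counted even when the read failed
  }
  return count;
}

// Hypothetical helper: uses the read itself as the loop condition,
// so a word is counted only when a read actually succeeded.
int countWordsFixed(const string& text)
{
  istringstream in(text);
  string word;
  int count = 0;
  while (in >> word)
    count++;
  return count;
}
```

With the text "one two three\n", countWordsBuggy returns 4 while countWordsFixed returns 3.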

Show your professor your progress, and upload with the command

~/bin/submit -c=SI204 -p=lab04 part1.cpp

Step 2: Sentence Length

We'll assume that sentences always end with one of the following characters:

. ! ?

and that these characters appear only at the ends of sentences (this is not entirely accurate, but it is good enough for our purposes).

The length of a sentence is the number of words in it. For example, the following is a sentence of length 2:

 Hello World!

In a file called part2.cpp, augment your solution to part 1 to additionally output the average sentence length.

You can check whether a string s contains another string by using the find() function. For example, to check whether s contains the string "he", call s.find("he").

find() returns the position (counting from 0) at which the first match starts, or the special value string::npos if there is no match.

For example, consider this code:
string s;
cin >> s;
if( s.find("he") == string::npos )
  cout << "Not found..." << endl;
else
  cout << "Found!" << endl;

If string s is "world", the code will output "Not found...". On the other hand, if s is "hello", the code will output "Found!".

So, using find(), you can check whether a string contains a "." as follows:

string s;
cin >> s;
if( s.find(".") != string::npos )   // we use != this time 
  cout << "Found!" << endl;
else
  cout << "Not found..." << endl;
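
The same check can be repeated for each of the three sentence-ending characters. The sketch below is one possible way to put that together, not the required solution. The helper names endsSentence and averageSentenceLength are made up for this example, and the text comes from a string instead of a file. It assumes that a word containing any of the three characters ends exactly one sentence, so a word like "what?!" still counts once.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: true if `word` contains '.', '!' or '?'.
bool endsSentence(const string& word)
{
  return word.find(".") != string::npos
      || word.find("!") != string::npos
      || word.find("?") != string::npos;
}

// Hypothetical helper: average number of words per sentence in
// `text`, or 0 if no sentence-ending character was seen.
double averageSentenceLength(const string& text)
{
  istringstream in(text);    // stands in for a file stream
  string word;
  int wordCount = 0;
  int sentenceCount = 0;
  while (in >> word) {
    wordCount++;
    if (endsSentence(word))
      sentenceCount++;
  }
  if (sentenceCount == 0)
    return 0.0;              // avoid dividing by zero
  return static_cast<double>(wordCount) / sentenceCount;
}
```

For instance, averageSentenceLength("Hello World!") gives 2, matching the sentence of length 2 shown earlier.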

Here is an example run:

~$ ./part2
Enter a filename: uncleTomsCabin.txt
Word count: 183643
Average word length: 4.55154
Average sentence length: 16.5191

Show your professor, and upload with the command

~/bin/submit -c=SI204 -p=lab04 part1.cpp part2.cpp

Step 3: Going Further, Multiple Files

Finally, in a file called part3.cpp, augment your part 2 solution so that it runs repeatedly, prompting for a new filename each time, until the filename entered is "quit". For example:

~$ ./part3
Enter a filename: uncleTomsCabin.txt
Word count: 183643
Average word length: 4.55154
Average sentence length: 16.5191

Enter a filename: shortExample.txt
Word count: 12
Average word length: 4.3333
Average sentence length: 4

Enter a filename: quit
~$
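
The repeat-until-"quit" behavior shown above could be structured roughly as follows. This is only a sketch of the loop. The helper name promptUntilQuit is made up for this example, the file analysis itself is left as a comment, and the streams are passed in as parameters so the loop can be tried without typing at the keyboard.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: prompts on `out` and reads filenames from `in`
// until "quit" is entered (or input runs out). Returns the number of
// filenames seen before "quit".
int promptUntilQuit(istream& in, ostream& out)
{
  int filesSeen = 0;
  string filename;
  out << "Enter a filename: ";
  while (in >> filename && filename != "quit") {
    // ... open `filename` and print its statistics here ...
    filesSeen++;
    out << endl << "Enter a filename: ";   // blank line, then re-prompt
  }
  return filesSeen;
}
```

In a real program this would be called as promptUntilQuit(cin, cout). Putting the read in the loop condition means the loop also stops cleanly if input ends before "quit" is typed.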

Show your professor, and submit with the command

~/bin/submit -c=SI204 -p=lab04 part1.cpp part2.cpp part3.cpp