IC210 Lab 11

The Federalist Papers and Frequency Analysis

 

Pre-lab homework. None


 

When Justice Scalia spoke at the Forrestal lecture a few years back, he encouraged all of us in the audience to read The Federalist Papers. For your reading pleasure, we downloaded the text of Federalist Paper No. 1, written by Alexander Hamilton, and stored it in the file federal1.txt. We also downloaded the text of Federalist Paper No. 2, written by John Jay, and stored it in the file federal2.txt. (Each paper consists of fewer than 2000 words!) For your lab, you will perform some analysis on the words used by these two founding fathers.

  1. Write a program that takes an input file name from the user (e.g. federal1.txt or federal2.txt) and an output file name (output1.txt or output2.txt might be nice) and prints to the output file all the words in the input file, sorted alphabetically. Hint for this and subsequent parts: it might be helpful to create a small sample input file (we used the Preamble to the Constitution) for debugging. Note that you can use an appropriate version of selectionSort() similar to what we discussed earlier in class, but you’ll need to define your own before() function. (NOTE: this can be very simple. You do NOT at this time have to do anything fancy to handle capitalization or punctuation intelligently.) Here’s a sample of what the program should write to the screen for part 1 using the Preamble to the Constitution test file (user input follows each prompt):

        Enter input file name: lab10_preamble.txt

        Enter output file name (do not include the .txt suffix): preambleOUT

Here’s part 1 sample output for the Preamble to the Constitution test file. Note that we appended “_part1.txt” to the user’s output file name entry.
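
The sorting step in part 1 can be sketched as follows. This is only a minimal sketch, not the version from class: it assumes the words have already been read into a std::vector<std::string> (your class version of selectionSort() may well use a plain array instead), and the signatures of selectionSort() and before() shown here are assumptions. Because this before() just uses the built-in string comparison, capitalized words sort ahead of lowercase ones, which is acceptable for part 1.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed comparison for part 1: true if a should come before b.
// operator< compares character codes, so "We" sorts before "a".
bool before(const std::string& a, const std::string& b) {
    return a < b;
}

// Selection sort: repeatedly move the smallest remaining word
// (according to before()) to the front of the unsorted region.
void selectionSort(std::vector<std::string>& words) {
    const std::size_t n = words.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::size_t best = i;
        for (std::size_t j = i + 1; j < n; ++j) {
            if (before(words[j], words[best])) {
                best = j;
            }
        }
        std::swap(words[i], words[best]);
    }
}
```

Keeping the ordering rule inside before() means later parts can change how words compare without touching the sort itself.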

  2. Modify your program so that each distinct word appears only once, i.e. don't print out duplicates. Think about how having sorted the words will help you in this task!

Here’s part 2 sample output for the Preamble to the Constitution test file. Note that we appended “_part2.txt” to the user’s output file name entry.

  3. Modify your program further so that each distinct word gets printed out along with the number of times that word has occurred.

Here’s part 3 sample output for the Preamble to the Constitution test file. Note that we appended “_part3.txt” to the user’s output file name entry.
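
Since the words are sorted, equal words sit next to each other, so duplicates and counts can both be found in a single pass over the list. The sketch below illustrates that idea under stated assumptions: the function name countRuns() and the use of std::vector and std::pair are our own choices for illustration, and the input is assumed to be already sorted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Given a SORTED list of words, return one (word, count) entry per
// distinct word. Each run of equal neighbors collapses to one entry.
std::vector<std::pair<std::string, int>>
countRuns(const std::vector<std::string>& sorted) {
    std::vector<std::pair<std::string, int>> result;
    std::size_t i = 0;
    while (i < sorted.size()) {
        std::size_t j = i;
        // Advance j to the end of the run of words equal to sorted[i].
        while (j < sorted.size() && sorted[j] == sorted[i]) {
            ++j;
        }
        result.push_back(std::make_pair(sorted[i], static_cast<int>(j - i)));
        i = j;
    }
    return result;
}
```

Printing only the first member of each entry gives the duplicate-free list; printing both members gives the list with counts. The size of the returned list is the number of distinct words, and the size of the sorted input is the total number of words.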

  4. Modify your program further so that it prints to the screen the number of words in the file and the number of distinct words in the file. Here’s a sample of what the program should write to the screen for part 4 using the Preamble to the Constitution test file (user input follows each prompt):

        Enter input file name: lab10_preamble.txt

        Enter output file name (do not include the .txt suffix): preambleOUT

 

        Total number of words in the file is: 60

        Number of distinct words in the file is: 40

  5. Modify your program so that capitalization is ignored when determining whether words are equal, and so that punctuation marks do not appear in your list of words. Hint: Take a look at where the punctuation marks fall in the ASCII table. For our purposes, you can safely assume that any string that starts with a character other than a letter is a punctuation mark. Consider writing a function that takes a string and returns true if the string starts with any character other than an uppercase or lowercase letter, and false otherwise. Notice that we manually modified the original text slightly so that each punctuation mark is separated from any words by white space. Otherwise, you would have had to do the more difficult job of parsing punctuation marks out of the interior (and both ends) of each string.

 

Here’s a sample of what the program should write to the screen for part 5 using the Preamble to the Constitution test file (user input follows each prompt):

        Enter input file name: lab10_preamble.txt

        Enter output file name (do not include the .txt suffix): preambleOUT

 

        Total number of words in the file is: 52

        Number of distinct words in the file is: 38
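
The two helpers suggested by the hint in part 5 might look like the sketch below. The names startsWithNonLetter() and toLowerCopy() are hypothetical, and converting every word to lowercase as it is read is just one possible way to ignore capitalization (another would be to compare case-insensitively inside before()). The sketch also assumes ASCII text and treats an empty string as "not a word".

```cpp
#include <cstddef>
#include <string>

// True if the string starts with anything other than A-Z or a-z.
// Assumption: an empty string is treated as a non-word as well.
bool startsWithNonLetter(const std::string& s) {
    if (s.empty()) {
        return true;
    }
    const char c = s[0];
    const bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    return !isLetter;
}

// Return a copy of s with every uppercase ASCII letter made lowercase,
// so that "We" and "we" compare equal afterwards.
std::string toLowerCopy(std::string s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] >= 'A' && s[i] <= 'Z') {
            s[i] = static_cast<char>(s[i] - 'A' + 'a');
        }
    }
    return s;
}
```

With these in hand, a word read from the file would be skipped when startsWithNonLetter() returns true, and otherwise stored as toLowerCopy(word) before sorting.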

 


 

Going Further

  1. Write a new program (you might want to borrow heavily from the previous program!) that reads in both federal1.txt and federal2.txt and produces three output files: one with all the words that appear in both papers, one with all the words that appear in Hamilton's paper but not in Jay's, and one with all the words that appear in Jay's paper but not Hamilton's. Inside of each file, words should be in alphabetical order.
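
One way to produce the three lists is to walk the two sorted, duplicate-free word lists side by side, much like merging. The sketch below is an illustration under assumptions: the struct WordSplit and the function splitWords() are hypothetical names, and both inputs are assumed to be sorted by the same ordering (plain string comparison here) with duplicates already removed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the three output lists.
struct WordSplit {
    std::vector<std::string> both;   // words in both papers
    std::vector<std::string> onlyA;  // words only in the first paper
    std::vector<std::string> onlyB;  // words only in the second paper
};

// a and b must each be sorted and contain no duplicates.
WordSplit splitWords(const std::vector<std::string>& a,
                     const std::vector<std::string>& b) {
    WordSplit r;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j]) {          // same word in both lists
            r.both.push_back(a[i]);
            ++i;
            ++j;
        } else if (a[i] < b[j]) {    // a[i] cannot appear later in b
            r.onlyA.push_back(a[i]);
            ++i;
        } else {                     // b[j] cannot appear later in a
            r.onlyB.push_back(b[j]);
            ++j;
        }
    }
    // Whatever is left in either list appears only in that list.
    for (; i < a.size(); ++i) r.onlyA.push_back(a[i]);
    for (; j < b.size(); ++j) r.onlyB.push_back(b[j]);
    return r;
}
```

Because both inputs are walked in sorted order, each of the three output lists comes out in alphabetical order without any further sorting.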

 

  2. Modify your last program so that frequencies of words are also printed in the output. This, finally, is probably the kind of data that people who research writing in this way can use. Using the data generated by your program, compare the Federalist Papers with a few essays of similar length (2000 or so words) by more modern politicians. You’ll need to either write a function to remove punctuation marks OR remove them manually. Analysis: Based on the data your program generates, give a one-paragraph analysis as to whether there has been any detectable difference in the frequency profile of words used in political speeches from the late 1700s versus speeches made in more modern times. Be sure to directly cite the data you collected.

 


Christopher W Brown

Last modified by LT M. Johnson 11/14/2007 03:54 PM