SI204 Lab 11
The Federalist Papers and Frequency Analysis
Pre-lab homework: None

When Justice Scalia spoke at the Forrestal lecture a few years back, he
encouraged all of us in the audience to read The Federalist Papers.
For your reading pleasure, we downloaded the text of Federalist Paper No. 1,
written by Alexander Hamilton, and stored it in the file federal1.txt.
We also downloaded the text of Federalist Paper No. 2, written by John Jay, and
stored it in the file federal2.txt. (Each paper consists
of fewer than 2000 words!) For your lab, you will perform some analysis on the
words used by these two founding fathers.
- Write a program that takes an
input file name from the user (e.g. federal1.txt or federal2.txt) and an
output file name (output1.txt or output2.txt might be nice) and prints to
the output file all the words in the input file sorted alphabetically. Hint:
for this and subsequent parts, it might be helpful to create a small
sample input file (we used the Preamble to
the Constitution) for debugging.
Note that you can use an appropriate version of selectionSort()
similar to what we discussed earlier in class, but you’ll need to define
your own before() function. Here’s a sample of what the program should
write to the screen for part 1 using the Preamble
to the Constitution test file (user input in red):
Enter input file name: lab10_preamble.txt
Enter output file name (do not include the .txt suffix): preambleOUT
Here’s part 1 sample output
for the Preamble to the Constitution test
file. Note that we appended “_part1.txt”
to the user’s output file name entry.
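A minimal sketch of the sorting step for part 1, assuming the words have already been read into an array of strings. The version of selectionSort() discussed in class may differ in its details; the parameter names here are illustrative, and before() is only one possible definition.

```cpp
#include <string>
using namespace std;

// Returns true if a should come before b in the sorted output.
// Assumption: plain string comparison, so every uppercase letter
// sorts ahead of every lowercase letter (ASCII order).
bool before(const string& a, const string& b) {
  return a < b;
}

// Sorts A[0..n-1] in place, using before() to decide the order.
void selectionSort(string* A, int n) {
  for (int i = 0; i < n - 1; i++) {
    // Find the element of A[i..n-1] that belongs in position i.
    int next = i;
    for (int j = i + 1; j < n; j++)
      if (before(A[j], A[next]))
        next = j;
    // Swap it into place.
    string tmp = A[i];
    A[i] = A[next];
    A[next] = tmp;
  }
}
```

With this before(), the words "the", "We", "people", "of", "the" sort to "We", "of", "people", "the", "the", because 'W' precedes all lowercase letters in ASCII.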
- Modify your program so that
each distinct word only appears once - i.e. don't print out duplicates.
Think about how having sorted the words will help you in this task!
Here’s part 2 sample output
for the Preamble to the Constitution test
file. Note that we appended “_part2.txt”
to the user’s output file name entry.
- Modify your program further
so that each distinct word gets printed out along with the number of times
that word has occurred.
Here’s part 3 sample output
for the Preamble to the Constitution test
file. Note that we appended “_part3.txt”
to the user’s output file name entry.
- Modify your program further
so that it prints to the screen the number of words in the file and the
number of distinct words in the file.
Here’s a sample of what the program should write to the screen for
part 4 using the Preamble to the Constitution
test file (user input in red):
Enter input file name: lab10_preamble.txt
Enter output file name (do not include the .txt suffix): preambleOUT
Total number of words in the file is: 60
Number of distinct words in the file is: 40
- Modify your program so that
capitalization is ignored when determining whether words are equal, and so
that punctuation marks do not appear in your list of words. Hint: Take a look at where the
punctuation marks fall in the ASCII table. For our purposes, you can
safely assume that any string that starts with a character other than an
uppercase or lowercase letter is a punctuation mark. Consider writing a function that takes a
string and returns true if the string starts with any character other than
an uppercase or lowercase letter, and false otherwise. Notice that we
manually modified the original text slightly so that each punctuation mark
is separated from any words by white space. Otherwise this would have
required a more difficult parsing of the punctuation marks from the interior
(and both ends) of each string.

Here’s a sample of what the program should write to the
screen for part 5 using the Preamble to the
Constitution test file (user input in red):
Enter input file name: lab10_preamble.txt
Enter output file name (do not include the .txt suffix): preambleOUT
Total number of words in the file is: 52
Number of distinct words in the file is: 38
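A sketch of the two helpers that part 5 suggests. The names isPunctuation and toLower are assumptions, and converting every word to lowercase as it is read is only one way to ignore capitalization; comparing case-insensitively inside before() would be another.

```cpp
#include <string>
using namespace std;

// Returns true if s starts with any character other than an uppercase
// or lowercase letter. An empty string is also treated as punctuation.
bool isPunctuation(const string& s) {
  if (s.empty())
    return true;
  char c = s[0];
  bool isLetter = ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z');
  return !isLetter;
}

// Returns a copy of s with every uppercase letter made lowercase.
// In ASCII, each lowercase letter sits a fixed distance above its
// uppercase counterpart.
string toLower(string s) {
  for (size_t i = 0; i < s.length(); i++)
    if ('A' <= s[i] && s[i] <= 'Z')
      s[i] = static_cast<char>(s[i] - 'A' + 'a');
  return s;
}
```

While reading the file, strings for which isPunctuation() returns true would be skipped, and every other string would be passed through toLower() before it is stored.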
Going Further
- Write a new
program (you might want to borrow heavily from the previous program!) that
reads in both federal1.txt and federal2.txt and produces three output
files: one with all the words that appear in both papers, one with all the
words that appear in Hamilton's paper but not in Jay's, and one with all
the words that appear in Jay's paper but not Hamilton's. Inside of each
file, words should be in alphabetical order.
- Modify your
last program so that frequencies of words are also printed in the output.
This, finally, is probably good data for people who research writing in
this way. Using the data generated by your program, compare the Federalist
Papers with a few essays of similar length (2000 or so words) by more
modern politicians. You’ll need to either write a function to remove
punctuation marks or remove them manually.
Analysis: Based on the data your program generates, give a
one-paragraph analysis as to whether there has been any detectable difference
in the frequency profile of words used in political speeches from the late
1700s versus speeches made in more modern times. Be sure to directly cite the data you
collected.
Christopher W Brown