Overview
This lab is a subset of a project developed by
Dr. Chambers.
You will write a program that allows the user to interactively
search through a dataset of over 180,000 Twitter posts. Here is a
sample run:
~/$ java Lab06 alltweets.txt
188671 tweets
> filter Potter
37 tweets
> filter Ron
1 tweets
> dump
This is a ficticious post about Harry Potter and Ron Weasley. dbrown88 2013-08-12
1 tweets
> reset
188671 tweets
> filter Timberlake
10 tweets
> filter! Justin
2 tweets
> dump
@adamaviv J. Timberlake is da bomb! lmcdowell 2015-01-27
Staying at Timberlake Lodge while I ski. adamaviv 2015-02-12
2 tweets
> quit
As you can see, the basic commands are
dump, quit, reset, filter and filter!. The distinction
between "filter" and "filter!" is that "filter" keeps
every tweet that contains the keyword, while "filter!" keeps
every tweet that does not contain the keyword.
Note: The filter and filter! commands only look for a
match in the tweet message text, i.e. not in the user name or
date!
My real goal for you, aside from having some fun, is to make use
of inheritance to solve problems, to understand how it allows you
to add or modify implementations without access to the underlying
implementation, and to face situations where you have to make good
decisions about where to put functionality.
Note: this lab is to be done in pairs. In this case,
that'll mean two people and one keyboard. You will take turns
typing, and all files will have both of your names in them.
I want you to think hard about design decisions: what code goes
in which class? What functionality (interface) should each class
provide? How can I maximize code reuse? Am I following the
principles of encapsulation and information hiding? Am I making
good use of inheritance? Talk about these things between the
two of you as you work. Discuss first, code second.
Part 0: make the lab06 directory in the alpha partner's account
Choose one partner as the one in whose account you will work.
Move over to their lab machine, make a
directory lab06 in which to work, and cd
to that directory.
The Data
We are providing you with almost 200 thousand actual public
tweets. Your program will read them all and provide a search interface.
As part of his research, USNA Computer Science's very own
Dr. Chambers had a continual feed into Twitter downloading
millions of tweets every day. This is just a fraction of a
fraction of the data that he had here in the CS department. He and
his students used it to do fundamental research in artificial
intelligence and information extraction, such as finding
correlations with presidential approval polls (some students in the
classes ahead of you have conducted this kind of research).
This lab will let you have some fun poking around the data.
Disclaimer
One: You may not distribute this project's data to anyone
beyond the USNA. Our agreement with Twitter prevents copying and
distributing.
Two: This is raw, real world data. The standard disclaimer
applies as it does whenever we step out onto the Web. You may
come across offensive material. Please behave like mature adults
and future officers, as appropriate.
We have two real files for you, "alltweets.txt" (the whole data set),
and "sometweets.txt" (a small subset for testing).
We also have a very small file "faketweets.txt" for example
output.
Download them to your current directory (which should be lab06)
with the following command:
curl https://faculty.cs.usna.edu/~wcbrown/internal/tweets.tgz | tar xz
Code that you MUST base your program off of
You must use as a foundation for your program two classes
that we have written for you: Tweet and TweetQueue.
The twist is that I am not giving you access to the
source code (.java files) for these classes, only the compiled
bytecode (.class files) ... and some really nice documentation.
Note, as you look through the documentation, that TweetQueue
includes the nice iterator interface we went over in class.
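Working from .class files alone means new behavior has to come from subclassing and from the public methods the documentation describes. Here is a minimal sketch of that idea. It deliberately does not use TweetQueue, whose methods are not listed here; instead the JDK class java.util.ArrayDeque stands in as "a class whose source we never touch", and the names DumpableDeque and dump() are made up for illustration.

```java
import java.util.ArrayDeque;

// Illustration only: ArrayDeque plays the role of a compiled class we cannot edit.
// It is NOT TweetQueue, and DumpableDeque/dump() are hypothetical names.
public class DumpableDeque extends ArrayDeque<String> {

    // New functionality, written using only the parent's public iterator
    // (the for-each loop calls iterator() behind the scenes).
    public String dump() {
        StringBuilder sb = new StringBuilder();
        for (String s : this) {
            sb.append(s).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DumpableDeque d = new DumpableDeque();
        d.add("first");   // inherited, unchanged
        d.add("second");  // inherited, unchanged
        System.out.print(d.dump());
    }
}
```

The subclass never looks inside its parent: everything inherited keeps working, and the new method is built purely on top of the public interface.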
Part 1: Reading and writing
In this section you'll be implementing the commands
"dump" and "quit".
You will doubtless want to create at least one new class by
extending TweetQueue to do this.
Here are some requirements:
- Your program (which must start from a class named Lab06)
  must print a prompt (including # of tweets)
  exactly as shown in the example output.
- The user can give the "dump" command over and over again
  and should see all the data each time.
- The input file (given as a command-line argument) can only be
  read once by the program, and if no argument is given, the
  program must print a nice usage message and exit. Similarly,
  if the file can't be opened for some reason, an error message
  should be printed.
- You must use the Tweet and TweetQueue classes. You should
  not use any linked-list (or array) code other than TweetQueue.
  You can build off of it, of course.
Note: no lab submission violating these rules will be accepted.
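The argument and file-open requirements above can be sketched roughly as follows. This is only one possible shape, not the required one: the class name StartupSketch, the helper openOrNull, and the wording of the error message are all assumptions. Only the usage line is taken from the sample run.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class StartupSketch {

    // Hypothetical helper: returns an open Scanner, or null if the file can't be opened.
    static Scanner openOrNull(String filename) {
        try {
            return new Scanner(new File(filename));
        } catch (FileNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // No argument: print the usage message and stop.
        if (args.length < 1) {
            System.out.println("usage: java Lab06 <filename>");
            return;
        }

        Scanner in = openOrNull(args[0]);
        if (in == null) {
            // The exact wording of this message is an assumption.
            System.out.println("error: could not open file " + args[0]);
            return;
        }

        // A real solution would read every tweet here, exactly once,
        // and then answer all later commands from memory.
        in.close();
    }
}
```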
~/$ java Lab06
usage: java Lab06 <filename>
~/$ java Lab06 faketweets.txt
5 tweets
> dump
which bowtie should I wear? DrCrabbe 2016-01-05
can u belive crabbe's bowtie? DrRoche 2016-01-06
bowties are making a comeback, like plaid DrTaylor 2016-01-06
notie is better than bowtie DrBrown 2016-01-07
plaid is back? I'm cool again! DrAviv 2016-01-08
5 tweets
> quit
Part 2: Filter and Filter!
For this part you will add the commands
"filter", "filter!" and "reset".
Note: The filter and filter! commands only look for a
match in the tweet message text, i.e. not in the user name or
date!
Here are some requirements and helpful suggestions:
- You may modify Lab06 as much as you like, but do not modify
  (unless you find some bugs) your Part 1 extension or
  TweetQueue. Instead, if you need added functionality I want
  you to extend your Part 1 extension of TweetQueue.
- To find a keyword in a Tweet, you might find the String method
  indexOf(String str) useful.
- My solution had a line in Lab06 that looked something like
  this:
  currQueue = currQueue.filter(...);
  You don't have to do it this way, but it might help.
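To make the indexOf suggestion concrete, here is a small sketch of the keep-or-drop decision for a single message. It works on a plain String rather than on a Tweet, since how you get the message text out of a Tweet is in the documentation, not here. The class name KeywordSketch, the method keep, and the invert flag are made-up names for illustration.

```java
public class KeywordSketch {

    // Decide whether one message survives a filter.
    // invert == false models "filter", invert == true models "filter!".
    static boolean keep(String text, String keyword, boolean invert) {
        boolean found = text.indexOf(keyword) >= 0; // indexOf returns -1 when absent
        return invert ? !found : found;
    }

    public static void main(String[] args) {
        String text = "bowties are making a comeback, like plaid";
        System.out.println(keep(text, "plaid", false)); // true
        System.out.println(keep(text, "plaid", true));  // false
        System.out.println(keep(text, "Plaid", false)); // false
    }
}
```

Note that indexOf is case-sensitive and matches substrings, so "Plaid" does not match "plaid", while "bow" would match "bowties".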
~/$ java Lab06
usage: java Lab06 <filename>
~/$ java Lab06 faketweets.txt
5 tweets
> filter plaid
2 tweets
> dump
bowties are making a comeback, like plaid DrTaylor 2016-01-06
plaid is back? I'm cool again! DrAviv 2016-01-08
2 tweets
> filter! bowtie
1 tweets
> dump
plaid is back? I'm cool again! DrAviv 2016-01-08
1 tweets
> reset
5 tweets
> filter! plaid
3 tweets
> dump
which bowtie should I wear? DrCrabbe 2016-01-05
can u belive crabbe's bowtie? DrRoche 2016-01-06
notie is better than bowtie DrBrown 2016-01-07
3 tweets
> quit
Submit
Make sure your .java files include both partners' names in a
comment at the top of the file. You only need one submission for
the pair ... as long as both your names are in every file! Submit as:
submit -c=IC211 -p=lab06 *.java
Challenge Step: Undo! (going further)
As an optional challenge if you finish early, see if you can add
an "undo" command, so that you can go back to the Queue that was
current before the previous "filter" or "filter!" command.
Ideally, you'd want unlimited undo, meaning that you could undo
(in reverse) all the filter and filter! commands up to either the
program start or the last "reset" operation.
Unlimited undo can be a bit of a challenge. You want to keep a
"stack" of all the old Queues. A "Stack" is like a Queue, except
that instead of enqueue and dequeue you have "push" and "pop".
Pop is just like "dequeue": you remove and return the frontmost
item in the list. But "push" adds a new item to the front rather
than to the back like "enqueue" does. So pop returns the most
recently added item (the newest) rather than the oldest item, which
is what we get with a Queue. As an example, you may look at my
implementation of Stacks for Strings, which should be easy to adapt.
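For reference, the push/pop behavior described above can be sketched like this. It is not the Stacks-for-Strings implementation mentioned above, just a minimal generic version written for illustration; the class name StackSketch and its method names are assumptions. For undo, the items would be the old Queues rather than Strings.

```java
import java.util.NoSuchElementException;

public class StackSketch<T> {

    // One link in the chain; each node points at the item pushed before it.
    private static class Node<T> {
        final T data;
        final Node<T> next;
        Node(T data, Node<T> next) { this.data = data; this.next = next; }
    }

    private Node<T> top = null;

    public boolean isEmpty() { return top == null; }

    // Add a new item at the front.
    public void push(T item) { top = new Node<>(item, top); }

    // Remove and return the front item, i.e. the most recently pushed one.
    public T pop() {
        if (top == null) {
            throw new NoSuchElementException("pop on empty stack");
        }
        T item = top.data;
        top = top.next;
        return item;
    }

    public static void main(String[] args) {
        StackSketch<String> s = new StackSketch<>();
        s.push("all tweets");
        s.push("after filter Potter");
        System.out.println(s.pop()); // after filter Potter
        System.out.println(s.pop()); // all tweets
    }
}
```

With something like this, each "filter" or "filter!" could push the current Queue before replacing it, "undo" could pop, and "reset" could empty the stack.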