- Due before 23:59 Wednesday, November 16
- Submit using the submit program
301 lab 11.
It has been well covered that the NSA and other intelligence agencies collect a huge amount of communications metadata, which contains the records of who has contacted whom, but are legally constrained in how they access and use this data. One of the constraints is that they may only access the metadata of individuals who are three "hops" from an individual for whom the NSA has "reasonable articulable suspicion."
If person A is suspicious, and he or she has contacted person B, then B is one hop from A. If B then contacts C, then C is two hops from A.
The metadata, of course, creates a directed graph detailing who has contacted whom, and our graph search algorithms make a basis for a magnificent way of answering lots of interesting questions about who may and may not be investigated.
Download the file lab11.tar.gz and extract it using
tar xzvf lab11.tar.gz. You will see two files, one with real data
and one with synthetic data.
The real data is from email conversations in the EU. It has been converted to a dot file for you (email.dot)
The other file (small.dot) is made-up synthetic data for easier testing and development. Here is a visualization of that graph:
To understand this graph, consider individuals 0 and 3. Inside the dot
file, you will see a line
0 -> 3;. This means that person 0 has
contacted person 3. Person 3 is therefore one hop from person 0. Because the
arrow does not go from person 3 to person 0, person 0 is NOT one hop away from
Create a file
graph.py with a class called
(as well as any other classes you might want).
Graph class should have:
- a constructor that takes the filename of a .dot file as an argument and creates the corresponding graph. Note that unlike last week, these dot files store directed graphs!
- a method
hopper(self, name, hops)for which
nameis an integer vertex id (for the graph above, these could be the integers 0-10, NOT the strings '0'-'10'), and
hopsis an integer indicating the number of allowed hops. This method should return a list of the names of all vertices which are
nameor closer. For example, calling
g.hopper(0,1)on the above graph should return the list
[0,1,2,3], in any order (notice these are integers, not strings). Similarly, calling
[3,4,5,6], again in any order.
- A method
canSurveil(self, suspect, target, hops), which returns
suspect, and False otherwise. All three inputs should be integers.
It will be convenient to use
small.dot when developing your methods initially,
so that you can easily follow along what your program should be doing.
But be sure to test your program on the real dataset as well — one common
mistake will create an exception if you run
g.hopper(3,2) on the real
dataset). You may find Python's implementation of a Queue, deque,
to be useful.
The rest of this lab is optional for your enrichment and enjoyment. Be sure to work on this only after getting everything else perfect. You should still submit this work, but make sure it doesn't break any of the parts you already had working correctly!
Add the following methods to
distance(self, suspect, target), which returns the smallest number of hops to reach the given target from the given suspect.
percentage(self, suspect, hops), which returns a float indicating the percentage of individuals in the data set which can be reached by taking
likelyPercentage(self, hops), which returns a float indicating the expected percentage of individuals which can be reached in
hopshops from a random suspect. In other words, this is the average of the above method
percentage, given an input of all valid ids as suspect.