Lab 12: Sorting Lab

  • Due before 23:59 Wednesday, December 7
  • Submit using the submit program as 301 lab 12.

1 Introduction

You all are familiar with the concept of packet capture, in which a network analyst collects data on the packets being trasmitted across a network, so that he or she can analyze this dataset for interesting or alarming features

Our NSA Visiting Professor, Paul Troncone, gives the following list of things he might want from such a packet capture while performing an analysis:

  • IPs that do the most/least talking by numbers of connections
  • IPs that do the most/least talking by total number of bytes
  • Protocols handling the most/least talking by numbers of connections
  • Protocols handling the most/least talking by total number of bytes
  • The most and least common connections (source IP/destination IP pairs)
  • The connections (source IP/destination IP pairs) moving the most data

2 pcap files

The file pcap.csv.gz contains a packet capture session on a real network, organized into CSV format (first uncompress it with gunzip - it's big, so open it using less, head, or tail, rather than OpenOffice or Excel). Each connection observed is given a unique numerical ID, a time in seconds after the capture began that the connection began, the IP of the source, the IP of the destination, the type of protocol used, and the length, in bytes, of the communication.

Here are the first few lines of that file. Your program will need to process any file in this kind of format, not just the one example I'm providing for you.


3 Tuples

We have seen tuples many times in Python, but frequently without really talking about them much. A tuple in Python is like a restricted list, where you can't change the contents once it's created. And while there is a tuple type you can use to create tuples, more commonly we just use a comma-separated list inside parentheses to create one.

It's worthwhile to note for this lab that when you compare two tuples in python, it first tries comparing based on the first tuple element, then if those are equal it breaks ties by looking at the second one, and so on until all elements from both tuples have been compared.

Perhaps a few illustrative examples are in order:

>>> a = ('hello', 2, 3)
>>> type(a)
<class 'tuple'>
>>> b = ('goodbye', 2, 3)
>>> a < b
>>> b < a
>>> c = ('hello', 2, 5)
>>> a < c
>>> x = 2
>>> y = 3
>>> d = ('hello', x, y)
>>> d
('hello', 2, 3)
>>> a == d

4 Your assignment

Within a file called, make a class called PCAP. It should have:

  • A constructor, which accepts the filename of a .csv file in the format of the file linked to above
  • A method called ipsByConnections, which returns a list of tuples, where the first element of the tuple is the connecting (not the connected-to) IP, and the second element is the number of connections for that IP in the file. This list should be sorted, from fewest to most connections.
  • A method called ipsByBytes, which does the same, but sorted by number of bytes. The second number in the tuple should be the total number of bytes.
  • A method called protocolsByConnections
  • A method called protocolsByBytes
  • A method called connectionsByConnections, in which the tuples are, in order, the sending IP, the receiving IP, and the number of connections.
  • A method called connectionsByBytes

All sorting should be done in \(O(n\log n)\) time, using an algorithm you have implemented yourself. If you do things right, you should only have to code up your sorting algorithm once and use it for all these methods! Be sure to document what sorting algorithm you are using in your README.txt file.