Packet Capture

You may or may not be familiar with the concept of packet capture, in which a network analyst collects data on the packets being transmitted across a network in order to analyze the dataset for interesting or alarming features. For example, these files contain a simulated packet capture session (first uncompress the archive with tar xzf pcaps.tgz). Each row describes one observed connection, with the columns in this order: a unique numerical ID, the time in seconds after the capture began at which the connection began, the IP of the source, the IP of the destination, the type of protocol used, and the size, in bytes, of the communication.
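To make the column order concrete, here is a minimal sketch of pulling the six fields out of one row. It assumes the columns are separated by whitespace and that the time can be read as a double; check both against the actual files. The struct, its field names, and parseRow are hypothetical names chosen for illustration.

```cpp
#include <sstream>
#include <string>

// One row of the capture file. Field names are illustrative only.
struct Connection {
    int id;                  // unique numerical ID
    double time;             // seconds after the capture began
    std::string source;      // source IP
    std::string destination; // destination IP
    std::string protocol;    // protocol name
    int size;                // size of the communication in bytes
};

// Reads the six columns from a single line, assuming they are
// separated by whitespace. Returns true only if all six were read.
bool parseRow(const std::string& line, Connection& c) {
    std::istringstream in(line);
    in >> c.id >> c.time >> c.source >> c.destination >> c.protocol >> c.size;
    return !in.fail();
}
```

In the lab programs you would more likely read each column directly from an ifstream inside a loop; the point here is only which column lands in which variable.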

This is a sorting lab. Get out that selection sort code (or your favorite illustrative video).

Part 1

Write a program called part1Large.cpp which asks the user for the filename of a packet capture file like the ones given to you, and then prints out the sizes of the ten largest connections in the file, in bytes (you may hardcode the filename):

What file? pcap.txt
49997 49986 49985 49980 49977 49973 49970 49968 49965 49959

You should do this by creating an array of all of the connection sizes (last column of each row), and then sorting it, largest to smallest, before printing out the first ten elements of the now-sorted array.

Then write a program called part1Small.cpp which prints out the sizes of the ten smallest connections in the file.

You must use selection sort as we have implemented it in class, with a before function. part1Large.cpp and part1Small.cpp may differ by only one character.
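As a reminder of the general shape, here is one way selection sort with a before function might look. This is a sketch, not necessarily the version from class: the names selectionSort and best, and the exact signature of before, are assumptions.

```cpp
// Returns true if a should come before b in the sorted order.
// With > the largest values come first; changing the comparison
// to < would put the smallest values first instead.
bool before(int a, int b) {
    return a > b;
}

// Sorts data[0..n-1] in place using selection sort.
void selectionSort(int* data, int n) {
    for (int i = 0; i < n - 1; i++) {
        // Find the element that belongs at position i.
        int best = i;
        for (int j = i + 1; j < n; j++) {
            if (before(data[j], data[best])) {
                best = j;
            }
        }
        // Swap it into place.
        int tmp = data[i];
        data[i] = data[best];
        data[best] = tmp;
    }
}
```

Because the ordering decision lives entirely inside before, the sorting loop itself never needs to change between the two programs.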

Part 2

Write a program called part2.cpp that prompts the user for a file, reads in this file, and outputs the number of distinct protocols observed in the PCAP session. To do this, make an array of the protocols (strings), sort them, and then iterate through, counting the number of different ones seen (see how the fact that they're sorted is helpful?).
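The counting step can be sketched as follows, assuming the array has already been sorted so that equal strings sit next to each other. The function name countDistinct is hypothetical.

```cpp
#include <string>

// Counts the distinct values in an already-sorted array.
// Because equal strings are adjacent after sorting, a new value
// appears exactly when an element differs from the one before it.
int countDistinct(const std::string* sorted, int n) {
    if (n == 0) {
        return 0;
    }
    int count = 1; // the first element is always a new value
    for (int i = 1; i < n; i++) {
        if (sorted[i] != sorted[i - 1]) {
            count++;
        }
    }
    return count;
}
```

On an unsorted array this single pass would overcount, since a value seen earlier could reappear later and be counted again; sorting first is what makes one comparison per element enough.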

Part 3

Now, we want to know which source IPs are starting the connections with the largest data transfers. To do this, in a program called part3.cpp, build two arrays, one of source IPs (strings) and one of connection sizes (ints), such that ips[i] and sizes[i] both come from row i.

Then, alter selection sort so that every swap that occurs in sorting the sizes is mirrored with a swap between the same indices in the array of IPs (so that ips[i] and sizes[i] still come from the same row, even as the array sizes becomes sorted and ips is reordered along with it). Your program should then print out the IPs which sourced the ten largest connections.
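The mirrored swap can be sketched like this. It is only an illustration of the idea under the same assumptions as any selection sort with a before function; the name sortTogether is hypothetical, and your class version of selection sort may be organized differently.

```cpp
#include <string>
#include <utility>

// Returns true if a should come before b (largest first).
bool before(int a, int b) {
    return a > b;
}

// Selection sort on sizes, with every swap repeated on ips so that
// ips[i] and sizes[i] always describe the same row of the file.
void sortTogether(int* sizes, std::string* ips, int n) {
    for (int i = 0; i < n - 1; i++) {
        int best = i;
        for (int j = i + 1; j < n; j++) {
            // Only the sizes decide the order.
            if (before(sizes[j], sizes[best])) {
                best = j;
            }
        }
        // Swap the same pair of indices in both arrays.
        std::swap(sizes[i], sizes[best]);
        std::swap(ips[i], ips[best]);
    }
}
```

Note that the comparison looks only at sizes; the IPs are never compared, they simply travel with their sizes.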