SI 204 Spring 2017 / Labs


Lab 08: Sorting PCAPs

(Credit to Gavin Taylor for the original version of this lab.)

1 Packet Capture

You may or may not be familiar with packet capture, in which a network analyst collects data on the packets transmitted across a network and then analyzes that dataset for interesting or alarming features.

First download these files, which contain simulated packet capture sessions. (After downloading, you can extract them by running tar xzvf pcaps.tgz.)

Each file starts with an integer giving the number of connections logged in that file.

Each observed connection is given a unique numerical ID, the time (in seconds after the start of the capture) at which the connection began, the IP of the source, the IP of the destination, the protocol used, and the size of the communication in bytes. These values, in that order, are the columns of the file.
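To make the layout concrete, here is a minimal sketch of reading such a file in C and keeping only the last column. It assumes the columns are separated by whitespace and that no field contains a space; the actual files may differ in detail. The function name read_sizes is hypothetical, not something provided by the lab.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the connection count, then one row per
 * connection, keeping only the last column (size in bytes).
 * Returns a malloc'ed array and stores its length in *n, or returns
 * NULL if the file does not match the assumed layout.
 * Assumption: fields are whitespace-separated tokens. */
int *read_sizes(FILE *in, int *n) {
  if (fscanf(in, "%d", n) != 1 || *n < 0) return NULL;
  int *sizes = malloc((*n > 0 ? *n : 1) * sizeof(int));
  if (sizes == NULL) return NULL;
  for (int i = 0; i < *n; i++) {
    /* Each %*s skips one token: ID, time, source IP, destination IP,
     * protocol. Only the final %d is stored. */
    if (fscanf(in, "%*s %*s %*s %*s %*s %d", &sizes[i]) != 1) {
      free(sizes);
      return NULL;
    }
  }
  return sizes;
}
```

The caller would open the file with fopen, pass the resulting FILE pointer in, and free the returned array when done.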

This is a sorting lab. Get out that selection sort code (or your favorite illustrative video).

2 Part 1

Write a program called large.c which asks the user for the filename of a packet capture file like the ones given to you, and then prints out the sizes of the ten largest connections in the file, in bytes.

roche@ubuntu$ ./large
What file? pcap.txt
49997
49986
49985
49980
49977
49973
49970
49968
49965
49959

You should do this by creating an array of all of the connection sizes (last column of each row), and then sorting it, largest to smallest, before printing out the first ten elements of the now-sorted array.

Then write a program called small.c which prints out the sizes of the ten smallest connections in the file.

You must use selection sort as we have implemented it in class, with a before function. large.c and small.c may differ by only one character!
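The class implementation is not reproduced here, so the following is only a sketch of one common shape for selection sort with a before function. The names before and selection_sort and their signatures are assumptions; use whatever you wrote in class.

```c
#include <stdbool.h>

/* Returns true when a should come before b in the sorted order.
 * With > the largest value ends up first; with < the smallest does. */
bool before(int a, int b) {
  return a > b;
}

/* Selection sort: for each position i, find the element of A[i..n-1]
 * that belongs first according to before, and swap it into position i. */
void selection_sort(int *A, int n) {
  for (int i = 0; i < n - 1; i++) {
    int best = i;
    for (int j = i + 1; j < n; j++) {
      if (before(A[j], A[best])) best = j;
    }
    int tmp = A[i];
    A[i] = A[best];
    A[best] = tmp;
  }
}
```

Because the ordering lives entirely in before, the sorting loop itself never needs to change between the two programs.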

3 Part 2

Write a program called distinct.c that prompts the user for a file, reads in this file, and outputs the number of distinct protocols observed in the PCAP session. To do this, make an array of the protocols (strings), sort them, and then iterate through, counting the number of different ones seen (see how the fact that they’re sorted is helpful?).

roche@ubuntu$ ./distinct
Which file? smallPcap.txt
10
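The counting step described above can be sketched as follows, assuming the protocols are already sorted (for example with a before function based on strcmp). The function name count_distinct is hypothetical.

```c
#include <string.h>

/* Counts the distinct strings in an array that is already sorted, so
 * equal strings sit next to each other. A new distinct value starts
 * wherever an element differs from the one just before it. */
int count_distinct(char **S, int n) {
  if (n == 0) return 0;
  int count = 1; /* the first element is always a new value */
  for (int i = 1; i < n; i++) {
    if (strcmp(S[i], S[i - 1]) != 0) count++;
  }
  return count;
}
```

Without sorting first, each string would have to be compared against every string seen so far; with sorting, one comparison per element is enough.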

4 Part 3 (going further)

Now, we want to know which source IPs are transferring the most total data. Write a program called hogs.c that prompts for a pcap file and reports the 10 IP addresses that used the most total data in all of their connections.

The best way to do this is in multiple steps:

  1. Store the source IP addresses and connection sizes from each row. You could either store these as two separate arrays (easier to set up, but makes sorting them together more challenging), or store them as a single array of structs if you’ve looked ahead to Unit 8.

  2. Sort by IP address first, so that all rows with the same IP address end up together in the list. Now make a new array (or pair of arrays) with one entry per distinct IP address, holding the sum of all the connection sizes for that address.

  3. Once you have the array(s) with the total size for each IP, sort by total size, largest first.

Your program should print out the 10 IP addresses with the largest combined connection size.
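As a sketch of the adding-up in step 2, here is one way to collapse runs of equal IP addresses using the two-separate-arrays option. It assumes the arrays have already been sorted together by IP; the function name total_by_ip and its parameters are hypothetical.

```c
#include <string.h>

/* Given parallel arrays already sorted by IP (ips[i] sent sizes[i]
 * bytes), collapses each run of equal IPs into one entry holding the
 * run's total. Results go into out_ips and out_totals, which must each
 * have room for n entries. Returns the number of distinct IPs.
 * Note: out_ips stores pointers into ips, not copies of the strings. */
int total_by_ip(char **ips, int *sizes, int n,
                char **out_ips, long *out_totals) {
  int m = 0; /* number of distinct IPs written so far */
  for (int i = 0; i < n; i++) {
    if (m > 0 && strcmp(ips[i], out_ips[m - 1]) == 0) {
      out_totals[m - 1] += sizes[i]; /* same IP as the current run */
    } else {
      out_ips[m] = ips[i];           /* first row of a new IP */
      out_totals[m] = sizes[i];
      m++;
    }
  }
  return m;
}
```

The totals are kept as long here on the assumption that the sum of many connection sizes could grow large. Step 3 then sorts the m totals, moving each IP along with its total.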