This is the archived website of SI 486H from the Spring 2016 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Problem 66

k-choice hashing simulation

Due: March 22
Points: 5

Write a Java, C++, or Python program that demonstrates how big the buckets might get in a hash table that has multiple perfect hash functions.

Your program should read three integers, \(s\), \(n\) and \(k\), from command line arguments. \(s\) is the size of the hash table, \(n\) is the number of things that are going to be inserted, and \(k\) is the number of hash functions to use.

But you're not actually going to insert anything! Your "hash functions" will be random number generators that produce values from 0 up to \(s-1\) with equal probability. (This is the "random oracle" hash function we've mentioned in class.) You should know how to write that by now; you can use any built-in random number generators in the language of your choosing.

Here's how it works: you start with an empty table of size \(s\). Then you "insert" \(n\) items into this table. For each "item", you compute \(k\) hash functions, which just means picking \(k\) random integers between 0 and \(s-1\). Then you choose the location out of those \(k\) that has the smallest count and increase that count by one. That is, you "insert" the new item into the least-filled of the \(k\) computed positions of the hash table.

(You might want to look at this page for some background on this hashing idea.)

At the end, print out a single line for each entry in your table, along with a number of X's equal to the size. On the last line, print out what is the maximum bucket size.

So for example, if \(s=7\) and \(n=10\), you might end up with something like:

0: 
1: XXXX
2: X
3: 
4: X
5: X
6: XXX

Maximum size: 4

Finally, do an empirical study of the effect \(k\) has on the maximum bucket size when \(s=2n\). What you should do is choose a large hash table size \(s\), like 1,000,000 or so. Then set the number of elements to \(n=s/2\). Keep \(s\) and \(n\) the same over a series of experiments, starting with \(k=1\) and incresing to \(k=10\). Make a graph plotting the maximum bucket size as a function of \(k\), and connect the dots to show the curve.

Then make an educated wager: What do you think the best value for \(k\) would be?

Submit your program according to the instructions on the submit page. Turn in a write-up of your empirical study on paper.