This is the archived website of SI 486H from the Spring 2016 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

k-choice hashing simulation

Due: March 22
Points: 5

Write a Java, C++, or Python program that demonstrates how big the buckets might get in a hash table that has multiple perfect hash functions.

Your program should read three integers, $$s$$, $$n$$ and $$k$$, from command line arguments. $$s$$ is the size of the hash table, $$n$$ is the number of things that are going to be inserted, and $$k$$ is the number of hash functions to use.

But you're not actually going to insert anything! Your "hash functions" will be random number generators that produce values from 0 up to $$s-1$$ with equal probability. (This is the "random oracle" hash function we've mentioned in class.) You should know how to write that by now; you can use any built-in random number generators in the language of your choosing.

Here's how it works: you start with an empty table of size $$s$$. Then you "insert" $$n$$ items into this table. For each "item", you compute $$k$$ hash functions, which just means picking $$k$$ random integers between 0 and $$s-1$$. Then you choose the location out of those $$k$$ that has the smallest count and increase that count by one. That is, you "insert" the new item into the least-filled of the $$k$$ computed positions of the hash table.

(You might want to look at this page for some background on this hashing idea.)

At the end, print out a single line for each entry in your table, along with a number of X's equal to the size. On the last line, print out what is the maximum bucket size.

So for example, if $$s=7$$ and $$n=10$$, you might end up with something like:

0:
1: XXXX
2: X
3:
4: X
5: X
6: XXX

Maximum size: 4

Finally, do an empirical study of the effect $$k$$ has on the maximum bucket size when $$s=2n$$. What you should do is choose a large hash table size $$s$$, like 1,000,000 or so. Then set the number of elements to $$n=s/2$$. Keep $$s$$ and $$n$$ the same over a series of experiments, starting with $$k=1$$ and incresing to $$k=10$$. Make a graph plotting the maximum bucket size as a function of $$k$$, and connect the dots to show the curve.

Then make an educated wager: What do you think the best value for $$k$$ would be?

Submit your program according to the instructions on the submit page. Turn in a write-up of your empirical study on paper.