Problem 76

This is the archived version of this course from the Spring 2013 semester. You might find something more recent by visitning my teaching page.

# Max bucket size in a hash table

Due: April 12
Points: 5

In class, we were deriving an asymptotic bound for the expected size of the largest bucket in a simple (single hash function) hash table. Here's what we said:

1. We want to determine the probability that all of the buckets have fewer than $$k$$ elements in them. Call this probability $$p$$:
2. $$p = \mathrm{Pr}[\text{every bucket has fewer than k elements}]$$
3. Given some subset of $$k$$ elements, the probability that all of those $$k$$ have the same hash value is $q = \left(\frac{1}{s}\right)^{k-1} = \frac{1}{s^{k-1}}$
4. Therefore the probability that $$k$$ elements do not all have the same hash value is $1-q = 1-\frac{1}{s^{k-1}}$
5. There are $$\binom{n}{k}$$ subsets of $$k$$ inserted elements.
6. Every bucket has fewer than $$k$$ elements if every subset of $$k$$ elements don't all have the same hash value. From steps (4) and (5), this is $p = (1-q)^{\binom{n}{k}} = \left(1-\frac{1}{s^{k-1}}\right)^{\binom{n}{k}}$
7. We can approximate $$\binom{n}{k}$$ as follows: $\binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k(k-1)(k-2)\cdots 1} \approx \left(\frac{n}{k}\right)^k$
8. For any big numbers $$a$$ and $$b$$, we can approximate $$(1-1/a)^b$$ as $\left(1 - \frac{1}{a}\right)^b = \left(\left(1 - \frac{1}{a}\right)^a\right)^{b/a} \approx \left(\frac{1}{\mathbb{e}}\right)^{b/a} = \mathbb{e}^{-b/a}$ where $$\mathbb{e}$$ is the constant you all know and love, $$\mathbb{e}\approx 2.71828$$

Now use these steps to finish off the proof that we should expect buckets to be $$O((\log n)/(\log\log n))$$ in size, when $$s=2n$$.

To accomplish this, first plug in $$2n$$ for $$s$$ in the formula from (6), and set $$p=1/2$$. Next, use (7) and (8) to simplify the exponential in terms of $$\mathbb{e}$$ and without any combinations or factorials.

After you take the log of both sides, you should get a formula for $$\lg n$$ in terms of $$k$$, showing that $$\lg n \in \Theta(k\log k)$$. Now you just need to reverse this to show the result we want, that $$k \in \Theta((\log n)/(\log\log n))$$.