In the simplest implementation of a hash table, the index associated with a key is generally computed as follows:
size++;
int hash = hashcode(key);
int index = hash % size;
For an arbitrary key, the index is an integer in the range [0, size - 1], with equal probability for each outcome. The table below lists these probabilities for the first 5 indices, one row per insertion, as N elements are added.
Index         | 0            1            2            3            4
--------------------------------------------------------------------------------------------
Probabilities | 1
              | 1/2          1/2
              | 1/3          1/3          1/3
              | 1/4          1/4          1/4          1/4
              | 1/5          1/5          1/5          1/5          1/5
              | ...
              | 1/N          1/N          1/N          1/N          1/N
____________________________________________________________________________________________
Total         | H(N)         H(N) - 1     H(N) - 1.5   H(N) - 1.83  H(N) - 2.08
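The column totals can be reproduced with a short sketch. It assumes the model described above, in which the k-th insertion lands uniformly in [0, k - 1], so index i can only be hit by insertions i + 1 through N. The class and method names are made up for illustration.

```java
// Sketch only: expected chain length per index under the model in this post,
// where the k-th insertion picks a slot uniformly from [0, k - 1].
class ExpectedChainLengths {
    // H(n) = 1 + 1/2 + ... + 1/n, with H(0) = 0.
    static double harmonic(int n) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) { // add the smallest terms first
            sum += 1.0 / k;
        }
        return sum;
    }

    // Index i is reachable only by insertions i+1..n, each with probability 1/k,
    // so the expected count is H(n) - H(i).
    static double expectedChainLength(int index, int n) {
        if (index >= n) {
            return 0.0;
        }
        return harmonic(n) - harmonic(index);
    }

    public static void main(String[] args) {
        int n = 1000; // arbitrary example value for N
        for (int i = 0; i < 5; i++) {
            System.out.printf("index %d: %.4f%n", i, expectedChainLength(i, n));
        }
    }
}
```

The constants in the Total row (1, 1.5, 1.83, 2.08) are H(1) through H(4), which is what `harmonic(index)` subtracts.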
H(N) is the expected number of elements that collect in the chain at index 0. Every later chain should, statistically, hold fewer elements.
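One way to check this claim is a small simulation of the same model. This is a sketch under assumptions: `rng.nextInt(size)` stands in for `hashcode(key) % size` on the premise that hashes are uniformly distributed, and the class name, seed, and trial count are arbitrary choices, not part of the original snippet.

```java
import java.util.Random;

// Sketch only: simulates the insertion scheme from this post, where the
// table size grows by one before every insertion.
class ChainSimulation {
    // Average number of elements landing at `index` over `trials` runs
    // of n insertions each.
    static double averageChainLength(int index, int n, int trials, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            int size = 0;
            for (int k = 0; k < n; k++) {
                size++;                       // mirrors the snippet: grow first
                int slot = rng.nextInt(size); // assumed uniform stand-in for hash % size
                if (slot == index) {
                    total++;
                }
            }
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        int n = 1000;
        for (int i = 0; i < 5; i++) {
            System.out.printf("index %d: %.3f%n", i, averageChainLength(i, n, 2000, 42L));
        }
    }
}
```

With enough trials, the average for index 0 should settle near H(N), and each later index should come out lower.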
H(N) is also the Nth harmonic number, that is, the sum of the harmonic series up to and including the Nth term. Although there is no simple closed form for it, the value can be approximated very accurately using the following formula:
H(N) ≈ ln(N) + 0.5772156649 + 1 / (2N) - 1 / (12N^2)
The "approximation" part can be attributed to the terms after ln(N) + 0.5772156649. ln(N) is the dominant term, and thus the amortized time complexity should be O(log n).
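To see how closely the formula above tracks H(N), the following sketch compares it against a direct summation. The class and method names are hypothetical, and the constant is the truncated value quoted in this post.

```java
// Sketch only: compares the exact harmonic number with the approximation
// ln(N) + 0.5772156649 + 1/(2N) - 1/(12N^2) quoted above.
class HarmonicApproximation {
    static final double EULER_GAMMA = 0.5772156649; // truncated, as in the post

    // Direct summation of 1 + 1/2 + ... + 1/n.
    static double exact(int n) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / k;
        }
        return sum;
    }

    // The four-term approximation; requires n >= 1.
    static double approx(int n) {
        return Math.log(n) + EULER_GAMMA + 1.0 / (2.0 * n) - 1.0 / (12.0 * n * n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.printf("N=%d exact=%.10f approx=%.10f%n", n, exact(n), approx(n));
        }
    }
}
```

The gap between the two shrinks quickly as N grows, while the ln(N) term keeps increasing, which is why it dominates.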
Is there something I am missing? I would greatly appreciate clarification here.