
I'm working with hash tables that use separate chaining as a collision resolution technique.

I do know that the general formula is N/table_length, where N is the number of items currently in the table.

I'm a bit confused by the denominator. Would it be the size of the array + the number of chained elements, or simply the size of the array?

Adam G
  • Isn't the proper load factor at least partially a matter of taste? i.e. if you want to waste less memory, you'd specify a higher load-factor, but if you think it's more important to have faster lookup times, you'd specify a lower load-factor. – Jeremy Friesner Nov 11 '18 at 19:23
  • Yes, the load factor one chooses is a matter of taste, but I don't think that applies to how to properly calculate it. – Adam G Nov 11 '18 at 19:45

1 Answer


The purpose of the load factor is to give an idea of how likely it is (on average) that a newly added element will need collision resolution. A collision happens when a new element is assigned a bucket that already holds an element. The chance that a given bucket is already occupied depends on how many elements are in the container relative to how many buckets it has.

load factor = # of elements / # of buckets

(In your terminology: the number of items currently in the table divided by the size of the array.)
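
As a minimal sketch of what that means with separate chaining (illustrative Python; the class and method names like `ChainedHashTable` and `load_factor` are made up for this example, not from the question), only the length of the bucket array appears in the denominator, so chained items raise the numerator but never the denominator:

    class ChainedHashTable:
        def __init__(self, num_buckets=8):
            self.buckets = [[] for _ in range(num_buckets)]  # each bucket holds a chain
            self.num_elements = 0

        def put(self, key, value):
            chain = self.buckets[hash(key) % len(self.buckets)]
            for i, (k, _) in enumerate(chain):
                if k == key:                 # key already present: update in place
                    chain[i] = (key, value)
                    return
            chain.append((key, value))       # new key: append to this bucket's chain
            self.num_elements += 1

        def load_factor(self):
            # number of stored items / number of buckets (the array's size)
            return self.num_elements / len(self.buckets)


    table = ChainedHashTable(num_buckets=8)
    for i in range(12):
        table.put(f"key{i}", i)
    print(table.load_factor())  # 12 / 8 = 1.5 -- can exceed 1 with chaining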

JaMiT
  • Why would we not need to take the chained items into account? If we don't then the load factor can exceed 1, right? – Adam G Nov 12 '18 at 03:35
  • 1
    @AdamG Yes, the load factor can exceed 1. Values over 1 indicate that the hash table can no longer operate at ideal performance. (Ideal performance occurs when there have been no collisions. Load factors over 1 indicate that collisions have definitely occurred.) – JaMiT Nov 12 '18 at 04:40
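
A quick worked example of that last point (numbers chosen for illustration): 12 items in 8 buckets give a load factor of 12 / 8 = 1.5. Since there are more items than buckets, the pigeonhole principle guarantees that at least one chain holds two or more items, i.e. at least one collision has occurred.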