
I'm working with hash tables that use separate chaining as a collision resolution technique.

I do know that the general formula is N/table_length, where N is the number of items currently in the table.

I'm a bit confused by the denominator. Would it be the size of the array + the number of chained elements, or simply the size of the array?

Adam G
  • Isn't the proper load factor at least partially a matter of taste? i.e. if you want to waste less memory, you'd specify a higher load-factor, but if you think it's more important to have faster lookup times, you'd specify a lower load-factor. – Jeremy Friesner Nov 11 '18 at 19:23
  • Yes, the load factor one chooses is a matter of taste, but I don't think that applies to how to properly calculate it. – Adam G Nov 11 '18 at 19:45

1 Answer


The purpose of the load factor is to give an idea of how likely it is (on average) that a newly added element will need collision resolution. A collision happens when a new element is assigned a bucket that already holds an element. The chance that a given bucket is already occupied depends on how many elements are in the container relative to how many buckets it has.

load factor = # of elements / # of buckets

(In your terminology: the number of items currently in the table divided by the size of the array.)
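
As a minimal sketch of what that means with separate chaining (illustrative Python; the class and method names like `ChainedHashTable` and `load_factor` are made up for this example, not from the question), only the length of the bucket array appears in the denominator, so chained items raise the numerator but never the denominator:

    class ChainedHashTable:
        def __init__(self, num_buckets=8):
            self.buckets = [[] for _ in range(num_buckets)]  # each bucket holds a chain
            self.num_elements = 0

        def put(self, key, value):
            chain = self.buckets[hash(key) % len(self.buckets)]
            for i, (k, _) in enumerate(chain):
                if k == key:                 # key already present: update in place
                    chain[i] = (key, value)
                    return
            chain.append((key, value))       # new key: append to this bucket's chain
            self.num_elements += 1

        def load_factor(self):
            # number of stored items / number of buckets (the array's size)
            return self.num_elements / len(self.buckets)


    table = ChainedHashTable(num_buckets=8)
    for i in range(12):
        table.put(f"key{i}", i)
    print(table.load_factor())  # 12 / 8 = 1.5 -- can exceed 1 with chaining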

JaMiT
  • Why would we not need to take the chained items into account? If we don't then the load factor can exceed 1, right? – Adam G Nov 12 '18 at 03:35
  • 1
    @AdamG Yes, the load factor can exceed 1. Values over 1 indicate that the hash table can no longer operate at ideal performance. (Ideal performance occurs when there have been no collisions. Load factors over 1 indicate that collisions have definitely occurred.) – JaMiT Nov 12 '18 at 04:40
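
A quick worked example of that last point (numbers chosen for illustration): 12 items in 8 buckets give a load factor of 12 / 8 = 1.5. Since there are more items than buckets, the pigeonhole principle guarantees that at least one chain holds two or more items, i.e. at least one collision has occurred.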