Let's say I'm creating a hash table holding between 7 and 8 million elements, using linear probing to handle collisions. How do I figure out how many buckets are required?

Bhargav Rao
kiyo

1 Answer

There is no perfect answer: the number of buckets affects both memory usage and performance. The more collision-prone your specific elements are (in combination with your hash function and table size; e.g. a prime number of buckets tends to be more tolerant of a weak hash function than a power of 2), the more buckets you may want.

So, if you need accurate tuning, the best way is to get realistic data and try a range of load factors (i.e. the ratio of elements to buckets), seeing where the memory/performance tradeoff suits you best.
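As a rough illustration of that kind of experiment, here is a minimal sketch that inserts keys into a linear-probing table at several load factors and reports the average number of probes per insert. The function name `average_probes`, the element count, and the use of uniformly random integer keys are all assumptions made for the example; random keys are only a stand-in, and you would substitute your real data and hash function.

```python
import random


def average_probes(n_elements: int, load_factor: float, seed: int = 0) -> float:
    """Insert n_elements distinct keys using linear probing and
    return the mean number of buckets examined per insert."""
    rng = random.Random(seed)
    # Size the table so that the final load is at or just below load_factor.
    n_buckets = int(n_elements / load_factor) + 1
    table = [None] * n_buckets
    total_probes = 0
    # Assumption: random integers stand in for realistic keys.
    for key in rng.sample(range(10**12), n_elements):
        i = hash(key) % n_buckets
        probes = 1
        while table[i] is not None:       # collision: step to the next bucket
            i = (i + 1) % n_buckets
            probes += 1
        table[i] = key
        total_probes += probes
    return total_probes / n_elements


if __name__ == "__main__":
    for lf in (0.5, 0.7, 0.8, 0.9):
        print(f"load factor {lf}: {average_probes(20_000, lf):.2f} probes/insert")
```

Timing real lookups on your own workload will tell you more than probe counts alone, but the probe count is a cheap first signal of how quickly performance degrades as the table fills.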

If you just want a generally useful load factor as a starting point, perhaps try 0.7 to 0.8, provided you have a halfway decent hash function. In other words, a reasonable ballpark for the number of buckets would be 8 million / 0.8 to 8 million / 0.7, which is roughly 10 to 11.4 million.
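The arithmetic above can be written as a small helper. This is only a sketch: the name `buckets_for` is made up for illustration, and rounding up to a whole bucket is an assumption (you might instead round up further to the next prime or power of 2, depending on your table design).

```python
import math


def buckets_for(n_elements: int, load_factor: float) -> int:
    """Smallest bucket count that keeps n_elements at or below load_factor."""
    # With open addressing the table must never become completely full,
    # so the load factor has to stay strictly below 1.
    if not 0.0 < load_factor < 1.0:
        raise ValueError("load factor must be strictly between 0 and 1")
    return math.ceil(n_elements / load_factor)


if __name__ == "__main__":
    for lf in (0.7, 0.8):
        print(f"load factor {lf}: {buckets_for(8_000_000, lf):,} buckets")
```

Size for the largest element count you expect (8 million here), not the average, so that the table stays at or below the chosen load factor in the worst case.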

If you're serious about tuning this aggressively, you should move off linear probing, as it will give you a lot more collisions than almost any alternative. The exception is if you have other good reasons for sticking with it, e.g. to support element deletion with immediate compaction, rather than leaving "tombstones" that mark once-used buckets, which later lookups and deletions must skip over while they continue probing.
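For readers unfamiliar with deletion by immediate compaction, here is a minimal sketch of one common way to do it (often called backward-shift deletion). The class `LinearProbingSet` and its methods are hypothetical names for this example. It assumes the table always has at least one empty bucket and that `None` is never stored as a key.

```python
class LinearProbingSet:
    """Toy open-addressing set; assumes it is never filled completely."""

    def __init__(self, n_buckets: int):
        self.slots = [None] * n_buckets

    def _home(self, key) -> int:
        return hash(key) % len(self.slots)

    def add(self, key) -> None:
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return                      # already present
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def __contains__(self, key) -> bool:
        i = self._home(key)
        while self.slots[i] is not None:    # an empty bucket ends the probe run
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)
        return False

    def remove(self, key) -> bool:
        n = len(self.slots)
        i = self._home(key)
        while self.slots[i] != key:
            if self.slots[i] is None:
                return False                # key not present
            i = (i + 1) % n
        # Bucket i is now a hole; pull later elements of the run back into it.
        j = i
        while True:
            j = (j + 1) % n
            if self.slots[j] is None:
                break
            k = self._home(self.slots[j])
            # If the home bucket k lies cyclically in (i, j], the element at j
            # is still reachable from k without passing the hole: leave it.
            if (i < j and i < k <= j) or (i > j and (k > i or k <= j)):
                continue
            self.slots[i] = self.slots[j]   # otherwise move it into the hole
            i = j                           # the hole moves to j
        self.slots[i] = None
        return True
```

After `remove`, no marker is left behind, so probe runs shrink again as elements are deleted. The trade-off is that each deletion may move several elements.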

Tony Delroy