The problem
We have a set of symbol sequences, which should be mapped to a pre-defined number of bucket-indexes.
Prerequisites
The symbol sequences are restricted in length (64 characters/bytes), and the hash algorithm used is the Delphi implementation of the Bob Jenkins hash for a 32bit hashvalue.
To further distribute the these hashvalues over a certain number of buckets we use the formula:
bucket_number := (hashvalue mod (num_buckets - 2)) + 2);
(We don't want {0,1} to be in the result set)
The question
A colleague had some doubts, that we need to choose a prime number for num_buckets
to achieve an optimal1 distribution in mapping the symbol sequences to the bucket_number
s.
The majority of the team believe that's more an unproven assumption, though our team mate just claimed that's mathematically intrinsic (without more in depth explanation).
I can imagine, that certain symbol sequence patterns we use (that's just a very limited subset of what's actually allowed) may prefer certain hashvalue
s, but generally I don't believe that's really significant for a large number of symbol sequences.
The hash algo should already distribute the hashvalue
s optimally, and I doubt that a prime number mod divisor would really make a significant difference (couldn't measure that empirically either), especially since Bob Jenkins hash calculus doesn't involve any prime numbers as well, as far I can see.
[TL;DR]
Does a prime number mod divisor matter for this case, or not?
1) optimal simply means a stable average value of number-of-sequences per bucket, which doesn't change (much) with the total number of sequences