What are the best prime numbers to choose as the size of a double-hashed hash table?
Side info:
- the hash table is part of a word analysis project: Markov models, training bots to model and generate text as if someone else had written it (which takes a lot of words, sentences, transcripts, books... the bigger the corpus, the better)
- I'm not familiar with most of the math around prime numbers, but I will read up on everything you propose and go from there
What I have in mind:
- the prime numbers shouldn't be too far apart or too close together, so that I don't have to resize frequently but the hash table also doesn't end up half empty (fewer collisions; I'm looking for a good balance between load factor and hash table size)
- optimal for a big corpus: I'm not sure how big the prime numbers should be, I've never done this before
- I also thought of implementing a function (not a hash function) that would double the size of the hash table and then look for the closest prime number. As I picture it, though, testing a single candidate takes O(n) time, because a prime is divisible only by 1 and itself: I'd have to check that every number below the candidate (roughly double the current table size) leaves a nonzero remainder, then move on to the next odd number and run the whole loop again. You can imagine that would be very slow, so the better approach seems to be a fixed set of prime numbers, up to a million or so (just for illustration purposes), and to use only those for any size changes
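
To make the last point concrete, here is a minimal sketch in Python (chosen only for illustration; the names `is_prime`, `next_prime` and `size_table` are made up for this example). It swaps in one refinement over the loop described above: trial division only needs to test divisors up to the square root of the candidate, since any composite number has a divisor no larger than its square root. That makes each check roughly O(sqrt(n)) instead of O(n). The "double, then take the next prime" growth rule is just the assumption from the list above, not a recommendation.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division, testing divisors only up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:                      # 2 and 3 are prime
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    limit = math.isqrt(n)          # integer square root, no float rounding
    i = 5
    while i <= limit:              # remaining candidates have the form 6k - 1 or 6k + 1
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


def size_table(start: int, limit: int) -> list[int]:
    """Primes that roughly double each time, from about `start` until one reaches `limit`."""
    sizes = [next_prime(start)]
    while sizes[-1] < limit:
        sizes.append(next_prime(2 * sizes[-1] + 1))
    return sizes


if __name__ == "__main__":
    # Could be run once and the result pasted into the program as a fixed list.
    print(size_table(11, 1_000_000))
```

Either way works: call `next_prime` at resize time, or precompute the list once with `size_table` and hard-code it. A prime table size also suits double hashing in particular, because every step size from 1 to size - 1 is then coprime with the size, so a probe sequence can reach every slot.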
Thanks! Any follow-up questions are appreciated.