I am trying to create an alternative to a Bloom filter. I am using a bit array with capacity for 100 billion bits (around 12.5 GB, since 100 billion bits / 8 = 12.5 billion bytes). Initially, all the bits are set to zero. The steps I will take to build it are as follows:
- I will take an input, generate its hash using SHA-256 (chosen for its low chance of collision), and reduce the hash modulo 100 billion to obtain a value, say N.
- I will set the bit at the Nth position in the array to 1.
- If the bit at the Nth position is already set, I will instead add the input to a bucket specific to that bit.
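To make the steps above concrete, here is a minimal sketch in Python. It is scaled down to 1,000 bits so it can run anywhere, and the names `NUM_BITS`, `position`, `add`, and `buckets` are my own placeholders rather than part of any library. It assumes inputs are strings and that the SHA-256 digest is read as a big-endian integer before taking the modulus.

```python
import hashlib
from collections import defaultdict

NUM_BITS = 1_000  # assumption: scaled down from 100 billion for illustration

bits = bytearray((NUM_BITS + 7) // 8)  # bit array, all bits start at zero
buckets = defaultdict(list)            # overflow bucket per bit position


def position(item: str, num_bits: int = NUM_BITS) -> int:
    """Map an input to a bit position: SHA-256 digest modulo num_bits."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_bits


def add(item: str) -> bool:
    """Insert an item; return True if its bit was already set (a collision)."""
    n = position(item)
    byte_index, mask = n >> 3, 1 << (n & 7)
    if bits[byte_index] & mask:
        # Bit already set: store the input in the bucket for this position.
        buckets[n].append(item)
        return True
    bits[byte_index] |= mask
    return False
```

Note that in this sketch, inserting the same input twice also lands in the bucket, because the bit alone cannot tell a repeated input from a different input that maps to the same position.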
How do I quantify the increase in the number of collisions caused by taking the hash value modulo 100 billion?
If I have 40 billion entries as the input, what are the chances of collisions using the proposed method?
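For reference, here is a rough way to frame both questions numerically. This is only a sketch under stated assumptions: the inputs are distinct, SHA-256 output behaves like a uniform random 256-bit integer, and a "collision" means an input whose bit was already set when it arrived. The function names are hypothetical. With n inputs and m bits, the expected number of occupied positions is m(1 - (1 - 1/m)^n), so the expected number of collisions is n minus that. The extra non-uniformity introduced by the modulus is bounded by roughly m / 2^256.

```python
import math


def expected_collisions(n: int, m: int) -> float:
    """Expected number of inputs that land on an already-set bit,
    assuming n distinct inputs mapped uniformly onto m positions."""
    # occupied = m * (1 - (1 - 1/m)**n), computed stably with log1p/expm1
    occupied = -m * math.expm1(n * math.log1p(-1.0 / m))
    return n - occupied


def modulo_bias_bound(m: int, hash_bits: int = 256) -> float:
    """Rough upper bound on the relative non-uniformity caused by
    reducing a uniform hash_bits-bit value modulo m."""
    return m / 2**hash_bits


n, m = 40_000_000_000, 100_000_000_000
print(f"expected collisions: {expected_collisions(n, m):.4g}")
print(f"fraction of inputs:  {expected_collisions(n, m) / n:.2%}")
print(f"modulo bias bound:   {modulo_bias_bound(m):.3g}")
```

Under these assumptions the estimate comes out to about 7.03 billion colliding inputs (roughly 17.6% of the 40 billion), while the bias from the modulus itself is on the order of 10^-66 and so is negligible compared with the collisions caused by having only 100 billion positions.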