Modifying the hashCode() method in Java so that vectors whose Jaccard similarity is above a certain threshold get the same hash code, with good accuracy
Example:
vector 1: [1,1,0,0,1,0] vector 2: [1,1,0,0,0,0]
Their Jaccard similarity is 2/3 (about 0.67): 2 positions are set in both vectors, out of 3 positions set in at least one of them.
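For reference, here is a minimal sketch of how I compute the similarity, assuming the usual definition for 0/1 vectors (positions set in both, divided by positions set in either). The class name `JaccardExample` and the choice to treat two all-zero vectors as similarity 1.0 are just assumptions for illustration. For the two example vectors it evaluates to 2/3.

```java
class JaccardExample {
    // Jaccard similarity of two equal-length 0/1 vectors:
    // (number of positions set in both) / (number of positions set in either).
    static double jaccard(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int both = 0;
        int either = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 1 && b[i] == 1) both++;
            if (a[i] == 1 || b[i] == 1) either++;
        }
        // Assumed convention: two all-zero vectors count as identical.
        return either == 0 ? 1.0 : (double) both / either;
    }

    public static void main(String[] args) {
        int[] v1 = {1, 1, 0, 0, 1, 0};
        int[] v2 = {1, 1, 0, 0, 0, 0};
        // both = 2, either = 3
        System.out.println(jaccard(v1, v2));
    }
}
```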
How can I modify the hashCode() method in Java so that vectors with a similarity of 0.5 or above go into the same bucket, i.e. get the same hash code?
Note: I am not using the MinHash/LSH candidate-pair approach. The hash code has to be generated from the vector itself alone.
The goal is not to do it perfectly (which is impossible), but to do it as accurately as possible.
There will be situations where A and B can go together, and B and C can go together, while A and C cannot. In that case the hash function has to put either A with B, or B with C, or all of A, B and C together.
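To make that situation concrete, here is a small sketch with three made-up vectors A, B and C (hypothetical values, chosen only for illustration) and a threshold of 0.5. A and B meet the threshold, B and C meet it, but A and C share nothing.

```java
class NonTransitiveExample {
    // Same definition as above: positions set in both / positions set in either.
    static double jaccard(int[] a, int[] b) {
        int both = 0;
        int either = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 1 && b[i] == 1) both++;
            if (a[i] == 1 || b[i] == 1) either++;
        }
        // Assumed convention: two all-zero vectors count as identical.
        return either == 0 ? 1.0 : (double) both / either;
    }

    static boolean similar(int[] a, int[] b, double threshold) {
        return jaccard(a, b) >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical vectors, not taken from real data.
        int[] a = {1, 1, 0, 0};
        int[] b = {1, 1, 1, 1};
        int[] c = {0, 0, 1, 1};
        double threshold = 0.5;

        System.out.println("A~B: " + similar(a, b, threshold)); // 2/4 = 0.5, so true
        System.out.println("B~C: " + similar(b, c, threshold)); // 2/4 = 0.5, so true
        System.out.println("A~C: " + similar(a, c, threshold)); // 0/4 = 0.0, so false
    }
}
```

Since "has the same hash code" is transitive and "similarity at least 0.5" is not, any hashCode() has to either split one of the two similar pairs or put A and C in the same bucket as well.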