I'm reading about the MinHash technique for estimating the similarity between two sets. Given sets A and B, let h be a hash function and let hmin(S) be the minimum hash value over a set S, i.e. hmin(S)=min(h(s)) for s in S. We have the equation:
p(hmin(A)=hmin(B))=|A∩B| / |A∪B|
This means that the probability that the minimum hash of A equals the minimum hash of B is the Jaccard similarity of A and B.
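To make the claim concrete, here is a small simulation sketch that checks the equation empirically. It assumes h behaves like a uniformly random, collision-free ranking of the elements of A∪B (modeled by shuffling ranks); the function name `minhash_match_rate`, the example sets, and the trial count are my own choices for illustration.

```python
import random

def minhash_match_rate(a, b, trials=20000, seed=0):
    """Estimate p(hmin(A) = hmin(B)) over many random hash functions.

    Assumption: each h is modeled as a random permutation of the
    elements of A ∪ B, so it never maps two keys to the same value.
    """
    rng = random.Random(seed)
    universe = sorted(a | b)
    matches = 0
    for _ in range(trials):
        # Draw a fresh random "hash function": element -> distinct rank.
        ranks = list(range(len(universe)))
        rng.shuffle(ranks)
        h = dict(zip(universe, ranks))
        # Compare the minimum hash value of each set.
        if min(h[x] for x in a) == min(h[x] for x in b):
            matches += 1
    return matches / trials

A = {1, 2, 3, 4, 5, 6}
B = {4, 5, 6, 7, 8}
jaccard = len(A & B) / len(A | B)   # |A∩B| / |A∪B| = 3/8
estimate = minhash_match_rate(A, B)
print(jaccard, estimate)
```

Under this idealized model, the estimated match rate should land close to the Jaccard similarity of the two sets.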
I am trying to prove the above equation and came up with my own proof: let a∈A and b∈B be such that h(a)=hmin(A) and h(b)=hmin(B). If hmin(A)=hmin(B), then h(a)=h(b). Assume that the hash function h maps distinct keys to distinct hash values, so h(a)=h(b) if and only if a=b, which has a probability of |A∩B| / |A∪B|. However, my proof is not complete, since a hash function can return the same value for different keys. So I'm asking for your help to find a proof that applies regardless of the hash function.
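To illustrate the gap I'm worried about, here is a toy sketch with a deliberately collision-prone hash. The function `h(x) = x % 3` and the sets are hypothetical, chosen only so that different keys share a hash value; this is not a hash anyone would use for MinHash.

```python
def h(x):
    # Hypothetical toy hash with many collisions (assumption for illustration).
    return x % 3

def hmin(s):
    # Minimum hash value over the set s.
    return min(h(x) for x in s)

A = {1, 4}
B = {7, 10}

jaccard = len(A & B) / len(A | B)   # the sets are disjoint
same_min = hmin(A) == hmin(B)       # every element hashes to 1

print(jaccard, same_min)
```

Here A and B share no elements, yet their minimum hashes coincide, so the step "h(a)=h(b) if and only if a=b" fails for this h.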