Let's say I have to find estimate the jaccard similarity between documents A and B, and I use k random permutations of the union of these sets/documents to determine the documents' signatures.
How should I set my k value? Since setting it to a really high value would increase computation time significantly, what could be the least value of k which can give me a good jaccard index estimate?
Given error tolerance e>0 and delta, how can I determine the minimum value of k such that the jaccard index is between (1-e)jaccard_estimate and (1+e)jaccard_estimate with probability greater than or equal to (1-delta)?
I believe this can be derived using chernoff inequality bound, but I'm unable to figure about how to go about it. Any help would be appreciated. Thanks in advance!