In reference to the paper Bundle Min Hashing for Logo Recognition:
Suppose we have the bundles {2,5,18,444,678} and {2,5,79,368,841}, and the vocabulary size is 1M. If we use just 1 sketch per bundle, do we need just 1 hash function, one that deterministically maps each of the 1M word IDs to a value drawn from the uniform distribution on [0,1]? The hash function must use a fixed seed on every call. For 4 sketches, we would then just need the same hash function with 4 different seeds. Is this reasoning correct?
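A minimal Python sketch of the scheme described above, assuming one min-hash per seed. BLAKE2b from hashlib is only a stand-in for "some good seeded hash", and seeded_uniform, min_hash and sketches are hypothetical helper names, not taken from the paper. Ranking every word ID by its seeded hash value behaves like a random permutation of the vocabulary, so the word with the smallest value in a bundle is that bundle's min-hash for that seed.

```python
import hashlib

def seeded_uniform(word_id: int, seed: int) -> float:
    """Deterministically map a visual-word ID to a pseudo-uniform value in [0, 1).

    BLAKE2b is an arbitrary stand-in for any good seeded hash: the seed is
    prepended to the word ID and the digest is scaled to [0, 1).
    """
    data = seed.to_bytes(4, "little") + word_id.to_bytes(4, "little")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    # Keep the top 53 bits so the float division can never round up to 1.0.
    return (int.from_bytes(digest, "little") >> 11) / 2**53

def min_hash(bundle, seed):
    """Return the word in the bundle with the smallest hash value under this seed."""
    return min(bundle, key=lambda w: seeded_uniform(w, seed))

def sketches(bundle, num_sketches):
    """One min-hash per seed: the same hash function with a different fixed seed each time."""
    return [min_hash(bundle, seed) for seed in range(num_sketches)]

a = {2, 5, 18, 444, 678}
b = {2, 5, 79, 368, 841}
print(sketches(a, 4))
print(sketches(b, 4))
```

Because the hash is deterministic, the same word always gets the same value for a given seed, which is what lets two different bundles agree on their min-hash.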
Or can we simply pick a random number from the set (bundle) as the min-hash word, since min-hashing amounts to a random permutation of the set?
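One way to check this idea is to compare how often two bundles agree under each method. The sketch below assumes the random pick is drawn independently for each bundle; helper names are hypothetical and BLAKE2b again stands in for the seeded hash. Min-hash agreement should approach the Jaccard similarity (2/8 = 0.25 for the two bundles above), while independent random picks agree with probability |A∩B| / (|A|·|B|) = 2/25 = 0.08.

```python
import hashlib
import random

def seeded_uniform(word_id: int, seed: int) -> float:
    """Pseudo-uniform value in [0, 1) for a word ID under a fixed seed (BLAKE2b stand-in)."""
    data = seed.to_bytes(4, "little") + word_id.to_bytes(4, "little")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return (int.from_bytes(digest, "little") >> 11) / 2**53

def min_hash(bundle, seed):
    """Word with the smallest seeded hash value in the bundle."""
    return min(bundle, key=lambda w: seeded_uniform(w, seed))

def agreement_rates(a, b, trials=2000):
    """Fraction of trials in which both bundles yield the same word, per method."""
    rng = random.Random(0)
    # Min-hash: both bundles share the same ordering of words for a given seed.
    mh = sum(min_hash(a, s) == min_hash(b, s) for s in range(trials)) / trials
    # Random pick: each bundle draws its own element, with no shared ordering.
    rp = sum(rng.choice(sorted(a)) == rng.choice(sorted(b)) for _ in range(trials)) / trials
    return mh, rp

a = {2, 5, 18, 444, 678}
b = {2, 5, 79, 368, 841}
jaccard = len(a & b) / len(a | b)  # 2 shared words out of 8 distinct words
mh, rp = agreement_rates(a, b)
print(f"Jaccard={jaccard:.3f}  min-hash={mh:.3f}  random pick={rp:.3f}")
```

The difference comes from consistency: a min-hash picks the same word in every bundle that contains the globally smallest element, whereas an independent random pick has no shared ordering across bundles.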
Is there a reference implementation of the hash functions needed in the paper?
Can MurmurHash3 do the job?
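MurmurHash3 takes an explicit seed, so it seems a plausible fit for the seeded hash above. The following is a dependency-free Python port of MurmurHash3_x86_32, assuming word IDs are encoded as 4 little-endian bytes; word_hash is a hypothetical helper that scales the 32-bit output to [0, 1). The third-party mmh3 package on PyPI wraps a C implementation of the same family and would be faster in practice.

```python
def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

def murmur3_32(key: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3_x86_32, returning an unsigned 32-bit hash."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    nblocks = len(key) // 4
    # Body: mix each 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(key[4 * i:4 * i + 4], "little")
        k = (_rotl32((k * c1) & 0xFFFFFFFF, 15) * c2) & 0xFFFFFFFF
        h ^= k
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = key[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (_rotl32((k * c1) & 0xFFFFFFFF, 15) * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization (fmix32) to avalanche the bits.
    h ^= len(key)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

def word_hash(word_id: int, seed: int) -> float:
    """Map a word ID to [0, 1) using MurmurHash3 with the given fixed seed."""
    return murmur3_32(word_id.to_bytes(4, "little"), seed) / 2**32

print([round(word_hash(w, seed=3), 4) for w in (2, 5, 18, 444, 678)])
```

Taking the minimum of word_hash over a bundle for seeds 0 to 3 would then give the 4 min-hashes from the question.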