I understand the algorithm behind creating signature matrix from shingles by applying hash functions. However I don't understand how to hash a specific band in a signature matrix to buckets. Assume in matrix M, band b1 we have following values for documents C1-C5:
C1 C2 C3 C4 C5
1 0 0 0 2
3 2 1 2 2
0 1 3 1 1
Just by looking at the values, we see C2 and C4 are identical in this band and they should hash into the same bucket. But the other columns will hash into different buckets.
My question is how to hash these columns into buckets and know if they are identical without actually comparing them. How should I define the hash function to map a column to a bucket? like sum of the elements in the column?