I am looking for hash functions that can be used to generate batches out of an integer stream. Specifically, I want to map integers x_i from a set or stream (say X) to another set of integers or strings (say Y) such that many x_i are mapped to one y_j. While doing that, I want to ensure that at most n values x_i are mapped to a single y_j. As with hashing, I need to be able to reliably find the y for a given x.
I would also like most of the y_j to have close to n of the x_i mapped to them (to avoid a very sparse mapping from X to Y).
One function I can think of is the integer quotient:
final int BATCH_SIZE = 3;

// Integer division: BATCH_SIZE consecutive values of x share one batch id.
public int map(int x) {
    return x / BATCH_SIZE;
}
For a stream of sequential integers, this works fairly well. For example, the stream 1..9 is mapped to:
1 -> 0
2 -> 0
3 -> 1
4 -> 1
5 -> 1
6 -> 2
7 -> 2
8 -> 2
9 -> 3
and so on. However, for non-sequential large integers and a small batch size (my use case), this produces a very sparse mapping: most batches end up with only one element.
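To make the sparsity concrete, here is a minimal sketch that counts how many inputs land in each batch under the quotient mapping. The class name `BatchSparsityDemo`, the helper `batchSizes`, and the scattered sample values are made up for illustration; they are not real data from my stream.

```java
import java.util.Map;
import java.util.TreeMap;

public class BatchSparsityDemo {
    // Same quotient mapping as above, with the batch size passed in.
    static int map(int x, int batchSize) {
        return x / batchSize;
    }

    // Counts how many inputs are mapped to each batch id (hypothetical helper).
    static Map<Integer, Integer> batchSizes(int[] xs, int batchSize) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int x : xs) {
            counts.merge(map(x, batchSize), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int batchSize = 3;
        int[] sequential = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        // Made-up, widely spaced values standing in for a non-sequential stream.
        int[] scattered = {17, 4021, 98765, 1234567, 20000003};

        // prints {0=2, 1=3, 2=3, 3=1}
        System.out.println(batchSizes(sequential, batchSize));
        // prints {5=1, 1340=1, 32921=1, 411522=1, 6666667=1}
        System.out.println(batchSizes(scattered, batchSize));
    }
}
```

With the sequential input, the interior batches are full; with the scattered input, every batch holds exactly one element, which is the behaviour I want to avoid.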
Are there any standard ways to generate such a mapping (batching)?