I'm using murmur2 hash on a byte array but I only want to hash a subset of the bytes, murmur2 only lets me hash the array starting from 0 but I want to specify a non 0 start offset as well as an end offset in the array.
*
* @param data byte array to hash
* @param length length of the array to hash
* @param seed initial seed value
* @return 32 bit hash of the given array
*/
public static int hash32(final byte[] data, int length, int seed) {
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.
final int m = 0x5bd1e995;
final int r = 24;
// Initialize the hash to a random value
int h = seed^length;
int length4 = length/4;
for (int i=0; i<length4; i++) {
final int i4 = i*4;
int k = (data[i4+0]&0xff) +((data[i4+1]&0xff)<<8)
+((data[i4+2]&0xff)<<16) +((data[i4+3]&0xff)<<24);
k *= m;
k ^= k >>> r;
k *= m;
h *= m;
h ^= k;
}
// Handle the last few bytes of the input array
switch (length%4) {
case 3: h ^= (data[(length&~3) +2]&0xff) << 16;
case 2: h ^= (data[(length&~3) +1]&0xff) << 8;
case 1: h ^= (data[length&~3]&0xff);
h *= m;
}
h ^= h >>> 13;
h *= m;
h ^= h >>> 15;
return h;
}
I have tried various changes but it always causes my hash collisions test to go from 0 to a very high number. I don't want to use murmur3 because it does not fit in a single small method like murmur2, murmur2 is also a little faster in my tests.
Here's my collisions tester for anyone who want to have a crack at it
HashSet<Integer> hs = new HashSet<>(100000000,(float) 1.0);
long collide = 0;
long totalLoops = 0;
byte[] ba = new byte[4];
long sTime = System.currentTimeMillis();
int hash;
for(byte d=0; d<5; d++) {
ba[0] = d;
for(byte i=-128; i<127; i++) {
ba[1] = i;
for(byte k=-128; k<127; k++) {
ba[2] = k;
for(byte j=-128; j<127; j++) {
ba[3] = j;
hash = hash32(ba,ba.length,0x9747b28c);
if(hs.contains(hash)) {
collide++;
} else {
hs.add(hash);
}
totalLoops++;
}
}
}
}
Note: the collision test above needs a pc with 8GB of RAM.