I'm trying to understand which algorithm Cassandra uses to generate Murmur3 hashes of composite partition keys. I know I can obtain the value directly from CQL, but I want to reproduce Cassandra's behaviour for any given tuple directly from Java/Scala code.

For simple partition keys, the following expression computes the correct value, at least in many cases (I know from reading the source code that it is not an exact match):

long l = com.google.common.hash.Hashing.murmur3_128().hashString("my-string", Charset.forName("UTF-8")).asLong();

What if the partition key has two columns?

Hashing the concatenation of the two strings does not give the same value as Cassandra.

Nicola Ferraro
  • possible duplicate of [Murmur3 Hash Algorithm Used in Cassandra](http://stackoverflow.com/questions/16562427/murmur3-hash-algorithm-used-in-cassandra) – Aaron Nov 30 '14 at 14:20
  • Follow the link to the question above, and look in the comments on the answer. – Aaron Nov 30 '14 at 14:20

1 Answer

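Thanks for the additional details about the algorithm. Here is some sample code to share the solution: each partition key column is written as a 2-byte length, followed by its bytes and a trailing zero byte, and the resulting byte array is then hashed.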

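import com.google.common.hash.Hashing;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

// Run inside a method that declares or handles IOException.
// Each component is serialized as: 2-byte length, raw bytes, one 0 byte.
byte[] keyBytes;
try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
     DataOutputStream out = new DataOutputStream(bos)) {

    String[] keys = new String[] {"key1", "key2"};
    for (String key : keys) {
        byte[] arr = key.getBytes(StandardCharsets.UTF_8);
        out.writeShort(arr.length);     // component length (2 bytes, big-endian)
        out.write(arr, 0, arr.length);  // component bytes
        out.writeByte(0);               // trailing zero byte after each component
    }
    out.flush();
    keyBytes = bos.toByteArray();
}

// Hash the serialized composite key and keep the first 64 bits.
long hash = Hashing.murmur3_128().hashBytes(keyBytes).asLong();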
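
To check the byte layout independently of the hashing step, the serialization loop above can be pulled into a small helper. This is only a sketch using the standard library: the class name `CompositeKeyBytes` and the method `serialize` are hypothetical names chosen for illustration, and it assumes every component is a UTF-8 string, as in the snippet above. The result would then be passed to `Hashing.murmur3_128().hashBytes(...)` as before.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CompositeKeyBytes {

    // Hypothetical helper: writes each component as a 2-byte big-endian
    // length, then the UTF-8 bytes, then a single zero byte.
    public static byte[] serialize(String... components) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (String component : components) {
            byte[] raw = component.getBytes(StandardCharsets.UTF_8);
            if (raw.length > 0xFFFF) {
                // The length prefix is only 2 bytes wide.
                throw new IllegalArgumentException("component too long: " + raw.length);
            }
            bos.write((raw.length >>> 8) & 0xFF); // high byte of the length
            bos.write(raw.length & 0xFF);         // low byte of the length
            bos.write(raw, 0, raw.length);        // component bytes
            bos.write(0);                         // trailing zero byte
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("key1", "key2");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        // prints 00046b6579310000046b65793200
        System.out.println(hex);
    }
}
```

The hex dump makes it easy to see why hashing the plain concatenation "key1key2" gives a different value: the length prefixes and trailing zero bytes are part of the hashed input.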
Nicola Ferraro