
The token function in my driver doesn't support composite partition keys, but it works well with a single-column partition key: it takes the key as a binary (a sequence of 8-bit bytes), passes it to the murmur3 hash function, extracts a 64-bit signed little-endian integer (the token) from the result, and ignores the remaining bytes.
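As a rough illustration of the extraction step described here, this Python sketch assumes the hash output is already available as a byte string. The name `token_from_digest` is hypothetical and not part of any driver, and the hashing itself is deliberately left out.

```python
import struct


def token_from_digest(digest: bytes) -> int:
    """Read the first 8 bytes of a hash digest as a signed 64-bit
    little-endian integer and ignore whatever follows.

    Hypothetical helper: it assumes `digest` is the raw output of a
    murmur3 (Cassandra variant) implementation supplied elsewhere.
    """
    if len(digest) < 8:
        raise ValueError("digest must contain at least 8 bytes")
    # "<q" = little-endian, signed 64-bit integer
    (token,) = struct.unpack_from("<q", digest, 0)
    return token
```

Only the byte handling is shown, so the result is only as good as the digest passed in.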

So I'm hoping to generate the binary for a composite partition key and then pass it to murmur3 as usual. An algorithm or the bitwise operations would be really helpful, or at least a source in any programming language.

I don't mean the murmur3 part, only the token side, which converts/mixes the composite partition key components and outputs raw bytes.

Greenonline
aiomix

2 Answers


Take a look at the drivers, since they have to generate the token to find the correct coordinator. https://github.com/datastax/java-driver/blob/8be7570a3c7fbba773ae2581bbf26e8196e7d6fb/driver-core/src/main/java/com/datastax/driver/core/Token.java#L112

It's slightly different from the typical murmur3, due to a bug when it was made and the inability to change it without breaking existing clusters. So I would recommend copying it from them or, better yet, using the existing drivers to find the token.

Chris Lohfink
  • Hey Chris, I am developing a driver in pure Elixir and it does support murmur3 (Cassandra variant), so I think my issue lies on the token function side (before the hashing part?). Right now I can easily hash any single partition key, but not a multi-column partition key. Unfortunately I'm not a Java guy; can you please guide me to the token part, the equivalent of system.token(column_pk_1, column_pk_2, etc)? I just want to be able to generate the bytes for a composite partition key and then pass them to murmur3. Or will that not work without some modification on the hash function side? Thanks – aiomix Sep 29 '18 at 03:43
  • Know any other languages? There's Python, JS, etc. that are already written and well tested (i.e. https://github.com/datastax/nodejs-driver/blob/707953f52709fc507f07edb18659f2e2d9f4d22e/lib/tokenizer.js) – Chris Lohfink Sep 29 '18 at 03:58
  • Yeah, I know Python, Ruby, C and Rust. I will check the client drivers in those languages. Thanks – aiomix Sep 29 '18 at 04:00
  • https://github.com/datastax/ruby-driver/blob/master/ext/cassandra_murmur3/cassandra_murmur3.c – Chris Lohfink Sep 29 '18 at 04:46

I finally found a solution to my question. The algorithm below computes the token input for a composite partition key, using Primary_key((text, int)) as the example, so the partition key is the composite_partition_key (text, int).

Example : a row with composite_partition_key ('hello', 1)

Applying the algorithm :

1- lay out the components of the composite partition key as bytes (shown here in hexadecimal; the int is 4 bytes, big-endian):

first_component = 'hello' -> 68 65 6c 6c 6f

sec_component = 1 -> 00 00 00 01

68 65 6c 6c 6f 00 00 00 01

2- add the two-byte (big-endian) length of each component before that component

first_component = 'hello', length= 5-> 00 05 68 65 6c 6c 6f

sec_component = 1, therefore length= 4 -> 00 04 00 00 00 01

00 05 68 65 6c 6c 6f 00 04 00 00 00 01

3- add a single zero byte (00) after each component

first_component = 'hello' -> 00 05 68 65 6c 6c 6f 00

sec_component = 1 -> 00 04 00 00 00 01 00

4- result

00 05 68 65 6c 6c 6f 00 00 04 00 00 00 01 00

Now pass the result to your murmur3 function in whatever binary form it understands (make sure it's the Cassandra variant).
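The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not code taken from any driver: `serialize_composite_key` is a hypothetical name, and it assumes each component has already been serialized to bytes (UTF-8 for text, 4-byte big-endian for int, as in the example).

```python
import struct


def serialize_composite_key(components):
    """Build the bytes to hash for a composite partition key.

    For each already-serialized component, emit:
      2-byte big-endian length + component bytes + one 0x00 byte
    """
    out = bytearray()
    for raw in components:
        if len(raw) > 0xFFFF:
            raise ValueError("component does not fit a 2-byte length")
        out += struct.pack(">H", len(raw))  # step 2: length prefix
        out += raw                          # step 1: component bytes
        out.append(0x00)                    # step 3: trailing zero byte
    return bytes(out)


# The ('hello', 1) example: text as UTF-8, int as 4 bytes big-endian.
key = serialize_composite_key([
    "hello".encode("utf-8"),
    struct.pack(">i", 1),
])
print(key.hex())  # 000568656c6c6f0000040000000100
```

The printed hex matches the result in step 4. Those bytes would then go to the murmur3 (Cassandra variant) function in place of the single-column key.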

aiomix