
I will have C* tables that will be very wide. To prevent them from becoming too wide, I have come across a strategy that could suit me well. It was presented in the video Bucket Your Partitions Wisely.

The good thing about this strategy is that there is no need for a "look-up table" (it is fast); the bad part is that one needs to know the maximum number of buckets and may eventually run out of buckets to use (not scalable). I know my max bucket size, so I will try this.

By calculating a hash from the table's primary key parts, the hash can be used as a bucket component together with the rest of the primary key.

I have come up with the following method to make sure (I think?) that the hash will always be the same for a specific primary key.

Using Guava Hashing:

import java.nio.charset.StandardCharsets;
import java.util.List;
import com.google.common.hash.Hashing;

public static String bucket(List<String> primKeyParts, int maxBuckets) {
    StringBuilder combinedHashString = new StringBuilder();
    primKeyParts.forEach(part -> {
        // Consistent-hash each key part separately and append its bucket number.
        // An explicit charset keeps the bytes identical across JVMs/platforms.
        combinedHashString.append(
            String.valueOf(
                Hashing.consistentHash(
                    Hashing.sha512().hashBytes(part.getBytes(StandardCharsets.UTF_8)),
                    maxBuckets)));
    });
    return combinedHashString.toString();
}
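To make the intended usage concrete, here is a minimal sketch (the key part values and column names are made up for illustration, and it assumes the bucket method above is accessible, e.g. in the same class) of how the returned bucket string would become part of the partition key:

import java.util.Arrays;
import java.util.List;

public class BucketUsageSketch {
    public static void main(String[] args) {
        // Made-up primary key parts; in practice these come from the row being written.
        List<String> pkParts = Arrays.asList("part1", "part2");
        String bucketId = bucket(pkParts, 10);
        // The bucket is then stored as an extra partition-key column, e.g.
        // PRIMARY KEY ((bucket, part1, part2), clust1, clust2)
        System.out.println("bucket = " + bucketId);
    }
}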

The reason I use sha512 is to be able to handle strings with a maximum of 256 characters (512 bits); otherwise the result will not always be the same (or so it seems, according to my tests).

I am far from being a hashing guru, hence I'm asking the following questions.

Requirement: between different JVM executions on different nodes/machines, the result should always be the same for a given Cassandra primary key.

  1. Can I rely on the mentioned method to do the job?
  2. Is there a better way of hashing large strings so that they will always produce the same result for a given string?
  3. Do I always need to hash from strings, or could there be a better way of doing this for a C* primary key that always produces the same result?

Please note, I don't want to discuss data modeling for a specific table; I just want a bucket strategy.

EDIT:

I elaborated further and came up with this, so that the length of the strings can be arbitrary. What do you say about this one?

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;

public static int murmur3_128_bucket(int maxBuckets, String... primKeyParts) {
    List<HashCode> hashCodes = new ArrayList<>();
    for (String part : primKeyParts) {
        // Murmur3 (128-bit) handles input of arbitrary length.
        hashCodes.add(Hashing.murmur3_128().hashString(part, StandardCharsets.UTF_8));
    }
    // Combine the per-part hashes (order matters) and map the result onto a bucket.
    return Hashing.consistentHash(Hashing.combineOrdered(hashCodes), maxBuckets);
}
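And a quick sketch of how I would exercise it (made-up key parts, and it assumes the method above is accessible; the point is only that the same input always yields the same bucket, including across JVMs):

public class Murmur3BucketSketch {
    public static void main(String[] args) {
        // The same made-up key parts hashed twice; both calls must return the same bucket.
        int first = murmur3_128_bucket(100, "part1", "part2", "clust1");
        int second = murmur3_128_bucket(100, "part1", "part2", "clust1");
        System.out.println(first + " == " + second); // buckets fall in [0, 100)
    }
}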
nicgul

1 Answer


I currently use a similar solution in production. For your method, I would change it to:

import java.util.List;
import com.google.common.base.Charsets;
import com.google.common.hash.Hashing;

public static int bucket(List<String> primKeyParts, int maxBuckets) {
  // Join all the PK parts, hash once with Murmur3 (32-bit), then pick a bucket.
  String keyParts = String.join("", primKeyParts);
  return Hashing.consistentHash(
                     Hashing.murmur3_32().hashString(keyParts, Charsets.UTF_8),
                     maxBuckets);
}

So the differences are:

  1. Send all the PK parts into the hash function at once.
  2. We set the max buckets as a code constant, since the hash is only consistent if the max buckets stay the same (see the sketch after this list).
  3. We use the Murmur3 hash since we want it to be fast, not cryptographically strong.
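A minimal usage sketch of the method above (class and constant names are made up, and it assumes the bucket method is in scope, e.g. in the same class), with the max bucket count pinned as a constant:

import java.util.Arrays;
import java.util.List;

public class BucketCaller {
    // Keep this fixed: changing it remaps existing rows to different buckets.
    private static final int MAX_BUCKETS = 64;

    public static void main(String[] args) {
        // Made-up primary key parts.
        List<String> pkParts = Arrays.asList("part1", "part2");
        int bucketId = bucket(pkParts, MAX_BUCKETS);
        System.out.println("bucket = " + bucketId);
    }
}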

For your direct questions: 1) Yes, the method should do the job. 2) I think with the tweaks above you should be set. 3) The assumption is that you need the whole PK?

I'm not sure you need to use the whole primary key, since the expectation is that the partition part of your primary key will be the same for many rows, which is why you are bucketing in the first place. You could just hash the bits that will give you good buckets to use in your partition key. In our case, we hash only some of the clustering key parts of the PK to generate the bucket id we use as part of the partition key.
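For example, a rough sketch of that approach (the column values, bucket count, and class name are hypothetical, not our exact production code):

import java.nio.charset.StandardCharsets;
import com.google.common.hash.Hashing;

public class ClusteringBucketSketch {
    private static final int MAX_BUCKETS = 16;

    // Derive the bucket from clustering-key parts only, so rows that share the same
    // logical partition get spread over MAX_BUCKETS physical partitions.
    static int bucketFromClustering(String clust1, String clust2) {
        String joined = clust1 + "|" + clust2;
        return Hashing.consistentHash(
                Hashing.murmur3_32().hashString(joined, StandardCharsets.UTF_8),
                MAX_BUCKETS);
    }

    public static void main(String[] args) {
        // The resulting bucket becomes part of the partition key,
        // e.g. PRIMARY KEY ((part1, bucket), clust1, clust2).
        System.out.println(bucketFromClustering("clust1-value", "clust2-value"));
    }
}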

Jeff Beck
  • I was just about to edit my "Question" with a better solution and it is very similar – nicgul Oct 22 '16 at 12:56
  • oops, hit enter ... see my update and tell me what you think. – nicgul Oct 22 '16 at 12:56
  • BTW, great answer, .. gives me more confidence! – nicgul Oct 22 '16 at 13:02
  • I'm not sure why you want to hash each part individually and then do the consistent hash. Just hashing the appended parts works fine for us as of now. I think Murmur3 is the way to go. And you should review my notes above on using all of the PK. – Jeff Beck Oct 22 '16 at 13:03
  • Will do! Thanks Jeff! ... (Great name you have) :) – nicgul Oct 22 '16 at 13:07
  • Thanks let me know if you need anything else before accepting the answer. – Jeff Beck Oct 22 '16 at 13:09
  • One thought: should I go for murmur3_32 and skip murmur3_128? – nicgul Oct 22 '16 at 13:14
  • I don't know your max bucket size so for us it was fine to use 32. – Jeff Beck Oct 22 '16 at 13:15
  • At the moment I use Integer.MAX_VALUE, ... is that stupid? The reason is that I will have that as the limit for the tables that might need that. But I suppose I will calculate a bucket size depending on situations. – nicgul Oct 22 '16 at 13:19
  • Further explanation on Integer.MAX_VALUE: it will mostly be used when the number of rows to be inserted is unknown. I.e. I will set a max on those tables – nicgul Oct 22 '16 at 13:27
  • If using max int what use are the buckets at all vs just adding more to the partition key? – Jeff Beck Oct 22 '16 at 13:52
  • Maybe I just got it all wrong. But, if I calculate a table to hit the recommendation of 100's of MB/partition at a certain point (wide row count), I need to create a new bucket for the "same" partition. If the "rows" inserted are unknown, I set a very high max of buckets, hence the Integer.MAX_VALUE. The reason for using this bucket strategy is to be able to look up certain rows fast for updating redundant data. With a max of 10 buckets I will only be able to store (per primary key) 100's of MB * 10, but with Integer.MAX_VALUE I will be able to ... well, you get it. Wrong thinking? – nicgul Oct 22 '16 at 14:15
  • Well, yes, .. I guess you are right Jeff, ... maybe I am just overdoing this, .. I need to rethink. Anyway, I got the answer to the initial question. Thanks for that. I will accept your answer. Have a great day! – nicgul Oct 22 '16 at 14:39
  • Yes, in general they are the same partition because you need to look them up that way. If you have more data available to look things up, you can add more columns to the partition key portion of the PK. – Jeff Beck Oct 22 '16 at 14:44
  • Sure, will look into it, ... maybe the only defense for my usage of a huge max would be if I want to be able to skip clustering parts when loading rows. These parts of the primary key can then not be in the partition key. So, just to finish this discussion .. :) ... murmur3_32 versus murmur3_128 and my strange way of using Integer.MAX_VALUE? – nicgul Oct 22 '16 at 14:52
  • You can't skip parts of the primary key even if they are clustering keys. – Jeff Beck Oct 22 '16 at 14:58
  • To load all rows of a partition? I cannot skip clustering keys in a where statement – nicgul Oct 22 '16 at 15:00
  • ... or that's not what you mean? – nicgul Oct 22 '16 at 15:00
  • Sorry, when you said load I thought you meant insert. Yeah, select is fine, but you'll need the bucket id since it will be part of the partition key – Jeff Beck Oct 22 '16 at 15:02
  • Yes, .. :) ... So, I was thinking of creating the bucket using the whole PRIM KEY, let's say "((part1,part2),clust1,clust2,clust3)", and to find the bucket when updating redundant data I recalculate the bucket using the PRIM KEY. When I want to select all partition rows I can then skip clustering keys. In this case I can't use any of the clust keys to make the partition smaller, hence using Integer.MAX_VALUE when table rows are unknown. murmur3_32 versus murmur3_128? – nicgul Oct 22 '16 at 15:12
  • With bucket the PRIM KEY would be ((bucket, part1,part2),clust1,clust2,clust3) ... – nicgul Oct 22 '16 at 15:14
  • I am stupid :) ... using the whole key will never create the bucketing that I need, .. sorry for dragging this out. .. I will reconsider your advice and just use some parts from the clustering keys. Thanks! – nicgul Oct 22 '16 at 18:09