In a producer-consumer web application, what should the thought process be when choosing a partition key for a Kinesis stream? Suppose I have a Kinesis stream with 16 shards: how many partition keys should I create? Does it really depend on the number of shards?
-
Take a look at this question, maybe it helps; http://stackoverflow.com/a/31377161/1622134 – az3 Jul 13 '15 at 07:19
2 Answers
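Hash key: ranges from 0 up to 340282366920938463463374607431768211455 (that is, 2^128 - 1). Kinesis maps every partition key to a hash key in this range with an MD5 hash. Let's say the maximum is ~34020 * 10^34; I will omit the 10^34 for ease...
If you have 30 shards and the hash key space is divided uniformly, each shard covers about 1134 * 10^34 hash keys. The coverage looks like this: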
Shard-00: 0 - 1134
Shard-01: 1135 - 2268
Shard-03: 2269 - 3402
Shard-04: 3403 - 4536
...
Shard-28: 30619 - 31752
Shard-29: 31753 - 32886
Shard-30: 32887 - 34020
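To see the exact (unrounded) boundaries, here is a minimal sketch that divides the full 128-bit hash key space into equal ranges. The class and method names (`ShardRanges`, `rangeFor`) are made up for illustration, and the boundaries Kinesis actually assigns when it creates a stream may differ slightly in rounding.

```java
import java.math.BigInteger;

public class ShardRanges {
    // Size of the hash key space: 2^128 (keys run from 0 to 2^128 - 1).
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** Returns {start, end} (both inclusive) for shard `index` out of `shardCount` equal ranges. */
    static BigInteger[] rangeFor(int index, int shardCount) {
        BigInteger n = BigInteger.valueOf(shardCount);
        BigInteger start = HASH_SPACE.multiply(BigInteger.valueOf(index)).divide(n);
        BigInteger end = HASH_SPACE.multiply(BigInteger.valueOf(index + 1L)).divide(n)
                .subtract(BigInteger.ONE);
        return new BigInteger[] { start, end };
    }

    public static void main(String[] args) {
        int shardCount = 30; // same shard count as the example above
        for (int i = 0; i < shardCount; i++) {
            BigInteger[] r = rangeFor(i, shardCount);
            System.out.printf("range %2d: %s - %s%n", i, r[0], r[1]);
        }
    }
}
```

Each range starts exactly one past the end of the previous one, so every possible hash key belongs to exactly one shard.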
And if you have 3 consumer instances of the same application (listening to these 30 shards), each should listen to 10 shards (the optimum balance).
This also explains Merge and Split operations on a Stream.
- To merge 2 shards, they should cover adjacent hash keys. You cannot merge Shard-03 and Shard-29.
- You can split any shard. If you split Shard-00 in the middle, the distribution will look like this:
Shard-31: 0 - 567
Shard-32: 568 - 1134
Shard-01: 1135 - 2268
Shard-03: 2269 - 3402
Shard-04: 3403 - 4536
...
Shard-28: 30619 - 31752
Shard-29: 31753 - 32886
Shard-30: 32887 - 34020
After the split, Shard-00 no longer accepts new data. New records whose hash keys fall in the range that Shard-00 used to cover will be placed in Shard-31 or Shard-32.
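The arithmetic behind the split and the merge rule can be sketched as follows, using the simplified numbers from the tables above. The class and method names are hypothetical; the value returned by `newStartingHashKey` plays the role of the starting hash key you would hand to a split request (the SplitShard API calls this parameter `NewStartingHashKey`).

```java
import java.math.BigInteger;

public class Resharding {
    /** First hash key of the upper child when an inclusive range is split in the middle. */
    static BigInteger newStartingHashKey(BigInteger start, BigInteger end) {
        // start + (end - start) / 2 + 1
        return start.add(end.subtract(start).shiftRight(1)).add(BigInteger.ONE);
    }

    /** Two shards can be merged only if one range ends right before the other begins. */
    static boolean adjacent(BigInteger endOfLower, BigInteger startOfUpper) {
        return endOfLower.add(BigInteger.ONE).equals(startOfUpper);
    }

    public static void main(String[] args) {
        // Shard-00 from the example, in units of 10^34.
        BigInteger start = BigInteger.ZERO;
        BigInteger end = BigInteger.valueOf(1134);
        BigInteger upperStart = newStartingHashKey(start, end);
        System.out.println("lower child: " + start + " - " + upperStart.subtract(BigInteger.ONE));
        System.out.println("upper child: " + upperStart + " - " + end);

        // Shard-00 (ends at 1134) and Shard-01 (starts at 1135) are adjacent.
        System.out.println(adjacent(BigInteger.valueOf(1134), BigInteger.valueOf(1135)));
        // Shard-03 (ends at 3402) and Shard-29 (starts at 31753) are not.
        System.out.println(adjacent(BigInteger.valueOf(3402), BigInteger.valueOf(31753)));
    }
}
```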
While sending data to Kinesis (i.e. on the producer side), you should not worry about which shard the data goes to. Sending a random number (or a uuid, or the current timestamp in millis) as the partition key works best for scaling and for distributing the data evenly across shards. Unless you care about the ordering of records within a single shard, it is best to choose a random or constantly changing partition key for each put_record request.
In Java, for example, you can use "putRecordsRequestEntry.setPartitionKey(Long.toString(System.currentTimeMillis()))" or "putRecordRequest.setPartitionKey(Long.toString(System.currentTimeMillis()))".
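To illustrate why a random key spreads records out while a fixed key always lands on the same shard, here is a rough sketch of the mapping done with the JDK only (no AWS SDK). It assumes the partition key is hashed as UTF-8 bytes with MD5 and that the stream has 16 equally sized shards; `PartitionKeyDemo`, `hashKey` and `shardIndex` are names invented for this example, not part of any Kinesis API.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;

public class PartitionKeyDemo {
    // Size of the hash key space: 2^128.
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** MD5 of the partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: interpret bytes as non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Index of the owning shard, assuming shardCount equal hash key ranges (an assumption). */
    static int shardIndex(String partitionKey, int shardCount) {
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .divide(HASH_SPACE)
                .intValue();
    }

    public static void main(String[] args) {
        int shardCount = 16;
        int[] recordsPerShard = new int[shardCount];
        // Random keys: records end up spread over all shards.
        for (int i = 0; i < 10_000; i++) {
            recordsPerShard[shardIndex(UUID.randomUUID().toString(), shardCount)]++;
        }
        System.out.println(Arrays.toString(recordsPerShard));
        // Fixed key: every record with this key maps to the same shard.
        System.out.println(shardIndex("user-42", shardCount));
    }
}
```

The number of distinct partition keys is therefore not tied to the number of shards: what matters is that there are many more keys than shards, so the hashes fill the ranges evenly.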

-
We had a bad experience with **timestamp**. With only millisecond differences between records, the current timestamp as a partition key does not work as expected. So we replaced it with **uuid**. – Osman Alper Jun 29 '17 at 10:16
-
Please note that creating a uuid for every message can be time (and entropy) consuming. – az3 Aug 03 '18 at 08:13
-
Thanks, worked for me @az3. My Kinesis stream has 32 shards and works perfectly. – Bilal Demir Nov 03 '20 at 12:53
It totally depends on the use case. All you need to make sure is that all related data (records sharing the same key) goes to a single shard, so that you can aggregate the data for a key if needed.
If you don't have that requirement, using any random key should be fine.
