
Is there any attempt to keep adjacent shards together when spreading them out over multiple workers? In the documentation example it started with 1 worker/instance and 4 shards. Then auto-scaling occurred and a 2nd worker/instance was started up. The KCL auto-magically moved 2 shards over to worker 2. Is there any attempt at keeping adjacent shards together with a worker when autoscaling? What about when splitting shards?

Thanks

darrickc
  • What do you mean by adjacent shards? The events are distributed using a hash function, which usually takes adjacent keys and spreads them around, mostly to other shards. – Guy Jan 24 '16 at 01:53
  • Adjacent shards are shards that serve hash keys that "touch". Amazon describes it well here - http://docs.aws.amazon.com/kinesis/latest/APIReference/API_MergeShards.html. It's an important concept when merging shards. – darrickc Jan 25 '16 at 16:36
  • Short answer - no. Long answer - You can always override KCL's LeaseTaker algorithm to provide this if it's important. Random stealing is much simpler to implement and solves the primary use case though. – Krease Nov 05 '17 at 18:13

2 Answers


Random.

If by "Worker" you mean "Kinesis consumer application", then the consumer application with the most shards loses 1 shard to another application that has fewer shards.

"Lease" is the correct term here, it describes a consumer application & shard association. And there is not adjacency check for taking leases, it is pure random.

See the chooseLeaseToSteal method in the source code: https://github.com/awslabs/amazon-kinesis-client/blob/c6e393c13ec348f77b8b08082ba56823776ee48a/src/main/java/com/amazonaws/services/kinesis/leases/impl/LeaseTaker.java#L414
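
To make the idea concrete, here is a minimal, hypothetical sketch of that logic (the class, method signature, and worker/shard names below are my own simplification, not the real KCL API): the worker that wants more work finds whoever currently holds the most leases and steals one of that worker's leases at random, with no notion of shard adjacency.

    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    // Hypothetical, simplified sketch of random lease stealing. The real
    // LeaseTaker also handles expired leases, steal limits, and per-worker
    // lease counts; only the "take a random lease from the busiest worker"
    // idea is shown here.
    public class LeaseStealSketch {

        static String chooseLeaseToSteal(Map<String, List<String>> leasesByWorker,
                                         String stealingWorker,
                                         Random random) {
            // Find the worker that currently holds the most leases (shards).
            String busiest = null;
            int busiestCount = 0;
            int total = 0;
            for (Map.Entry<String, List<String>> e : leasesByWorker.entrySet()) {
                total += e.getValue().size();
                if (!e.getKey().equals(stealingWorker) && e.getValue().size() > busiestCount) {
                    busiest = e.getKey();
                    busiestCount = e.getValue().size();
                }
            }
            if (busiest == null) {
                return null; // nothing to steal
            }
            // Only steal if the stealing worker is below its fair share.
            int target = (int) Math.ceil((double) total / leasesByWorker.size());
            if (leasesByWorker.getOrDefault(stealingWorker, List.of()).size() >= target) {
                return null;
            }
            // Pick one of the busiest worker's leases at random; no adjacency check.
            List<String> candidates = leasesByWorker.get(busiest);
            return candidates.get(random.nextInt(candidates.size()));
        }

        public static void main(String[] args) {
            // The scenario from the question: worker-2 just started with no leases,
            // so it steals a random shard from worker-1.
            Map<String, List<String>> leases = Map.of(
                    "worker-1", List.of("shard-0", "shard-1", "shard-2", "shard-3"),
                    "worker-2", List.of());
            System.out.println(chooseLeaseToSteal(leases, "worker-2", new Random()));
        }
    }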

az3

Is there any attempt to keep adjacent shards together when spreading them out over multiple workers?

I doubt that's the case. My understanding is that ordering is maintained only within the boundary of a single key, and a single key always falls within a single shard.

Imagine I have 2 keys, key-a and key-b, and the following events occurred:

["event-1-key-a", "event-2-key-b", "event-3-key-a"]

Now we have 2 events for key-a: ["event-1-key-a", "event-3-key-a"]

and 1 event for key-b: ["event-2-key-b"]

Note that sharding works exactly as shown above: the 2 events for key-a will always end up in the same shard. With that as the guarantee, maintaining order across shards is not necessary.
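
A rough sketch of why that grouping holds (the two-shard split and the class and method names below are my own illustration; real shards carry explicit hash key ranges returned by DescribeStream): Kinesis MD5-hashes the partition key into a 128-bit number and routes the record to the shard whose hash key range contains it, so the same key always maps to the same shard.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Hypothetical illustration: two shards split the 128-bit hash key space in half.
    // The hash key is derived by MD5-hashing the partition key, and the record goes
    // to the shard whose hash key range contains that value.
    public class ShardRoutingSketch {

        static final BigInteger HALF = BigInteger.TWO.pow(127);

        static int shardFor(String partitionKey) throws NoSuchAlgorithmException {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            BigInteger hashKey = new BigInteger(1, md5); // unsigned 128-bit value
            return hashKey.compareTo(HALF) < 0 ? 0 : 1;  // shard 0 or shard 1
        }

        public static void main(String[] args) throws NoSuchAlgorithmException {
            // Both key-a events produce the same hash key, so they land in the same shard.
            System.out.println("event-1-key-a goes to shard " + shardFor("key-a"));
            System.out.println("event-3-key-a goes to shard " + shardFor("key-a"));
            System.out.println("event-2-key-b goes to shard " + shardFor("key-b"));
        }
    }

Which worker ends up owning shard 0 or shard 1 is a separate, lease-level decision and is independent of these hash key ranges.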

neurite
  • While you are correct, that's not what I was asking. I'm curious about resharding and autoscaling, not about the order of messages. – darrickc Jan 21 '16 at 20:03
  • Well, I think shards are independent of one another. Does that answer the question? Or is your concern that shards are not distributed evenly? – neurite Jan 21 '16 at 21:13
  • I think it's probably true that they are independent; I'm just curious how Amazon distributes them among workers. My guess is they don't try to keep adjacent shards together. – darrickc Jan 22 '16 at 21:39
  • @darrickc - it's totally random distribution. No logic at all keeping adjacent shards together. You could always make your own version of KCL that does this if it's important to your use case (it's just a library written on top of the AmazonKinesis interface) – Krease Jul 08 '16 at 06:18