
So, for example, if I have events in order with key A and events in order with key B, and a parallelism of 2, do all the events with key A go to one task slot and the ones with key B go to the other task slot?

What happens if I only get events with key A, in order? Do they also get distributed across the two task slots? Does that mean I lose the order in which they arrive?

Akula

1 Answer


No, that's not exactly how it works.

What happens is that each key is mapped onto a key group, where the total number of key groups is determined by the job's maximum parallelism (a configuration setting). Key groups are then mapped onto task slots. If there are two keys and two slots, it's entirely possible that both keys will be assigned to the same slot.
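To make "key groups are mapped onto task slots" concrete: each parallel subtask ends up owning one contiguous range of key groups. The sketch below computes those ranges; the arithmetic is assumed to mirror Flink's `KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex`, and the value `maxParallelism = 128` is an assumption chosen for illustration.

```java
// Sketch: which contiguous range of key groups each parallel subtask owns.
// Assumed to mirror the arithmetic in Flink's
// KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex.
public class KeyGroupRanges {

    // First key group (inclusive) owned by subtask `operatorIndex`,
    // i.e. ceil(operatorIndex * maxParallelism / parallelism).
    static int rangeStart(int operatorIndex, int parallelism, int maxParallelism) {
        return (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
    }

    // Last key group (inclusive) owned by subtask `operatorIndex`.
    static int rangeEnd(int operatorIndex, int parallelism, int maxParallelism) {
        return ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed value for illustration
        int parallelism = 2;
        for (int i = 0; i < parallelism; i++) {
            System.out.println("subtask " + i + " owns key groups "
                + rangeStart(i, parallelism, maxParallelism) + ".."
                + rangeEnd(i, parallelism, maxParallelism));
        }
    }
}
```

With these settings it prints `subtask 0 owns key groups 0..63` and `subtask 1 owns key groups 64..127`. Because a subtask owns a block of key groups rather than a single key, two keys can easily land in the same block and therefore in the same slot.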

The key group for a key is:

MathUtils.murmurHash(key.hashCode()) % maxParallelism

And the slot for a key group is:

keyGroup * actualParallelism / maxParallelism
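The two formulas above can be chained into a small runnable sketch that routes the keys A and B with a parallelism of 2. So that it runs without Flink on the classpath, `murmurHash`/`bitMix` are re-implemented here on the assumption that they match `org.apache.flink.util.MathUtils.murmurHash`; check them against the Flink version you use. `maxParallelism = 128` is again an illustrative assumption.

```java
// Sketch of the key -> key group -> subtask routing from the two formulas above.
// murmurHash/bitMix are a re-implementation assumed to match Flink's
// org.apache.flink.util.MathUtils.murmurHash.
public class KeyRouting {

    // Final avalanche step of the 32-bit murmur3 hash.
    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        code = bitMix(code);
        // Return a non-negative value so the modulo below yields a valid key group.
        if (code >= 0) {
            return code;
        } else if (code != Integer.MIN_VALUE) {
            return -code;
        } else {
            return 0;
        }
    }

    // MathUtils.murmurHash(key.hashCode()) % maxParallelism
    static int keyGroup(Object key, int maxParallelism) {
        return murmurHash(key.hashCode()) % maxParallelism;
    }

    // keyGroup * actualParallelism / maxParallelism
    static int subtask(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed value for illustration
        int parallelism = 2;
        for (String key : new String[] {"A", "B"}) {
            int kg = keyGroup(key, maxParallelism);
            System.out.println("key " + key + " -> key group " + kg
                + " -> subtask " + subtask(kg, parallelism, maxParallelism));
        }
    }
}
```

Nothing in this computation looks at the event, the stream, or the arrival order; it depends only on the key's `hashCode()` and the two parallelism settings. That is why every event with key A is routed to the same subtask, whether A is the only key or one of many.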

As for maintaining ordering, see https://stackoverflow.com/a/69094404/2000823 and https://stackoverflow.com/a/69757412/2000823.

David Anderson
  • Hey David! I hope it's fine to iterate on this question further. I have two different streams containing the same type of event, each keyed separately. Does this imply that, even if both streams get keyed on the same key A, A from stream 1 will get mapped to a different key group than A from stream 2, since they are keyed separately? In my case this is a problem, because when I try to join both streams using a joinFunction, the events don't get joined, since they might end up in separate task slots. Is there a workaround? – Akula Nov 10 '21 at 12:53
  • "A from stream 1 will get mapped to a different key group than A from stream 2 ..." No, that's not correct. All participants in the cluster will map a given key to the same slot. Flink's joins depend on this. – David Anderson Nov 10 '21 at 13:37
  • That's strange; in my actual job this is what is currently happening when joining two streams that are keyed separately. In the test, I send the same event with the same key into each stream, but they never get joined unless the parallelism is 1. – Akula Nov 10 '21 at 14:26
  • After printing, it seems that both of the events are on the same instance. Could a watermarking issue be preventing them from getting joined? – Akula Nov 10 '21 at 15:26
    Please create a new question, and provide enough details to reproduce the issue. Also, fyi, there's an example of this in https://github.com/apache/flink-training/tree/master/rides-and-fares that you may find helpful. – David Anderson Nov 10 '21 at 15:51
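The claim in the comments that "all participants in the cluster will map a given key to the same slot" can be illustrated with a toy simulation: two streams are partitioned independently, and each simulated subtask joins only the events it received. As a labeled simplification, the key group here is `Math.floorMod(key.hashCode(), maxParallelism)` instead of Flink's murmur-based hash; the class and method names are hypothetical, and the simulation deliberately leaves out time, windows, and watermarks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy simulation of joining two separately keyed streams across parallel subtasks.
// Simplification (assumption): key group = Math.floorMod(key.hashCode(), maxParallelism)
// instead of Flink's murmur-based hash; any deterministic function of the key behaves the same.
public class KeyedJoinSim {

    static final class Event {
        final String key;
        final String payload;

        Event(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Depends only on the key and the parallelism settings, never on which stream the event came from.
    static int subtask(String key, int parallelism, int maxParallelism) {
        int keyGroup = Math.floorMod(key.hashCode(), maxParallelism);
        return keyGroup * parallelism / maxParallelism;
    }

    // Splits one stream into per-subtask partitions, the way keying would route it.
    static List<List<Event>> partition(List<Event> stream, int parallelism, int maxParallelism) {
        List<List<Event>> parts = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            parts.add(new ArrayList<>());
        }
        for (Event e : stream) {
            parts.get(subtask(e.key, parallelism, maxParallelism)).add(e);
        }
        return parts;
    }

    // Each subtask joins only the events it received from both sides.
    static List<String> join(List<Event> left, List<Event> right, int parallelism, int maxParallelism) {
        List<List<Event>> l = partition(left, parallelism, maxParallelism);
        List<List<Event>> r = partition(right, parallelism, maxParallelism);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            Map<String, List<Event>> leftByKey = new HashMap<>();
            for (Event e : l.get(i)) {
                leftByKey.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
            }
            for (Event e : r.get(i)) {
                for (Event match : leftByKey.getOrDefault(e.key, List.of())) {
                    out.add(e.key + ": " + match.payload + " + " + e.payload);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> left = List.of(new Event("A", "left"), new Event("B", "left"));
        List<Event> right = List.of(new Event("A", "right"), new Event("B", "right"));
        System.out.println(join(left, right, 2, 128));
    }
}
```

Running `main` prints `[A: left + right, B: left + right]`. With this simplified hash, "A" (hashCode 65) and "B" (hashCode 66) both fall into subtask 1, so the pairs are found even though each stream was partitioned on its own. Because the simulation has no notion of time, it says nothing about windowed joins that never fire.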