
We have a Flink pipeline that aggregates data per "client" by combining records that share the same key ("client-id") and fall within the same window.

The problem is trivially parallelizable, and the input Kafka topic has a few partitions (the same number as the Flink parallelism), each holding a subset of the clients. That is, all records for a given client always land in the same Kafka partition.

Does Flink take advantage of this automatically, or will it reshuffle the keys? And if the latter is true, can we somehow avoid the reshuffle and keep the data local to each operator, as assigned by the input partitioning?

Note: we are actually using Apache Beam with the Flink runner, but I tried to simplify the question as much as possible. The Beam pipeline uses FixedWindows followed by Combine.perKey, roughly as sketched below.
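For concreteness, here is a minimal sketch of the shape of the Beam side. The key/value types, window length, and the Sum combiner are illustrative placeholders, not our actual ones:

```java
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// events: PCollection<KV<String, Long>> keyed by client-id (assumed shape)
PCollection<KV<String, Long>> perClientTotals = events
    // FixedWindows: non-overlapping, aligned windows (1 minute is an assumption)
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
    // A Combine.perKey-based transform; Sum stands in for our real combiner
    .apply(Sum.longsPerKey());
```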

nimrodm

1 Answer


I'm not familiar with the internals of the Flink runner for Beam, but assuming it uses a Flink keyBy, then yes, this will involve a network shuffle. The shuffle could be avoided, but only rather painfully, by reimplementing the job with low-level Flink primitives rather than keyed windows and keyed state.

Flink does offer reinterpretAsKeyedStream, which can be used to avoid unnecessary shuffles, but it can only be applied in situations where the existing partitioning exactly matches what keyBy would do -- and I see no reason to think that would apply here. A minimal sketch of how it is wired up follows.
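In plain Flink it looks roughly like this (the event type, field names, and window size are illustrative assumptions; watermark assignment is omitted):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Hypothetical event type; stands in for whatever the pipeline consumes.
public class ClientEvent {
    public String clientId;
    public long value;
}

// events is a DataStream<ClientEvent> coming straight from the Kafka source.
// A plain events.keyBy(e -> e.clientId) would trigger a network shuffle;
// reinterpretAsKeyedStream skips it, but is ONLY correct if the stream is
// already partitioned exactly as that keyBy would partition it.
KeyedStream<ClientEvent, String> keyed =
        DataStreamUtils.reinterpretAsKeyedStream(events, e -> e.clientId);

keyed.window(TumblingEventTimeWindows.of(Time.minutes(1)))
     .reduce((a, b) -> { a.value += b.value; return a; });
```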

David Anderson
  • Assuming I control the input Kafka partitioning, can you give any pointers as to how I should define it to match what keyBy would produce? Thanks for mentioning reinterpretAsKeyedStream! Beam doesn't support that, but it does look like what we need here. – nimrodm Nov 05 '21 at 13:13
  • See https://stackoverflow.com/a/69802568/2000823 for the mapping from keys to slots. The Kafka source does a round-robin assignment of partitions to slots. That should be enough to compute which partition to use for each key (a sketch of this computation follows below). – David Anderson Nov 05 '21 at 14:06
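Building on that last comment, here is a rough sketch of the computation, using Flink's own KeyGroupRangeAssignment. It assumes the topic's partition count equals the job parallelism and that partition i is read by subtask i (the round-robin assignment mentioned above); both assumptions should be verified against your connector version before relying on this:

```java
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

// Sketch: choose a Kafka partition for a record so that the Kafka source's
// partition-to-subtask assignment lines up with the subtask that keyBy
// would route the key to, making a later reinterpretAsKeyedStream valid.
public final class KeyAlignedPartitioner {

    public static int partitionFor(String key, int maxParallelism, int numPartitions) {
        // The subtask index keyBy would send this key to. This is the same
        // logic Flink uses internally:
        //   keyGroup      = murmurHash(key.hashCode()) % maxParallelism
        //   operatorIndex = keyGroup * parallelism / maxParallelism
        return KeyGroupRangeAssignment.assignKeyToParallelOperator(
                key, maxParallelism, numPartitions);
    }

    public static void main(String[] args) {
        // Example: 4 partitions, parallelism 4, default maxParallelism 128.
        System.out.println(partitionFor("client-42", 128, 4));
    }
}
```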