Flink/Kinesis Analytics: Even Key Groups Across SubTasks

Question

I have a simple Flink/Kinesis Analytics application with two task slots: Source -> Transform, Repartition -> Sink. My application has 32 KPUs with a parallelism of 1, reading from a Kinesis Stream with 60 shards. After a transformation stage I key by a random digit between 1 and 32 in an effort to redistribute work evenly to each subtask. However, I'm not seeing an even distribution of work across subtasks in either the source or the sink.

Here are the first ten subtasks in the first task slot (before repartition). Out of 32 subtasks, only 20 are reading data:

Here are the first ten subtasks in the second task slot (after repartition):

I've checked that all 60 shards on Kinesis are producing data, so that's not the problem. My two questions are:

How do I read data from a source evenly into all available subtasks?
How do I force Flink to evenly assign key groups across subtasks?

score 0 · Answer 1 · answered Feb 13 '23 at 17:42

Based on the Kinesis consumer documentation, it seems like setting the shard assigner to UniformShardAssigner would distribute Kinesis shards more evenly across Flink sub-tasks. It wasn't explicit in your question, but it looks like you've got a Flink parallelism of 32. With 60 shards, it's likely that you'd get some imbalance (a few sub-tasks consuming from no shards, a few sub-tasks consuming from 2 shards) unless you set this.
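
To make the idea concrete, here is a minimal Python sketch of assigning shards by their position in the Kinesis hash key space. It is a model of the approach, not the connector's code. It assumes the 60 shards split the hash key space [0, 2^128) evenly, which may not hold after resharding, and the helper names (even_shard_ranges, assign_by_hash_range, shards_per_subtask) are made up for illustration. In the Java connector the assigner is, as far as I know, plugged in through FlinkKinesisConsumer.setShardAssigner(...), so check that against the docs for your connector version.

```python
from collections import Counter

# Kinesis hash keys are 128-bit values, so they span [0, 2**128 - 1].
HASH_KEY_BOUND = 2 ** 128


def even_shard_ranges(num_shards):
    """Hypothetical helper: split the hash key space into equal shard ranges."""
    ranges = []
    for i in range(num_shards):
        start = i * HASH_KEY_BOUND // num_shards
        end = (i + 1) * HASH_KEY_BOUND // num_shards - 1
        ranges.append((start, end))
    return ranges


def assign_by_hash_range(start, end, parallelism):
    """Pick a subtask by where the midpoint of the shard's range falls.

    Assumption: this mirrors the idea of a hash-range based assigner,
    it is not copied from the Flink connector.
    """
    midpoint = (start + end) // 2
    return midpoint * parallelism // HASH_KEY_BOUND


def shards_per_subtask(num_shards, parallelism):
    """Count how many shards each subtask index would read."""
    counts = Counter(
        assign_by_hash_range(start, end, parallelism)
        for start, end in even_shard_ranges(num_shards)
    )
    return [counts.get(subtask, 0) for subtask in range(parallelism)]


if __name__ == "__main__":
    per_subtask = shards_per_subtask(num_shards=60, parallelism=32)
    print("shards per subtask:", per_subtask)
    print("idle subtasks:", per_subtask.count(0))
```

Under these assumptions no sub-task sits idle and each one reads one or two shards. A perfectly even split would still need the shard count to be a multiple of the parallelism, for example 64 shards or a parallelism of 30 or 60.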
