I have a Java Flink (version 1.15) application with an Async I/O operation running in AWS Kinesis Flink Runtime with parallelism set to 12. The operation reads a stream of messages from a FlinkKinesisConsumer and processes it in an async operation.
FlinkKinesisConsumer:
DataStream<> inputMessages =
env.addSource(new FlinkKinesisConsumer<>(kinesisStream, new Deserializer(),
streamSourceProperties));
Async operation:
DataStream<> processedStream =
AsyncDataStream.unorderedWait(inputMessages, new AsyncIOFunction(environment), 1000,
TimeUnit.MILLISECONDS, 100);
In the Flink Dashboard, I see that the FlinkKinesisConsumer and the Async operation has 12 subtasks. However the data is only processed in on of the subtasks.
The following screenshots shows the records processed in one subtask, while the others have no load.
Kinesis Consumer:
Why is this distribution not equal? Will this have an impact on performance if the load is high? Also, how can I make the processing equally distributed among the subtasks?
I saw this question Equally distribute operators with single parallelism in a multi-parallel Flink application but I am still not sure of the answer for this.