
We are processing around 30 million records per day with an Apache Flink job. The job filters data from a source Kinesis stream and pushes the filtered data to the respective destination Kinesis streams, which live in other AWS accounts; we write to those streams using the KPL, assuming roles from the other accounts. We ran into an issue where role permissions were modified at runtime, which created backpressure and started killing all of our other threads that were writing to the other Kinesis streams. Is there any way to isolate each producer thread so that a failure in one does not impact the other threads/tasks in the job?

FYI: we are using Kinesis Data Analytics (which internally uses Apache Flink).
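
For context, here is a minimal sketch of the kind of wiring we have in mind (not our actual code; the stream names, role ARNs, region, and routing rule are placeholders): each destination gets its own side output, its own FlinkKinesisProducer with an assumed role, its own slot sharing group, and failOnError disabled with a queue limit, so a broken destination is bounded and logged rather than failing the task.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class FanOutJob {

    // One side-output tag per destination stream (placeholder names).
    private static final OutputTag<String> DEST_A = new OutputTag<String>("dest-a") {};
    private static final OutputTag<String> DEST_B = new OutputTag<String>("dest-b") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties sourceCfg = new Properties();
        sourceCfg.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");
        DataStream<String> source = env.addSource(
                new FlinkKinesisConsumer<>("source-stream", new SimpleStringSchema(), sourceCfg));

        // Route every record to the side output of its destination (placeholder rule).
        SingleOutputStreamOperator<String> routed = source.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                if (value.contains("\"type\":\"A\"")) {
                    ctx.output(DEST_A, value);
                } else {
                    ctx.output(DEST_B, value);
                }
            }
        });

        // Each destination gets its own sink, assumed role, and slot sharing group.
        routed.getSideOutput(DEST_A)
                .addSink(producerFor("stream-a", "arn:aws:iam::111111111111:role/cross-account-a"))
                .slotSharingGroup("dest-a")
                .name("sink-dest-a");

        routed.getSideOutput(DEST_B)
                .addSink(producerFor("stream-b", "arn:aws:iam::222222222222:role/cross-account-b"))
                .slotSharingGroup("dest-b")
                .name("sink-dest-b");

        env.execute("kinesis-fan-out");
    }

    private static FlinkKinesisProducer<String> producerFor(String streamName, String roleArn) {
        Properties cfg = new Properties();
        cfg.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");
        cfg.setProperty(AWSConfigConstants.AWS_CREDENTIALS_PROVIDER, "ASSUME_ROLE");
        cfg.setProperty(AWSConfigConstants.AWS_ROLE_ARN, roleArn);
        cfg.setProperty(AWSConfigConstants.AWS_ROLE_SESSION_NAME, "flink-" + streamName);

        FlinkKinesisProducer<String> producer = new FlinkKinesisProducer<>(new SimpleStringSchema(), cfg);
        producer.setDefaultStream(streamName);
        producer.setDefaultPartition("0");
        // Log failed records instead of failing the task, and cap the KPL buffer
        // so a broken destination does not accumulate records without bound.
        producer.setFailOnError(false);
        producer.setQueueLimit(1000);
        return producer;
    }
}
```

Note that slot sharing groups only separate resources; since all branches share the same source, sustained backpressure from one sink can still eventually stall the shared source, so running one job per destination account may be the only hard isolation.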

Yogesh Katkar
  • I assume you must have some kind of routing rule in Flink to push incoming data from one Kinesis stream to multiple streams. What you could do is use a different partition key for each kind of data when pushing to the first Kinesis stream (the Flink source); by defining a source per partition key in Flink, you can achieve isolation. This problem could also be solved with something like Kafka, where you can create topic-level isolation for each kind of data. – Swapnil Khante Apr 06 '23 at 08:26
  • Every key is assigned to at least one key group, and task slots read the data from the partitions belonging to those key groups. Multiple keys can therefore be assigned to a single key group, which leads a single partition to receive data for multiple keys (see the sketch below), so we ruled out this possibility. – Yogesh Katkar Apr 07 '23 at 11:48
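
A small standalone sketch of why that is (the routing keys, maxParallelism of 128, and parallelism of 4 are made up): Flink assigns each key to a key group by hashing it against the maximum parallelism and then maps key groups onto subtasks, so with only a handful of routing keys, several of them can easily land on the same subtask.

```java
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupDemo {
    public static void main(String[] args) {
        int maxParallelism = 128; // Flink's default max parallelism for small jobs
        int parallelism = 4;      // hypothetical operator parallelism

        // Hypothetical routing keys, one per destination stream.
        String[] routingKeys = {"orders", "payments", "clicks", "audit"};
        for (String key : routingKeys) {
            int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);
            int subtask = KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism);
            System.out.printf("key=%-8s keyGroup=%3d subtask=%d%n", key, keyGroup, subtask);
        }
    }
}
```

With 128 key groups spread over 4 subtasks, each subtask owns 32 key groups, so nothing guarantees that two routing keys end up on different subtasks unless the key-to-key-group mapping is controlled explicitly.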

0 Answers