I’ve been using alpakka kafka to streaming data from kafka topics. I’m using:
Consumer
.committableSource(consumerSettings, Subscriptions.topics(topic))
Recently I’ve tried to spam more consumers like 3 on a topic which has 15 partitions. When I plug more consumers with the same group id, it kindly split 5 partitions per consumer, but it seems to do not consume all partitions at the same time, it seems to read one by one, or read a specific partition much faster than others.
|Partition|LogSize |Consumer Offset|Lag |
|0 |8,429,145| 6,087,144|2,342,001|
|1 |8,424,948| 6,223,257|2,201,691|
|2 |8,428,121| 7,764,854| 663,267|
|3 |8,421,528| 6,071,425|2,350,103|
|4 |8,434,659| 7,351,552|1,083,107|
|5 |8,428,323| 5,935,336|2,492,987|
|6 |8,424,974| 6,455,301|1,969,673|
|7 |8,431,820| 7,763,984| 667,836|
|8 |8,425,999| 6,370,962|2,055,037|
|9 |8,416,354| 6,681,093|1,735,261|
|10 |8,416,217| 6,814,949|1,601,268|
|11 |8,428,026| 5,878,703|2,549,323|
|12 |8,424,604| 8,424,589| 15|
|13 |8,431,019| 8,431,019| 0|
|14 |8,423,218| 8,423,218| 0|
Here is a real example of a production application I’m running. So I have some questions:
Is it ok to read some partitions much faster than others?
Please, note that this behavior only happens when I start more than one consumer.
Should I change the way I’m consuming? Should I use source per partition, or is there a different option?
Update
I was suspecting that it could happen when plugging more than one consumer(read more than one application), but it happened today using only one consumer, you can see by taking a look at the consumer group, which is the same.
At the time it happened, I had 20MM of messages still waiting to be processed(lag). The above picture is a picture taken from the Kafka manager we have at the company.