
My application has a Kafka topic (Kafka 2.4.0) with a single partition. There is a single group ID with multiple subscribers (multiple AWS EC2 instances) that read from it. A recent effort imported about three million records into our system, resulting in millions of messages being sent to the topic and read by the consumer group.

When reading from the topic, the application logs the offset. For some reason there was a gap in the offsets (about fifty thousand), meaning we lost some messages. The only clue as to why this happened was the following log message:

"Attempt to heartbeat failed since group is rebalancing Revoke previously assigned partitions (Re-)joining group"

Perhaps a server or process crashed for whatever reason, and a consumer left/rejoined the group, which caused this log message. However, I expected the active consumer to continue from the last offset that was read. Given the large gap in offsets, it seems as though the group lost its position for a while and reset itself to the current (latest) offset in the topic.

My question is: how/why would a rebalance cause the consumer group to lose its current offset?
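
For reference, here is a minimal sketch of the consumer settings that seem relevant, using the standard Kafka Java client (the broker address, group ID, and values shown are placeholders, not our actual configuration):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ConsumerConfigSketch {
        public static KafkaConsumer<String, String> build() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");             // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            // If the group has no valid committed offset for a partition,
            // "latest" jumps to the end of the log (skipping everything in
            // between), while "earliest" rewinds to the beginning.
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            // Auto-commit commits offsets on a timer, which can run ahead of
            // what has actually been processed; committing manually after
            // processing is safer.
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

            // If one poll-loop iteration takes longer than this, the consumer
            // is kicked out of the group and a rebalance is triggered; easy to
            // hit under a sudden burst of millions of messages.
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");

            return new KafkaConsumer<>(props);
        }
    }

If the committed offset for the partition is missing or invalid at the moment it is reassigned, auto.offset.reset decides where the new owner starts reading, and "latest" would produce exactly the kind of jump to the end of the topic described above.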

The application has existed for a while, but this is the first time it encountered such a load and logged the rebalancing-related message. There will be future tests with similar load, but so far I haven't tried to reproduce the issue.

1 Answer


The number of partitions must be greater than or equal to the number of consumers in a single group ID; any consumers beyond the partition count sit idle. This is strongly related to how Kafka assigns partitions and tracks offsets within a group.

See this article.

https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html

Figure 4-4 in particular may describe your situation.
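
One thing worth adding: a common reason for a jump like this is that the position was never committed before the rebalance. Below is a sketch of the usual pattern with the standard Java client (not something from the article; the topic name is a placeholder) that commits synchronously when partitions are revoked, so the next assignee resumes from the committed position instead of falling back to auto.offset.reset:

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RebalanceAwareConsumer {
        public static void run(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("my-topic"), // placeholder topic
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        // Commit the current position before giving up the
                        // partition, so the next owner resumes from here
                        // instead of falling back to auto.offset.reset.
                        consumer.commitSync();
                    }

                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // Log the resume position; a sudden jump here would
                        // reveal the kind of offset gap the question describes.
                        partitions.forEach(tp -> System.out.printf(
                            "Assigned %s, resuming at offset %d%n", tp, consumer.position(tp)));
                    }
                });

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // process the record here ...
                }
                consumer.commitSync(); // commit only after processing
            }
        }
    }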

Youngrok Ko