Let's say I have two Kafka clusters and I am using MirrorMaker to mirror the topics from one cluster to the other. I understand that the consumer has an embedded producer that commits offsets to the __consumer_offsets topic in the Kafka cluster. What happens if the primary Kafka cluster goes down? Is the __consumer_offsets topic mirrored as well? I ask because the secondary cluster could have a different number of brokers and other settings.

How does a mirrored Kafka cluster handle consumer offsets?

Does the auto.offset.reset setting play a role here?

Guido
Yogesh Gupta

1 Answer

Update

Since Apache Kafka 2.7.0, MirrorMaker 2.0 is able to replicate committed offsets. See KIP-545: https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0

Original Answer

MirrorMaker does not replicate offsets.

Furthermore, auto.offset.reset is completely unrelated to this: it is a consumer setting that defines where a consumer should start reading when no valid committed offset is found at startup.

The reason for not mirroring offsets is that they can be meaningless on the mirror cluster: it is not guaranteed that a message will have the same offset in both clusters.
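To make the point about diverging offsets concrete, here is a minimal sketch in plain Python. It does not use any Kafka API; `PartitionLog` is a hypothetical toy model of a single partition, and the scenario (the primary already assigned five offsets before mirroring began) is an assumption chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition: the offset is the append position."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to this record


# Assumed scenario: the primary has already assigned offsets 0..4
# to older records before mirroring starts.
primary = PartitionLog()
for i in range(5):
    primary.append(f"old-{i}")

# The mirror partition starts empty, so its offsets start at 0.
mirror = PartitionLog()

primary_offsets = {}
mirror_offsets = {}
for value in ["a", "b", "c"]:
    primary_offsets[value] = primary.append(value)
    mirror_offsets[value] = mirror.append(value)

# The same record ends up at different offsets in the two logs.
print(primary_offsets)
print(mirror_offsets)
```

In this model, an offset committed against the primary would point at a different record (or at nothing) on the mirror, which is why copying the committed offset verbatim would not help.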

Thus, in the failover case, you need to figure out something "smart" yourself. One way is to remember the metadata timestamp of your last processed record. This allows you to "seek" by timestamp on the mirror cluster to find an approximate offset there. (You will need Kafka 0.10 or later for this.)
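The timestamp-based lookup can be sketched as follows. This is a plain-Python model of the idea, not Kafka client code: `offset_for_timestamp` is a hypothetical helper, the timestamps are made-up values, and it assumes the mirrored records carry the same timestamps as the originals and that timestamps within the partition are non-decreasing. With the Java consumer, the corresponding steps would be a lookup via `offsetsForTimes` followed by `seek`.

```python
from bisect import bisect_left


def offset_for_timestamp(timestamps, target_ts):
    """Return the earliest offset whose timestamp is >= target_ts.

    timestamps[i] is the timestamp of the record at offset i and is
    assumed to be non-decreasing. Returns None if no such record exists.
    """
    idx = bisect_left(timestamps, target_ts)
    return idx if idx < len(timestamps) else None


# Hypothetical record timestamps on the mirror partition, indexed by offset.
mirror_timestamps = [1000, 1010, 1020, 1030, 1040]

# Timestamp of the last record the application processed on the primary,
# which the application must have stored somewhere durable.
last_processed_ts = 1020

resume_offset = offset_for_timestamp(mirror_timestamps, last_processed_ts)
print(resume_offset)
```

Resuming at the record with the remembered timestamp means that record is read again after failover, so processing should tolerate duplicates. That is one reason the resulting offset is only approximate.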

Matthias J. Sax
  • Thanks for taking the time to respond. So what happens when the primary cluster with the committed offsets goes down? The Kafka documentation says "consumer specifies offset in each request". Will the consumer start all over, or will the mirrored Kafka cluster somehow remap the offset to a new appropriate offset? Hope I made sense. – Yogesh Gupta Jan 31 '17 at 06:51
  • Not sure. I would assume that you need to restart your consumer (as you need to set a new broker host/port -- or how do you handle failover to the other cluster?) and this restart will drop the offsets. Thus, as a workaround, you could periodically get the latest committed offset from your consumer in your application and use it in a seek() when initializing the new consumer after failover to the new cluster. However, this is not completely fault tolerant, because if your consumer dies during failover, the offsets are completely lost -- you would need to store them reliably to guard against this. – Matthias J. Sax Jan 31 '17 at 07:38
  • @moodylearner I did update my answer -- it was not completely correct before. – Matthias J. Sax Feb 01 '17 at 22:55