I have started my producer to send data to Kafka and also started my consumer to pull the same data. While using the ConsumeKafka processor (Kafka version 1.0) in Apache NiFi, I have a few questions about the Kafka consumer.

Q.1) When I start my ConsumeKafka processor for the first time, how can I read messages from the beginning as well as the current messages?

Q.2) How can I resume reading from just after the last consumed message if the consumer shuts down?

How can we implement the above two in Apache NiFi?

halfer
Ashok Kuramdasu

1 Answer

The ConsumeKafka processors have a property called "Offset Reset", which is used when there is no previous offset for the consumer group id, or when that offset no longer exists. The choices for this property are "Offset Latest" or "Offset Earliest", and it defaults to latest.
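The rule above can be sketched as a small function. This is only a toy model of the decision, not NiFi or Kafka code: the function name, its parameters, and the offset numbers are all made up for illustration.

```python
def resolve_start_offset(committed, log_start, log_end, offset_reset="latest"):
    """Return the offset a consumer group would start reading from (toy model).

    committed:    last committed offset for the group, or None if it has none
    log_start:    oldest offset still retained in the partition
    log_end:      offset that the next produced message will receive
    offset_reset: "earliest" or "latest"; only consulted as a fallback
    """
    # A committed offset that still points inside the retained log always wins.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No committed offset, or it points at data that no longer exists:
    # fall back to the reset policy.
    if offset_reset == "earliest":
        return log_start
    if offset_reset == "latest":
        return log_end
    raise ValueError(f"unsupported offset_reset: {offset_reset!r}")


# Hypothetical partition retaining offsets 100..499; the next message gets 500.
print(resolve_start_offset(None, 100, 500, "latest"))    # 500: new group, default
print(resolve_start_offset(None, 100, 500, "earliest"))  # 100: new group, earliest
print(resolve_start_offset(250, 100, 500, "earliest"))   # 250: existing offset wins
print(resolve_start_offset(40, 100, 500, "earliest"))    # 100: old offset is gone
```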

So if you start a ConsumeKafka processor with a consumer group id that has never been used before, it starts consuming from the latest messages (with the default setting). After that, if you stop and restart the processor, it resumes from the offset it last consumed.

If you want "Offset Reset" to apply again, to force the processor back to earliest or latest, you need to change the consumer group id. Otherwise the existing consumer group will always start from its existing offset.
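To see the group id behaviour end to end, here is a rough in-memory simulation. `ToyTopic`, the group ids, and the message names are hypothetical stand-ins, and the real broker tracks offsets per partition, but the lifecycle is the same: first start, restart, then a fresh group id.

```python
class ToyTopic:
    """In-memory stand-in for a single-partition topic (not the Kafka API)."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = {}  # group id -> next offset to read

    def consume(self, group_id, offset_reset="latest"):
        """Return every message the group has not seen yet, then commit."""
        if group_id in self.committed:
            start = self.committed[group_id]  # reset policy is ignored here
        elif offset_reset == "earliest":
            start = 0
        else:
            start = len(self.messages)
        batch = self.messages[start:]
        self.committed[group_id] = len(self.messages)
        return batch


topic = ToyTopic(["m0", "m1", "m2"])

first = topic.consume("group-a")                 # new group, default is latest
topic.messages += ["m3", "m4"]                   # produced while we were stopped
resumed = topic.consume("group-a", "earliest")   # existing offset still wins
replayed = topic.consume("group-b", "earliest")  # new group id, reset applies

print(first)     # []
print(resumed)   # ['m3', 'm4']
print(replayed)  # ['m0', 'm1', 'm2', 'm3', 'm4']
```

Note that switching "group-a" to earliest on the second call changes nothing, because the group already has a stored offset.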

You can't simultaneously read messages from the beginning and from the current position. You can either start at the beginning and read all the way to current, or start at current. This is how Kafka works and is not specific to NiFi.
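As a rough illustration of "start at the beginning and read all the way to current", the sketch below polls a plain Python list in offset order while new messages arrive. The `poll` helper and the message names are invented for this example and only mimic the idea of a consumer position.

```python
def poll(log, position, max_records=2):
    """Return (batch, new_position) for up to max_records messages."""
    batch = log[position:position + max_records]
    return batch, position + len(batch)


log = ["old-0", "old-1", "old-2"]  # messages already in the topic
position = 0                       # "earliest": begin at the first offset
seen = []

batch, position = poll(log, position)  # still reading history
seen += batch
log.append("new-3")                    # a producer writes while we catch up
batch, position = poll(log, position)  # rest of history, then the new message
seen += batch
batch, position = poll(log, position)  # empty: we have reached current
seen += batch

print(seen)  # ['old-0', 'old-1', 'old-2', 'new-3']
```

Starting with `position = len(log)` instead would model "latest": only `new-3` would be seen.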

Bryan Bende
  • Thanks. Can we set the "Offset Reset" option to "Earliest" every time? In that case there would be no data loss, for both historical and current data. Am I correct? – Ashok Kuramdasu Jul 31 '18 at 08:33
  • Offset Reset only matters the first time you ever start the processor. After that it always reads from oldest to newest, picking up where it left off, so there is no data loss – Bryan Bende Jul 31 '18 at 14:44
  • Ok, thanks. Are there any drawbacks to setting "Offset Reset" to "Earliest"? And why is this option set to "Latest" by default (any specific reason)? – Ashok Kuramdasu Aug 01 '18 at 07:04
  • No drawback. The default comes down to what is best when someone first starts the processor against an existing topic that may have millions of messages in it: if the default were earliest, the user might not realize they are about to consume millions of messages – Bryan Bende Aug 01 '18 at 12:29
  • Thank you, got it. – Ashok Kuramdasu Aug 01 '18 at 13:46