
I know about configuring Kafka to read from the earliest or latest message. How do we add an option to read from a previous offset instead? I need this because messages that were already read have to be processed again, due to a mistake in the earlier processing logic.

SHILPA AR

2 Answers


In the Java Kafka client, the consumer provides methods that let you specify the position from which the next poll will consume.

public void seek(TopicPartition partition, long offset)

Overrides the fetch offsets that the consumer will use on the next poll(timeout). If this API is invoked for the same partition more than once, the latest offset will be used on the next poll(). Note that you may lose data if this API is arbitrarily used in the middle of consumption, to reset the fetch offsets

In most cases seek is enough, and there are also seekToBeginning and seekToEnd.
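A minimal sketch of rewinding with seek is below; the broker address, topic, partition, and offset are placeholders you would replace with your own, and auto-commit is turned off so the rewound position is not committed behind your back. Note it uses assign() rather than subscribe(); with a subscribed consumer you would call seek() only after partitions have been assigned (e.g. in a ConsumerRebalanceListener's onPartitionsAssigned).

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "reprocessing-group");          // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");              // commit manually after reprocessing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Placeholder topic and partition; repeat for each partition you need to replay.
            TopicPartition partition = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(partition));

            // Rewind to the offset you want to reprocess from (42 is just an example).
            consumer.seek(partition, 42L);

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}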

GuangshengZuo
  • If there are 3 partitions and the latest offsets are 12, 13 and 15, and we want to read all the messages since a particular timestamp, how do we accomplish that? – SHILPA AR Aug 05 '17 at 12:04
  • You cannot read messages starting from a timestamp; there are only offsets. You can read all the messages and then process only the ones you want, if the messages contain a timestamp value. – GuangshengZuo Aug 05 '17 at 13:25
  • You mean to say, read each message and compare it inside my script with the timestamp I am looking for? – SHILPA AR Aug 08 '17 at 09:09
  • Yes, Kafka does not support this natively; you need to do it by writing code. – GuangshengZuo Aug 08 '17 at 10:25
  • In Kafka 0.11 you can get offsets for timestamps in the Java client. See https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/Consumer.html#offsetsForTimes(java.util.Map). You can also use the administrative script bin/kafka-consumer-groups --reset-offsets to externally change the offsets stored in the Kafka consumer offsets topic. No need to use ZooKeeper for offset storage anymore either (since 0.9). – Hans Jespersen Aug 31 '17 at 05:46
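Building on that last comment, here is a rough sketch of the offsetsForTimes approach on a recent client (0.10.1+). The method name and the idea that the caller has already assigned partitions and chosen an epoch-millisecond timestamp are the only assumptions; everything else is placeholder structure, not tested against the setup described above.

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekToTimestamp {

    // Assumes the consumer is already assigned (or subscribed and rebalanced)
    // to its partitions; startTimestamp is epoch milliseconds.
    static void seekToTimestamp(KafkaConsumer<String, String> consumer, long startTimestamp) {
        Map<TopicPartition, Long> timestampsToSearch = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            timestampsToSearch.put(tp, startTimestamp);
        }

        // For each partition, returns the earliest offset whose timestamp is >= startTimestamp,
        // or null if no such message exists.
        Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(timestampsToSearch);

        for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : offsets.entrySet()) {
            OffsetAndTimestamp result = entry.getValue();
            if (result != null) {
                consumer.seek(entry.getKey(), result.offset());
            }
        }
    }
}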

I'm trying to answer a similar but not quite identical question, so perhaps my findings will help you.

First, I have been working from this other SO question/answer

In short, you want to commit your offsets, and the most common solution for that is ZooKeeper. That way, if your consumer encounters an error or needs to shut down, it can resume where it left off.

I myself am working with an extremely large, high-volume stream, and my consumer (for a test) needs to start from the very tail each time. The documentation indicates I must use KafkaConsumer's seek to declare my starting point.
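Something like the following is what I have in mind; the helper name, topic, and partition are my own placeholders, and I haven't verified it against my stream yet.

import java.util.Collections;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class StartFromTail {

    // Hypothetical helper: assign one partition and jump to its tail before polling.
    static void startFromTail(KafkaConsumer<String, String> consumer, String topic, int partition) {
        TopicPartition tp = new TopicPartition(topic, partition);
        consumer.assign(Collections.singletonList(tp));

        // seekToEnd is evaluated lazily; calling position() forces the consumer
        // to resolve the end offset right away.
        consumer.seekToEnd(Collections.singletonList(tp));
        long tail = consumer.position(tp);
        System.out.println("Starting " + topic + "-" + partition + " at offset " + tail);
    }
}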

I'll try to update my findings here once they are successful and reliable. For sure this is a solved problem.

J Mac
  • The most common place to store offsets since 0.9 is in Kafka itself (in the __consumer_offsets topic). ZooKeeper is only used for offsets in the old consumer API. – Hans Jespersen Aug 31 '17 at 05:48