
I am using CppKafka to write a Kafka consumer. When my consumer starts, I want it to poll only newly arriving messages (i.e. messages that arrive after the consumer's start time) instead of resuming from the consumer's committed offset.

// Construct the configuration
Configuration config = {
    { "metadata.broker.list", "127.0.0.1:9092" },
    { "group.id", "1" },
    // Disable auto commit
    { "enable.auto.commit", false },
    // Set offset reset to latest to receive only new messages when the consumer starts working
    { "auto.offset.reset", "latest" },
};

// Create the consumer
Consumer consumer(config);

consumer.set_assignment_callback([](TopicPartitionList& partitions) {
    cout << "Got assigned: " << partitions << endl;
});

// Print the revoked partitions on revocation
consumer.set_revocation_callback([](const TopicPartitionList& partitions) {
    cout << "Got revoked: " << partitions << endl;
});


string topic_name = "test_topic";
// Subscribe to the topic
consumer.subscribe({ topic_name });

As I understand it, setting auto.offset.reset to latest only takes effect if the consumer has no committed offset when it starts reading an assigned partition. So my guess is that I should call consumer.poll() without ever committing, but that feels wrong and I am afraid I will break something along the way. Can anyone show me the right way to achieve this?

Anh Tuan

2 Answers


If "enable.auto.commit" is set to false and you never commit offsets in your code, then every time your consumer starts it begins consuming from the first message in the topic, provided auto.offset.reset=earliest.

The default for auto.offset.reset is “latest,” which means that lacking a valid offset, the consumer will start reading from the newest records (records that were written after the consumer started running).
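The rule described above can be modelled in a few lines of plain C++. This is only a sketch of the decision logic, not CppKafka or broker code: `PartitionLog` and `resolve_start_offset` are hypothetical names introduced for illustration, and it assumes C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical model of one partition: valid offsets are [log_start, log_end).
struct PartitionLog {
    std::int64_t log_start;  // offset of the oldest retained record
    std::int64_t log_end;    // offset the next produced record will get
};

// Returns the offset a consumer would start fetching from.
std::int64_t resolve_start_offset(const PartitionLog& log,
                                  std::optional<std::int64_t> committed,
                                  const std::string& auto_offset_reset) {
    // A committed offset that still lies inside the log wins over the policy.
    if (committed && *committed >= log.log_start && *committed <= log.log_end) {
        return *committed;
    }
    // No valid offset: fall back to the auto.offset.reset policy.
    if (auto_offset_reset == "earliest") return log.log_start;
    if (auto_offset_reset == "latest") return log.log_end;
    throw std::invalid_argument("unsupported auto.offset.reset: " + auto_offset_reset);
}
```

With no committed offset, "latest" resolves to the end of the log and "earliest" to its start; once the group has a valid committed offset, the policy is not consulted at all.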

Based on your question, it looks like auto.offset.reset=latest should solve your problem, as long as the consumer group has no committed offsets.

But if you need a truly time-based starting point, you need to apply a time filter in your consumer. That means reading each message from the topic, comparing its timestamp (taken either from a custom field in the message payload or from the message's metadata, e.g. ConsumerRecord.timestamp() in the Java client) against your cutoff, and processing it accordingly.
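A minimal sketch of such a time filter in plain C++ is shown here. `Record` and `keep_newer_than` are hypothetical names, and the timestamp is assumed to be milliseconds since the epoch; with CppKafka you would fill it from whatever timestamp the library's message type exposes, which should be checked against its documentation.

```cpp
#include <chrono>
#include <string>
#include <vector>

// Hypothetical stand-in for a consumed message.
struct Record {
    std::chrono::milliseconds timestamp;  // assumed: ms since epoch
    std::string payload;
};

// Keeps only the records whose timestamp is at or after the consumer's start time.
std::vector<Record> keep_newer_than(const std::vector<Record>& batch,
                                    std::chrono::milliseconds start_time) {
    std::vector<Record> kept;
    for (const auto& record : batch) {
        if (record.timestamp >= start_time) {
            kept.push_back(record);
        }
    }
    return kept;
}
```

The consumer would capture `start_time` once at startup and pass every polled batch through the filter, silently skipping anything older.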

Also refer to this answer: Retrieve Timestamp based data from Kafka

asolanki
  • If my consumer doesn't commit, will it start at the first message no matter how I set the `auto.offset.reset` configuration? – Anh Tuan May 18 '18 at 06:48
  • Modified my answer to make it clearer. Basically, “latest” means that lacking a valid offset, the consumer will start reading from the newest records (records that were written after the consumer started running). The alternative is “earliest,” which means that lacking a valid offset, the consumer will read all the data in the partition, starting from the very beginning. – asolanki May 18 '18 at 08:54
  • Thank you very much. I will try some different approaches (including your suggestion) to see which works best. Another option is to manually set the offset to the end of each partition in the assigned-partitions callback. – Anh Tuan May 21 '18 at 04:47

Use the seekToEnd(Collection partitions) method from the Java consumer API. It seeks to the last offset for each of the given partitions. The function evaluates lazily, seeking to the final offset in all partitions only when poll(long) is called. If no partitions are provided, it seeks to the final offset for all of the currently assigned partitions.
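The same idea can be modelled in plain C++ as rewriting every assigned partition's starting offset to an "end of log" sentinel before consumption begins. This is a rough sketch with hypothetical types (`AssignedPartition`, `seek_all_to_end`), not the CppKafka API. The sentinel value -1 matches librdkafka's `RD_KAFKA_OFFSET_END`; whether and how the assignment callback in CppKafka lets you apply it is an assumption to verify against the library's documentation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Same numeric value as librdkafka's RD_KAFKA_OFFSET_END ("start at the log end").
constexpr std::int64_t kOffsetEnd = -1;

// Hypothetical stand-in for one entry of an assigned partition list.
struct AssignedPartition {
    std::string topic;
    int partition;
    std::int64_t offset;  // where consumption would start
};

// Overrides whatever start offset each partition had with the end sentinel,
// so only records produced after the assignment would be read.
void seek_all_to_end(std::vector<AssignedPartition>& partitions) {
    for (auto& p : partitions) {
        p.offset = kOffsetEnd;
    }
}
```

Because the override happens on every assignment, it ignores any committed offsets, which is what distinguishes this approach from relying on auto.offset.reset alone.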

Vivek Rai