
I am probably missing the point of the Kafka consumer, but what I want to do is:

The consumer subscribes to a topic, grabs all messages currently in the topic, and returns a Future containing a list of those messages.

The code I have written to try to accomplish this is:

// Fold every consumed message into a list; the materialized value is a
// Future that completes only when the source completes.
val sink = Sink.fold[List[KafkaMessage], KafkaMessage](List[KafkaMessage]()) { (list, kafkaMessage) =>
  list :+ kafkaMessage
}

def consume(topic: String) =
  Consumer.committableSource(consumerSettings, Subscriptions.topics(topic))
    .map { message =>
      logger.info(s"Consuming ${message.record.value}")
      KafkaMessage(Some(message.record.key()), Some(message.record.value()))
    }
    .buffer(bufferSize, overflowStrategy)
    .runWith(sink)

The Future never completes, though: the stream consumes the available messages and then keeps polling the topic. Is there a way to complete the Future and then close the consumer?

1 Answer


As Kafka is built for streaming data, there is no such thing as "all messages": new data can be appended to a topic at any time.

I guess there are two possible things you could do:

  1. Check how many records the last poll returned and terminate when it returns none, or
  2. Get the "current end of log" via endOffsets and compare it to the offset of the latest record per partition. If they match for all partitions, you can return.

The first approach is simpler but has the disadvantage that it is not as reliable as the second: theoretically, a poll could return zero records even though records are still available (even if the chances of this happening are not very high).
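The first approach can be sketched as a drain loop. This is only an illustration, not the author's code: the `batches` iterator is a hypothetical stand-in for repeated `consumer.poll(timeout)` calls.

```scala
// Approach 1: keep collecting batches until a poll comes back empty.
// In a real consumer, `batches.next()` would be `consumer.poll(timeout)`,
// and an empty batch is taken to mean "caught up" (which, as noted above,
// is not fully reliable).
def drainUntilEmptyPoll[A](batches: Iterator[List[A]]): List[A] = {
  val acc = scala.collection.mutable.ListBuffer.empty[A]
  var done = false
  while (!done && batches.hasNext) {
    val batch = batches.next()
    if (batch.isEmpty) done = true // an empty poll terminates the loop
    else acc ++= batch
  }
  acc.toList
}
```

Once the loop terminates, the collected list can complete the Future and the consumer can be closed.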

I am not sure how to express this termination condition in Scala, though, as I am not very familiar with Scala.
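The second approach's termination condition could be sketched as a pure predicate. This is an assumption-laden sketch: `reachedEnd` is a hypothetical helper, and in a real consumer `positions` would come from `consumer.position(partition)` after each poll while `endOffsets` would be captured up front via `consumer.endOffsets(partitions)`.

```scala
// Approach 2 as a pure predicate: the consumer is done once every partition's
// current position has caught up to the end offset captured before polling.
// (As noted in the comments below, endOffset is the offset of the last message
// plus one, which equals the consumer's position after reading that message.)
def reachedEnd(endOffsets: Map[Int, Long], positions: Map[Int, Long]): Boolean =
  endOffsets.forall { case (partition, end) =>
    positions.getOrElse(partition, 0L) >= end
  }
```

In the polling loop, you would update the per-partition positions after each poll and stop (and close the consumer) as soon as `reachedEnd` returns true.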

Matthias J. Sax
  • Does poll return the record with the end offset? I thought the end offset was the next offset you would see and can't be polled if there is no new message. Did you mean comparing the latest record offset to endOffset - 1 for a match? – s7vr Oct 19 '19 at 01:24
  • You are right. The `endOffset` is the offset of the last message plus one. – Matthias J. Sax Oct 19 '19 at 04:47
  • Thanks Matthias. If I may, I have a use case where I need to poll until the end offset is reached for a topic partition (which has both transactional and non-transactional messages). I'm trying to write an if condition for when to stop polling. For topics with only non-transactional messages I can compare the current offset to endOffset - 1 and stop polling. What would be the same condition for topics with both kinds of messages? If you are interested, I have a bounty on a similar question: https://stackoverflow.com/questions/58339639/spring-kafka-consume-last-n-messages-for-partitionss-for-any-topic. – s7vr Oct 19 '19 at 12:50
  • I'm not looking for code, just general guidelines about how to go about polling and processing the last N records for any partitions. I'm trying to come up with an efficient way to process messages. Thank you in advance. – s7vr Oct 19 '19 at 12:52
  • For transactional data, this is not really possible to achieve at the moment, because transactional markers "fill up" offsets but are not exposed to the application. It's a known issue (e.g., https://issues.apache.org/jira/browse/KAFKA-6607). – Matthias J. Sax Oct 19 '19 at 23:53