6

I'm managing a Kafka queue using a common consumer group across multiple machines. Now I also need to show the current content of the queue. How do I read only the messages in the group that haven't been read yet, while still leaving those messages readable by the other consumers in the group that actually process them? Any help would be appreciated.

Priyam Singh

4 Answers

7

In Kafka, the notion of "reading" messages from a topic and that of "consuming" them are the same thing. At a high level, the only thing that makes a "consumed" message unavailable to a consumer is that consumer setting its read offset to a value beyond that of the message in question. Thus, you can turn off the autocommit feature of your consumers and avoid committing offsets in cases where you'd like only to "read" but not to "consume".
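
A minimal sketch of that read-without-committing approach using kafka-python (the broker address, topic, and group names are placeholders; note that a reader joining the same group still takes part in rebalancing):

    from kafka import KafkaConsumer

    # Placeholder broker/topic/group names.
    consumer = KafkaConsumer(
        "my-topic",
        bootstrap_servers="localhost:9092",
        group_id="my-group",
        enable_auto_commit=False,      # turn off autocommit
        auto_offset_reset="earliest",
    )

    # Poll a batch and never call commit(): the group's committed offsets
    # are untouched, so these messages stay "unread" for the rest of the group.
    batch = consumer.poll(timeout_ms=1000)
    for tp, records in batch.items():
        for record in records:
            print(tp.partition, record.offset, record.value)
    consumer.close()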

A good proxy for getting "all messages which haven't been read" is to compare the latest committed offset to the highwater mark offset per partition. This provides a notion of "lag" that indicates how far behind a given consumer is in its consumption of a partition. The fetch_consumer_lag CLI function in pykafka is a good example of how to do this.
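
A rough sketch of that lag computation with kafka-python (names are placeholders; the fetch_consumer_lag function mentioned above follows the same idea):

    from kafka import KafkaConsumer, TopicPartition

    # Placeholder names: compare the group's committed offset to the
    # highwater mark (end offset) for each partition of the topic.
    consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                             group_id="my-group")
    partitions = [TopicPartition("my-topic", p)
                  for p in consumer.partitions_for_topic("my-topic")]
    highwater = consumer.end_offsets(partitions)
    for tp in partitions:
        committed = consumer.committed(tp) or 0   # None if never committed
        print(f"partition {tp.partition}: lag = {highwater[tp] - committed}")
    consumer.close()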

Emmett Butler
3

In Kafka, a partition can be consumed by only one consumer in a group, i.e. if your topic has 10 partitions and you spawn 20 consumers with the same groupId, only 10 will be connected to Kafka and the remaining 10 will sit idle. Kafka will only hand partitions to an idle consumer if one of the existing consumers dies or stops polling from the topic.

AFAIK, you can't do what you're describing within a single consumer group. You can, however, create another groupId and process messages based on the information gathered by the first consumer group.
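
A minimal sketch of that two-group arrangement with kafka-python (all names are placeholders): the viewer gets its own groupId and therefore its own committed offsets, so it never steals partitions from the processing group.

    from kafka import KafkaConsumer

    # Placeholder names: a second group id keeps the "viewer" isolated
    # from the group that actually processes the messages.
    viewer = KafkaConsumer(
        "my-topic",
        bootstrap_servers="localhost:9092",
        group_id="queue-viewer",        # distinct from the processing group
        auto_offset_reset="earliest",
    )
    for record in viewer:
        print(record.partition, record.offset, record.value)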

AbhishekN
  • Ok, so if I am understanding correctly, I can save the offsets from the first consumer group on each read and then use those offset values to read the data with a second consumer group. – Priyam Singh Jun 22 '18 at 09:38
  • 1
    Yes, that is doable: you can save the offsets locally while committing them to Kafka, then use them to re-read or to move past those records. It's commonly used for checkpointing, to recover properly from failures (sketched below). – AbhishekN Jun 22 '18 at 18:44
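
A rough sketch of the save-and-seek checkpointing described in this comment thread, using kafka-python (the saved offsets and all names are hypothetical):

    from kafka import KafkaConsumer, TopicPartition

    # Hypothetical checkpoint: partition -> offset saved by the first group.
    saved_offsets = {0: 42, 1: 17}

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                             group_id="second-group")
    assignments = [TopicPartition("my-topic", p) for p in saved_offsets]
    consumer.assign(assignments)       # manual assignment, no group rebalance
    for tp in assignments:
        # Re-read from (or move past) the checkpointed position.
        consumer.seek(tp, saved_offsets[tp.partition])
    for record in consumer:
        print(record.partition, record.offset, record.value)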
1

Kafka now has a KStream.peek() method.

See proposal "Add KStream peek method".

It's not 100% clear to me from the docs whether a peeked message is thereby prevented from being consumed, but I can't see how you could use peek() in any crash-safe, robust way unless it is.


Craig Ringer
-2

I think you can use the publish-subscribe model. Then each consumer has its own offset and can consume all messages for itself.
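
A minimal sketch of that model with kafka-python (names are placeholders): run the consumer without a group, so it keeps its own position and sees every message.

    from kafka import KafkaConsumer

    # Placeholder names: group_id=None makes this a standalone consumer
    # with its own offsets, independent of any consumer group.
    consumer = KafkaConsumer(
        "my-topic",
        bootstrap_servers="localhost:9092",
        group_id=None,
        auto_offset_reset="earliest",
    )
    for record in consumer:
        print(record.offset, record.value)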

Marko Novakovic