6

I'm managing a Kafka queue using a common consumer group across multiple machines. Now I also need to show the current content of the queue. How do I read only the messages in the group that haven't been read yet, while still leaving those messages readable by the other consumers in the group that actually process them? Any help would be appreciated.

Priyam Singh

4 Answers

7

In Kafka, the notion of "reading" messages from a topic and that of "consuming" them are the same thing. At a high level, the only thing that makes a "consumed" message unavailable to a consumer is that consumer setting its read offset to a value beyond that of the message in question. Thus, you can turn off the autocommit feature of your consumers and avoid committing offsets in cases where you'd like only to "read" but not to "consume".
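
A minimal sketch of that read-without-committing approach using kafka-python (the broker address, topic, and group names are placeholders; note that a reader joining the same group still takes part in rebalancing):

    from kafka import KafkaConsumer

    # Placeholder broker/topic/group names.
    consumer = KafkaConsumer(
        "my-topic",
        bootstrap_servers="localhost:9092",
        group_id="my-group",
        enable_auto_commit=False,      # turn off autocommit
        auto_offset_reset="earliest",
    )

    # Poll a batch and never call commit(): the group's committed offsets
    # are untouched, so these messages stay "unread" for the rest of the group.
    batch = consumer.poll(timeout_ms=1000)
    for tp, records in batch.items():
        for record in records:
            print(tp.partition, record.offset, record.value)
    consumer.close()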

A good proxy for getting "all messages which haven't been read" is to compare the latest committed offset to the highwater mark offset per partition. This provides a notion of "lag" that indicates how far behind a given consumer is in its consumption of a partition. The fetch_consumer_lag CLI function in pykafka is a good example of how to do this.
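
A rough sketch of that lag computation with kafka-python (names are placeholders; the fetch_consumer_lag function mentioned above follows the same idea):

    from kafka import KafkaConsumer, TopicPartition

    # Placeholder names: compare the group's committed offset to the
    # highwater mark (end offset) for each partition of the topic.
    consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                             group_id="my-group")
    partitions = [TopicPartition("my-topic", p)
                  for p in consumer.partitions_for_topic("my-topic")]
    highwater = consumer.end_offsets(partitions)
    for tp in partitions:
        committed = consumer.committed(tp) or 0   # None if never committed
        print(f"partition {tp.partition}: lag = {highwater[tp] - committed}")
    consumer.close()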

Emmett Butler
3

In Kafka, a partition can be consumed by only one consumer in a group, i.e. if your topic has 10 partitions and you spawn 20 consumers with the same groupId, only 10 will be connected to Kafka and the remaining 10 will sit idle. Kafka will only hand partitions to an idle consumer if one of the existing consumers dies or stops polling from the topic.

AFAIK, you can't do what you're describing within a single consumer group. You can, however, create another groupId and process messages based on the information gathered by the first consumer group.
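
A minimal sketch of that two-group arrangement with kafka-python (all names are placeholders): the viewer gets its own groupId and therefore its own committed offsets, so it never steals partitions from the processing group.

    from kafka import KafkaConsumer

    # Placeholder names: a second group id keeps the "viewer" isolated
    # from the group that actually processes the messages.
    viewer = KafkaConsumer(
        "my-topic",
        bootstrap_servers="localhost:9092",
        group_id="queue-viewer",        # distinct from the processing group
        auto_offset_reset="earliest",
    )
    for record in viewer:
        print(record.partition, record.offset, record.value)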

AbhishekN
  • Ok, so if I am understanding correctly, I can save the offsets from the first consumer group on each read and then use those offset values to read the data with a second consumer group. – Priyam Singh Jun 22 '18 at 09:38
  • 1
    Yes, that is doable: you can save the offsets locally while committing them to Kafka, then use them to re-read or to move past those records. It's commonly used for checkpointing, to recover properly from failures (sketched below). – AbhishekN Jun 22 '18 at 18:44
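
A rough sketch of the save-and-seek checkpointing described in this comment thread, using kafka-python (the saved offsets and all names are hypothetical):

    from kafka import KafkaConsumer, TopicPartition

    # Hypothetical checkpoint: partition -> offset saved by the first group.
    saved_offsets = {0: 42, 1: 17}

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                             group_id="second-group")
    assignments = [TopicPartition("my-topic", p) for p in saved_offsets]
    consumer.assign(assignments)       # manual assignment, no group rebalance
    for tp in assignments:
        # Re-read from (or move past) the checkpointed position.
        consumer.seek(tp, saved_offsets[tp.partition])
    for record in consumer:
        print(record.partition, record.offset, record.value)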
1

Kafka now has a KStream.peek() method.

See proposal "Add KStream peek method".

It's not 100% clear to me from the docs whether a peeked message is thereby prevented from being consumed, but I can't see how you could use peek() in any crash-safe, robust way unless it is.


Craig Ringer
-2

I think you can use the publish-subscribe model. Then each consumer has its own offset and can consume all messages for itself.
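
A minimal sketch of that model with kafka-python (names are placeholders): run the consumer without a group, so it keeps its own position and sees every message.

    from kafka import KafkaConsumer

    # Placeholder names: group_id=None makes this a standalone consumer
    # with its own offsets, independent of any consumer group.
    consumer = KafkaConsumer(
        "my-topic",
        bootstrap_servers="localhost:9092",
        group_id=None,
        auto_offset_reset="earliest",
    )
    for record in consumer:
        print(record.offset, record.value)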

Marko Novakovic