I am using Python and confluent_kafka.

I am building a queue management tool for Kafka in which we can view the pending (uncommitted) messages of each topic, delete a topic, and purge a topic. I am facing the following problems.

  • I am using the same group ID for all the consumers so that I can get the uncommitted messages.
  • I have two consumers: one (say consumer1) that consumes and commits, and another (say consumer2) that only consumes without committing.

If I run consumer1 and consumer2 simultaneously, only one of them starts consuming while the other just keeps waiting, which causes long loading times in the frontend.

If I assign a different group ID to each, it works, but the messages committed by consumer1 are still readable by consumer2.

Example: if I have pushed 100 messages and consumer1 has consumed 80 of them, consumer2 should consume only the remaining 20, but it consumes all 100, including the messages committed by consumer1.

How can I avoid or solve this?

OneCricketeer
Ibrahim Khan

1 Answer

Unclear what you mean by uncommitted. Any message in a topic has been committed by a producer.

From the consumer's perspective, this isn't possible. Active Kafka consumers in the same group cannot be assigned the same partitions.

More specifically, how would "consumer2" know when/if "consumer1" was "done consuming 80 records" without consumer1 becoming inactive?

If you have an idle consumer with only two consumers in the same group, sounds like you only have one partition... If you want both to be active at the same time, you'll need multiple partitions, but that won't help with any "visualizations" unless you persist your consumed data in some central location. At which point, Kafka Connect might be a better solution than Python.

If you want to view consumer lag (how far behind a consumer is in processing), there are other tools for this, such as Burrow with its REST API. Otherwise, you need to use the `get_watermark_offsets()` function to find each partition's low and high offsets and compare them to the offset of the most recently polled record.
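
As a rough illustration of the watermark comparison, the sketch below counts the records in each partition that a group has not yet committed. It compares against the group's committed offset (via `committed()`) rather than the last polled record, which is an assumption about what "pending" means in the question. The helper names `pending_count` and `pending_per_partition` are hypothetical, and the `consumer` argument is assumed to be a `confluent_kafka.Consumer` created with the same `group.id` as the committing consumer but never subscribed, so it should not take partitions away from it.

```python
def pending_count(low, high, committed):
    """Records in one partition that the group has not committed yet."""
    # A negative committed offset (OFFSET_INVALID is -1001) means the group
    # has never committed on this partition, so every retained record counts.
    start = low if committed < 0 else max(committed, low)
    return max(high - start, 0)


def pending_per_partition(consumer, topic, timeout=10.0):
    """Map partition id -> pending record count for the consumer's group."""
    # Imported here so pending_count() can be used without the client library.
    from confluent_kafka import TopicPartition

    metadata = consumer.list_topics(topic, timeout=timeout)
    partitions = [TopicPartition(topic, p)
                  for p in metadata.topics[topic].partitions]

    result = {}
    # committed() returns the same partitions with the group's committed
    # offsets filled in.
    for tp in consumer.committed(partitions, timeout=timeout):
        watermarks = consumer.get_watermark_offsets(tp, timeout=timeout)
        if watermarks is None:  # the broker did not answer within the timeout
            continue
        low, high = watermarks
        result[tp.partition] = pending_count(low, high, tp.offset)
    return result
```

With the numbers from the question (100 records produced, 80 committed), `pending_count(0, 100, 80)` gives 20.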

OneCricketeer
  • By **commit** I mean that once a consumer consumes a message, it commits it so that the same message is ignored by other consumers of the same group. I am trying to build a queue management tool for Kafka for internal use. I am using **confluent_kafka** and **python**. I want to read and display all the unconsumed messages of each topic across all partitions. **Problem 1**: if I use the same group ID, it does not read all the unconsumed messages from all the partitions of a particular topic. **Problem 2**: if I use a different group ID, it shows all the messages, irrespective of whether they have been consumed or not. – Ibrahim Khan Mar 09 '23 at 12:55
  • You can set `enable.auto.commit=false` so that it doesn't commit when consuming. Other than that, did you have a question here? You don't need to repeat your post. 1) Yes, it will read uncommitted offsets, but only one consumer per partition is possible. 2) You can use multiple consumers in one group by using the `assign` function rather than `subscribe`, since assigning doesn't use consumer grouping, but that obviously won't help with showing which offsets were committed. – OneCricketeer Mar 09 '23 at 15:37
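
The `assign`-based approach from the last comment could look roughly like the sketch below: a read-only viewer that reuses the committing consumer's `group.id` only to look up its committed offsets, then assigns itself those partitions manually and never commits. The function names `viewer_config` and `peek_pending`, the default limits, and the choice of `auto.offset.reset=earliest` for partitions with no committed offset are all assumptions made for illustration, not something the thread specifies.

```python
def viewer_config(bootstrap_servers, group_id):
    """Config for a viewer that reads a group's offsets but never commits."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,             # only used to look up committed offsets
        "enable.auto.commit": False,      # the viewer must not move the offsets
        "auto.offset.reset": "earliest",  # where to start if nothing is committed
    }


def peek_pending(config, topic, max_messages=100, timeout=5.0):
    """Return up to max_messages (partition, offset, value) tuples."""
    # Imported here so viewer_config() can be used without the client library.
    from confluent_kafka import Consumer, KafkaException, TopicPartition

    consumer = Consumer(config)
    try:
        metadata = consumer.list_topics(topic, timeout=timeout)
        partitions = [TopicPartition(topic, p)
                      for p in metadata.topics[topic].partitions]
        # Start each partition at the group's committed offset. assign()
        # bypasses group rebalancing, so the committing consumer is unaffected.
        consumer.assign(consumer.committed(partitions, timeout=timeout))

        messages = []
        while len(messages) < max_messages:
            msg = consumer.poll(timeout)
            if msg is None:
                break  # nothing arrived within the timeout; assume caught up
            if msg.error():
                raise KafkaException(msg.error())
            messages.append((msg.partition(), msg.offset(), msg.value()))
        return messages
    finally:
        consumer.close()
```

A call such as `peek_pending(viewer_config("localhost:9092", "my-group"), "my-topic")` would then list what the group has not committed yet; the broker address, group, and topic names here are placeholders.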