7

Is there any way to pause or throttle a Kafka producer based on consumer lag or other consumer issues? Or would the producer need to detect consumer lag itself and then throttle itself?

mrmannione
  • 2
    The reason I would like to pause or throttle a producer is that I don't want to lose events if the retention period or the disk space is exceeded, so knowing when to pause based on consumer info would help. – mrmannione Jan 14 '19 at 13:28

3 Answers

4

Kafka is built on a pub/sub design. Producers publish messages to a centralized topic, and multiple consumers can subscribe to that topic. Since multiple consumers are involved, you cannot tie the producer's speed to any one of them: one consumer can be slow while another is fast. It is also against the design principle, because otherwise both systems would become tightly coupled. If your use case requires throttling, maybe you should evaluate another approach, such as direct REST calls.

Rishi Saraf
  • 1
    I am using Kafka for many reasons and moving away from a direct REST architecture, so telling me to go back to REST is really not answering my question. I assume the answer to my question is NO, what I want is not possible. – mrmannione Jan 14 '19 at 13:57
  • 2
    @mrmannione I actually answered your question and told you the reason why it can't be done. Last line was just a suggestion. If you don't want to take it ignore it gracefully :) – Rishi Saraf Jan 14 '19 at 15:24
  • The trouble is that if one consumer is really slow, it may miss messages because of the disk size limit. In that scenario, a producer that was aware of the problem and paused until the consumer caught up, or until it was fixed and scaled, seems like a good feature to me. – mrmannione Jan 25 '19 at 12:46
  • To improve the consumption rate you can parallelize: increase the number of partitions and have more consumers reading from them. – Rishi Saraf Jan 25 '19 at 16:07
  • In my case multiple consumers can share the same backend, which can have the same issue, so I thought it would be good to have a way to pause the producer when that happens. I know that I can pause the consumer, but that is not what I am asking about. It seems what I am asking about is not possible. – mrmannione Jan 28 '19 at 09:52
  • 1
    Yes, what you are asking for is not possible. People usually solve this by increasing consumer throughput. Another idea is to increase the retention period (TTL) of your messages. That way messages stay on the Kafka broker longer, the consumer can take its own time, and the producer can keep producing at its own rate. But again, this will not work if the producer keeps producing at the same rate all the time while the consumer stays slower. – Rishi Saraf Jan 28 '19 at 10:57
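
To put rough numbers on that last point, here is a back-of-the-envelope sketch in Python. It assumes constant produce and consume rates and purely time-based retention; the function name and the example rates are made up for illustration. Real brokers delete whole log segments, so the actual timing is coarser than this estimate.

```python
import math


def seconds_until_data_loss(produce_rate, consume_rate, retention_seconds):
    """Estimate when a slow consumer starts missing messages.

    Assumes constant rates (messages per second) and time-based retention.
    At time t the consumer is reading the message that was produced at
    time t * consume_rate / produce_rate, so the age of that message is
    t * (1 - consume_rate / produce_rate). Loss begins once that age
    exceeds the retention period.
    """
    if produce_rate <= 0 or consume_rate >= produce_rate:
        return math.inf  # the consumer keeps up, so lag never grows
    return retention_seconds / (1 - consume_rate / produce_rate)


SEVEN_DAYS = 7 * 24 * 3600  # Kafka's default retention, in seconds

# Hypothetical rates: producer at 1000 msg/s, consumer at 500 msg/s.
days = seconds_until_data_loss(1000, 500, SEVEN_DAYS) / 86400
print(days)  # → 14.0
```

Under these assumptions a longer retention period only postpones the loss: doubling retention doubles the time until messages expire unread, but a consumer that stays slower than the producer still falls off the end eventually.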
3

Producers and consumers are decoupled.

Producers push data to Kafka topics (partitioned topics), which are stored on Kafka brokers. A producer doesn't know who consumes its messages or how often.

Consumers read data from brokers. A consumer doesn't know how many producers produce the messages. The same messages can even be consumed by several consumers in different groups; for example, some consumers may consume faster than others.

You can read more about producers and consumers on the Apache Kafka website.

Bartosz Wardziński
  • 1
    In my scenario I don't want to lose events if the disk size is exceeded before a message is consumed, so if I were to pause the producer, any slow consumers could catch up. But I take it from your responses that what I want does not exist. – mrmannione Jan 14 '19 at 13:25
  • @mrmannione, the disk size can't be obtained in any way through the Kafka API. To avoid exceeding the disk limit, Kafka has retention time properties that can be set at the broker level or per topic. The default value is 7 days, so once a message is *old* it is *deleted* from the topic and can't be consumed by any consumer. – Bartosz Wardziński Jan 15 '19 at 12:22
2

It is not possible to throttle the producer(s) based on the performance of consumers.

In my scenario I don't want to lose events if the disk size is exceeded before a message is consumed

To tackle your issue, you have to rely on the parallelism that Kafka offers. Your Kafka topic should have multiple partitions, and producers have to use different keys when writing to the topic. That way your data is distributed across multiple partitions, and by using a consumer group you can spread the load over a group of consumers. All data within a partition is processed in order, which may be relevant since you are dealing with event processing.
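
As a rough illustration of why distinct keys matter, the Python sketch below spreads keyed events over partitions. It is a simplified stand-in: Kafka's default partitioner hashes the key bytes with murmur2, whereas this uses `zlib.crc32` to stay dependency-free, and `partition_for`, the partition count, and the device keys are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: hash the key bytes, then take the modulo.

    Stand-in for Kafka's default partitioner (which uses murmur2, not CRC32).
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6  # hypothetical topic size
keys = [f"device-{i}" for i in range(1000)]  # hypothetical event keys

# Count how many keys land on each partition.
load = Counter(partition_for(k, NUM_PARTITIONS) for k in keys)
for partition in sorted(load):
    print(partition, load[partition])

# The same key always maps to the same partition, which keeps per-key order.
same = partition_for("device-42", NUM_PARTITIONS)
print(same == partition_for("device-42", NUM_PARTITIONS))
```

If every event used the same key, all of them would land on one partition, and adding consumers to the group would not help, because each partition is read by only one consumer in a group.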

Steephen
  • 1
    My issue is that even if I scale consumers, a number of those consumers share the same backend, and that backend sometimes has issues, so I thought it would be good to have a way to pause the producer when that happens. I know that I can pause the consumer, but that is not what I am asking about. It seems what I am asking about is not possible. – mrmannione Jan 28 '19 at 09:51
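
Kafka has no built-in way to throttle a producer based on consumer lag, but the producing application could approximate it by watching a consumer group's lag out of band and slowing itself down. The Python sketch below shows only the decision logic. The offset dictionaries, thresholds, and function names are hypothetical, and fetching the offsets from the cluster (for example through an admin client) is not shown.

```python
def total_lag(end_offsets, committed_offsets):
    """Sum per-partition lag: log end offset minus the group's committed offset.

    Both arguments map partition number -> offset. A partition with no
    committed offset is treated as fully unread (offset 0); that is an
    assumption of this sketch.
    """
    return sum(
        max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    )


def send_delay(lag, soft_limit, hard_limit, max_delay=5.0):
    """Return how many seconds the producer should wait before its next send.

    At or below soft_limit: no delay. Between the limits: the delay grows
    linearly. At or above hard_limit: wait max_delay seconds.
    """
    if lag <= soft_limit:
        return 0.0
    if lag >= hard_limit:
        return max_delay
    return max_delay * (lag - soft_limit) / (hard_limit - soft_limit)


# Hypothetical snapshot of a three-partition topic.
end_offsets = {0: 12_000, 1: 11_500, 2: 12_300}
committed = {0: 11_900, 1: 6_500, 2: 12_300}

lag = total_lag(end_offsets, committed)
print(lag)  # → 5100
print(send_delay(lag, soft_limit=1_000, hard_limit=10_000))
```

A producer loop would sleep for the returned delay before each send. Note that this reintroduces the coupling the answers warn about: with several consumer groups on the same topic, the producer has to decide whose lag it follows.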