
I have 6 partitions for a certain topic, and 4 consumers consuming from that topic. The producer produces in a round-robin manner to the partitions. The 4 consumers belong to the same consumer group.

I can see with some load testing that 2 of the partitions are getting consumed very slowly, while the others are almost always empty. I would like to increase my throughput as much as possible.

  1. What will be the default partition assignment strategy used by Kafka?
  2. If the load increases, I would like to scale my consumers up to 6 (the same number as partitions, so there is a 1-1 consumer-to-partition mapping). In the 4-consumer scenario, to achieve the best possible throughput, should I limit my producer to produce only to 4 partitions until I have increased the number of consumers?
thepaulbot
  • Unfortunately, there is no benefit to scaling consumers beyond the number of partitions in the topic, so in this case you can scale to at most 6 independent consumers for maximum throughput. For uniform load distribution across partitions, the message key determines the partition at publish time; it can also be chosen based on your use case, such as preserving message ordering or supporting aggregation. – Valath Jun 16 '22 at 01:29

2 Answers


Which Kafka version are you using?

It seems your producers are not using an efficient partitioning method.

  • Check whether your producers are generating similar keys, or whether they are producing null keys.

You can write a custom partitioner with an efficient hash algorithm that distributes messages evenly, giving consumers a fair chance to consume messages in parallel.
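
For illustration only (this sketch is not from the answer; the class name and hashing scheme are assumptions), a custom Partitioner could look like this:

import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical partitioner: round-robins records with no key and
// hashes keyed records so the same key stays on the same partition.
public class EvenSpreadPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // No key: spread records evenly across all partitions.
            return (counter.getAndIncrement() & Integer.MAX_VALUE) % numPartitions;
        }
        // Keyed: mask the sign bit so the result is a valid partition index.
        return (Arrays.hashCode(keyBytes) & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

It can then be enabled on the producer through the partitioner.class property (ProducerConfig.PARTITIONER_CLASS_CONFIG).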

abs
  • I am using Kafka with the latest Docker image of wurstmeister/kafka, which I believe is version 2.8.1. I have only one producer, and I am not attaching a key to my messages. I am certain that messages are produced to the partitions in round robin. Should I not be using round robin for producing? Why should I use a producer key in this scenario? – thepaulbot Jun 13 '22 at 15:12
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – dmotta Jun 19 '22 at 23:23

Many factors contribute to the overall performance of the clients (producer/consumer) connected to a Kafka broker. First of all, I am not sure how you are running your consumer instances: are the 4 instances running on 4 separate servers, or are they launched through an IDE for the load test? You can clarify that here. Also, what does your consumer implementation look like? Is it just reading from the topic and writing to a console, or is it doing full-blown business functionality connected to backend systems? Kindly confirm.

Default Partitioner:

If a key exists and the default partitioner is used, Kafka will hash the key and use the result to map the message to a specific partition. The mapping of keys to partitions is consistent only as long as the number of partitions in a topic does not change.
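
As a hedged illustration (not part of the original answer; the broker address, topic name, and keys are placeholder assumptions), keyed vs. null-key sends with the default partitioner:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedVsUnkeyedDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed: the default partitioner hashes "order-42", so this key
            // always maps to the same partition while the partition count is fixed.
            producer.send(new ProducerRecord<>("my-topic", "order-42", "keyed payload"));
            // Null key: the client chooses the partition itself (round robin
            // on older clients, sticky batching on clients since 2.4).
            producer.send(new ProducerRecord<>("my-topic", null, "unkeyed payload"));
        }
    }
}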

You can change this behaviour by implementing a custom Partitioner.
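
A minimal sketch of wiring that in, reusing the producer properties from the example above (MyCustomPartitioner is a hypothetical class implementing the Partitioner interface):

// Replace the default partitioner with the custom implementation.
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyCustomPartitioner.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<>(props);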

Dynamic consumers:

You can't increase the number of consumers dynamically based on throughput unless you have a multi-threaded consumer implementation. You can read more about the Java ExecutorService here: https://dzone.com/articles/kafka-consumer-and-multi-threading. Your consumer implementation could look something like the following: keep a counter of the number of records polled, and if it exceeds the threshold you are after, instantiate an ExecutorService to add more consumer instances.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

private final List<ExecutorService> executors = new ArrayList<>();

@Override
public void run(String... args) throws Exception {
    // Shut down every executor gracefully when the JVM exits.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        executors.forEach(exe -> {
            exe.shutdown();
            try {
                if (!exe.awaitTermination(10000, TimeUnit.MILLISECONDS)) {
                    exe.shutdownNow();
                }
            } catch (InterruptedException e) {
                exe.shutdownNow();
            }
        });
    }));

    // Start a fixed pool with one consumer task per thread.
    int instances = <<number of instances>>;
    ExecutorService executor = Executors.newFixedThreadPool(instances);
    for (int i = 0; i < instances; i++) {
        executor.execute(<<Consumer implementation class>>);
    }
    executors.add(executor);
}
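
For completeness, here is a minimal sketch of the consumer task those threads could run (not from the original answer; the bootstrap address, group id, and topic name are assumptions). Each thread owns its own KafkaConsumer, since KafkaConsumer is not thread-safe:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Hypothetical per-thread consumer task submitted to the ExecutorService above.
public class ConsumerWorker implements Runnable {

    private final KafkaConsumer<String, String> consumer;

    public ConsumerWorker() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // assumed
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        this.consumer = new KafkaConsumer<>(props);
    }

    @Override
    public void run() {
        consumer.subscribe(Collections.singletonList("my-topic")); // assumed topic
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Process each record here; the count of polled records is
                    // what the answer suggests comparing against a threshold.
                }
            }
        } finally {
            consumer.close();
        }
    }
}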
ChristDist
  • I am using Kubernetes and each consumer is a separate pod/server. My problem is not how or when to scale consumers; my problem is that with round-robin producing, my consumer throughput is low in some cases. Specifically, with 6 partitions and 4 consumers the following happens: 2 consumers are assigned 2 partitions each, and 2 consumers are assigned 1 partition each. The consumers with 2 partitions each have to consume twice the number of messages. The same problem exists with 5 consumers. With 6 consumers there is a 1-1 consumer-to-partition relationship, so the parallelism is optimal. – thepaulbot Jun 14 '22 at 09:43