I have a use case where I am iterating over a list and for each element I am doing a task.

I cannot do this synchronously because the code is part of a Kafka consumer, and the run time cannot be long.

listElem.size() is 50k.

Sample code:

for(int i : listElem){
    doTask(i);      
}

Two approaches I can think of:

  1. Create a child thread for every doTask(i). That means creating 50k child threads. Is that OK, given that there will be a lot of context switching?
  2. Create a message queue. The consumer would reside in the same application, though, and it would need its own thread to read the messages.

Please tell me the drawbacks of each approach, and whether there is a better way.

OneCricketeer
Kakarot
  • Do you mean that each Kafka message requires 50k tasks to be performed? – Tim Moore Aug 27 '23 at 07:11
  • @TimMoore doTask is called 50k times, so 50k messages would be pushed to the queue. – Kakarot Aug 27 '23 at 08:48
  • The `max.poll.records` consumer config defaults to 500. So, did you change that? Also, how are you handling offset commits if processing will be asynchronous? E.g. what if element 50k of the list finishes/commits before the rest, and then your consumer group rebalances before any other thread has finished? – OneCricketeer Aug 27 '23 at 13:25
  • I have not changed max.poll.records. Should I increase it? Also, each of the 50k elements is independent, so it does not matter whether the 50kth element finishes first or not. I did not completely understand the second point. Can you please elaborate? – Kakarot Aug 27 '23 at 16:59

1 Answer

Create a child thread for every doTask(i). That means creating 50k child threads. Is that OK, given that there will be a lot of context switching?

Creating more threads to handle tasks concurrently should help performance, but you need to decide how many threads to create. More is not always better, because threads are not lightweight. A thread commonly takes up around 1 MB of memory (the typical default stack size), so creating 50k child threads could require roughly 50 GB of memory, which would cause your JVM to run out of memory. The ideal number of threads depends on the nature of the task: whether it is I/O-intensive or CPU-intensive. You can refer to this for an idea of how to determine the ideal number.
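As a rough illustration of bounding the thread count instead of spawning one thread per element, here is a minimal sketch using a fixed-size `ExecutorService`. The class name, the `completed` counter, the body of `doTask`, and the `suggestedPoolSize` heuristic (cores × (1 + wait/compute ratio)) are assumptions for illustration only; the right pool size should come from measuring the real workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {

    // Counts finished tasks so the sketch can be checked (illustrative only).
    static final AtomicInteger completed = new AtomicInteger();

    // Hypothetical stand-in for the question's doTask(i).
    static void doTask(int i) {
        completed.incrementAndGet();
    }

    // Common sizing heuristic (an assumption, not taken from the answer):
    // cores * (1 + waitTime / computeTime). CPU-bound work has a ratio near 0.
    static int suggestedPoolSize(int cores, double waitToComputeRatio) {
        return Math.max(1, (int) (cores * (1 + waitToComputeRatio)));
    }

    // Submits one task per element to a fixed-size pool and blocks until all finish.
    static void processAll(List<Integer> listElem, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i : listElem) {
                futures.add(pool.submit(() -> doTask(i)));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows a task failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> listElem = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            listElem.add(i);
        }
        int cores = Runtime.getRuntime().availableProcessors();
        processAll(listElem, suggestedPoolSize(cores, 1.0));
        System.out.println("completed = " + completed.get());
    }
}
```

With this shape, the 50k elements wait in the executor's internal queue rather than each holding a thread, so memory use is governed by the pool size rather than the list size. In a Kafka consumer, blocking until all futures complete before committing offsets is one simple way to avoid committing work that has not finished.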

Create a message queue. The consumer would reside in the same application, though, and it would need its own thread to read the messages.

If the same application ends up consuming the messages, there is no performance benefit. The queue just adds overhead: each task has to be put on the queue and then polled by another thread before it is processed, so more code runs for a task that could have been processed directly.

The only benefit is in terms of code design: it decouples the consumer that receives the messages from the objects that process them.
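To make the decoupling concrete, the following is a minimal sketch of an in-process queue using `java.util.concurrent.BlockingQueue`. The class and method names, the queue capacity of 100, the `POISON_PILL` end marker, and the body of `doTask` are all hypothetical; the sketch also assumes real elements are never negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class InProcessQueueSketch {

    // Hypothetical end-of-work marker; assumes real elements are non-negative.
    static final int POISON_PILL = -1;

    // Hypothetical stand-in for the question's doTask(i); returns a value so it can be checked.
    static int doTask(int i) {
        return i * 2;
    }

    // Receiving side: only enqueues elements, knows nothing about processing.
    static void receive(List<Integer> listElem, BlockingQueue<Integer> queue)
            throws InterruptedException {
        for (int i : listElem) {
            queue.put(i); // blocks while the queue is full, which gives back-pressure
        }
        queue.put(POISON_PILL);
    }

    // Processing side: drains the queue until it sees the end marker.
    static List<Integer> processUntilDone(BlockingQueue<Integer> queue)
            throws InterruptedException {
        List<Integer> processed = new ArrayList<>();
        while (true) {
            int i = queue.take();
            if (i == POISON_PILL) {
                return processed;
            }
            processed.add(doTask(i));
        }
    }

    // Wires the two sides together: one receiver thread, processing on the caller's thread.
    static List<Integer> run(List<Integer> listElem) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(100);
        Thread receiver = new Thread(() -> {
            try {
                receive(listElem, queue);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        List<Integer> result = processUntilDone(queue);
        receiver.join();
        return result;
    }
}
```

Every element still passes through `put` and `take` before `doTask` runs, which is the extra overhead described above; what the queue buys is that the receiving code and the processing code no longer call each other directly.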

Ken Chan
  • 1. I cannot create 50k threads, as 50k threads will overload the JVM. 2. A queue will add overhead with no performance benefit. Then what would be the ideal way to handle this scenario? – Kakarot Aug 27 '23 at 08:49
  • I am not asking you to create 50k threads. I am asking you to refer to the article that was shared with you to determine the ideal number of threads to create. – Ken Chan Aug 27 '23 at 08:51
  • OK, suppose I create 10k threads at a time. Now the next 40k will be in a waiting state, which also increases the end-to-end latency. – Kakarot Aug 27 '23 at 08:53
  • Please measure your app's performance and tune the thread pool size. My gut feeling is that 10k is still too many. – Ken Chan Aug 27 '23 at 09:05
  • I have one question: doesn't creating too many threads degrade performance, as those threads will be doing a lot of context switching? – Kakarot Aug 27 '23 at 17:02