Spring Boot @Async: what is the best way to create 1000 threads using Spring?

Question

I need to process 600 million records in a multithreaded way, and each request takes 5-6 seconds. In my Spring Boot application I need to create 1000 threads, but Tomcat supports only 200. What is the best way to proceed?

Other than parallelizing, every second optimized in that request is 600 million work-seconds saved. — Rogue, Dec 20 '19 at 15:56
The question lacks the details needed to give a relevant answer. Please edit it and give more details. — Indraneel Bende, Dec 20 '19 at 16:08
Why do you think you'll need 1000 threads? Adding more threads will not necessarily speed things up (on the contrary, it might slow things down). — Mark Rotteveel, Dec 22 '19 at 08:46

Robert Moskal · Answer 1 · 2019-12-21T14:39:37.657

You can totally control the number of threads Tomcat creates in /apache-tomcat/conf/server.xml:

<Connector connectionTimeout="20000"
           maxThreads="1000"
           port="8080"
           protocol="HTTP/1.1"
           redirectPort="8443" />

You can raise this up to your OS's limit on threads. It's 2000 on a Mac.

But I think creating 1000 threads isn't going to help you very much. Loosely, you can only execute as many simultaneous threads as you have cores on your machine.

So with a 4-core machine, at roughly 5 seconds per record, it'll take ~24 years to process your 600 million records. With 32 cores you will get it down to a single-digit number of years.

What would I do? I would look into something like Apache Beam, which will parallelize your workflow across many, many machines. Take a look at https://cloud.google.com/dataflow/. You can set up your job to requisition 1000 4-core machines. Google will spin them up and tear them down for you. The job would take about 9 days. A back-of-the-envelope calculation shows that getting your answer will cost you about $8,640.
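The time estimates above follow from simple arithmetic. Here is a minimal sketch in Java, assuming 5 seconds per record and one record in flight per core, as the answer does. The class and method names are made up for illustration.

```java
import java.time.Duration;

final class ThroughputEstimate {
    /**
     * Wall-clock time to process `records` items when `parallelism` items
     * are in flight at once and each takes `secondsPerRecord` seconds.
     */
    static Duration wallClock(long records, double secondsPerRecord, long parallelism) {
        double seconds = records * secondsPerRecord / parallelism;
        return Duration.ofSeconds(Math.round(seconds));
    }

    public static void main(String[] args) {
        long records = 600_000_000L;
        double secondsPerRecord = 5.0; // assumption: lower end of the 5-6 s in the question

        // 4 cores, 32 cores, and 1000 machines with 4 cores each
        for (long parallelism : new long[] {4, 32, 4_000}) {
            Duration d = wallClock(records, secondsPerRecord, parallelism);
            System.out.printf("%d in parallel: about %.1f days%n",
                    parallelism, d.getSeconds() / 86_400.0);
        }
    }
}
```

With 4,000 records in flight, the total is 750,000 seconds, a little under 9 days. The one-record-per-core assumption only holds if the work is CPU-bound. If most of the 5 seconds is spent waiting on a remote service, far fewer cores may be enough.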

score 1 · Answer 2 · answered Dec 20 '19 at 16:06

If you want to stay efficient, you most likely don't want to use 1000 threads unless your machine has 1000 CPUs. If your tasks are CPU-bound, then the number of worker threads should be close to the CPU count; otherwise you will waste cycles on CPU scheduling.
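As a rough illustration of that sizing rule, here is a sketch in plain Java that sizes a fixed pool to the number of available processors. The `work` method is a hypothetical stand-in for a CPU-bound task. The `suggestedThreads` helper applies a common heuristic for tasks that also wait on I/O (cores × (1 + wait time / compute time)); that heuristic is an addition here, not something stated in the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolSizing {
    /** Hypothetical CPU-bound task: sum of squares from 1 to n. */
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (long) i * i;
        }
        return acc;
    }

    /** Heuristic pool size: cores * (1 + wait/compute). Equals `cores` when there is no waiting. */
    static int suggestedThreads(int cores, double waitMillis, double computeMillis) {
        return (int) Math.round(cores * (1.0 + waitMillis / computeMillis));
    }

    /** Runs `tasks` copies of work(n) on a pool sized to the CPU count and sums the results. */
    static long runAll(int tasks, int n) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                futures.add(CompletableFuture.supplyAsync(() -> work(n), pool));
            }
            long total = 0;
            for (CompletableFuture<Long> f : futures) {
                total += f.join(); // wait for each task to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

For purely CPU-bound work the heuristic reduces to one thread per core. When a task spends most of its time waiting, the suggested count grows quickly, which is why the questions below about what the requests actually do matter.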

Since your question lacks any technical details, I'd suggest closing it and writing a new one that explains the basics of your problem:

How are you receiving requests? Over HTTP? LAN or WAN? Can it be changed to something else, e.g. if the request data is generated from an external database?
How are you processing the requests? Is it a CPU-bound calculation, or are you making fan-out requests to other systems to enrich the data?
How are you saving the processing results?
How do you plan to handle failures? If processing one request fails, do you plan to repeat all 600 million requests?

Actually, I am calling a SOAP service (an HTTP request) and the response time is 3-5 seconds; the response is then analysed, and another REST call is made to store the data in Google Cloud. — Udayan, Dec 22 '19 at 16:07

score 0 · Answer 3 · answered Dec 20 '19 at 15:57

If using Spring is a must, you can check out Spring Cloud Data Flow instead of Apache Beam.

If you want to accomplish this using only Tomcat and Spring Boot, you will have to scale out the number of instances. That will provide more cores, but it may not be the best way to do it.

I would also suggest using Tomcat with NIO, which will increase performance.

answered Dec 20 '19 at 15:57

rv.comm

OK, I will check this as you have suggested. – Udayan Dec 22 '19 at 16:09
Let me explain a bit. I am reading an Excel sheet, and based on the data in the sheet I create a payload and call the SOAP service. After getting the response, I check it and store part of the response data in Google Cloud. One thread can execute the whole process in 5 seconds. – Udayan Dec 22 '19 at 17:34
@Udayan based on what you said I would suggest this [https://cloud.google.com/functions/use-cases/real-time-data-processing] as the optimal way to do it. This way you do not have to maintain the server, container...you just have to maintain the Cloud Function (your SOAP call and storing it to google cloud). – rv.comm Dec 23 '19 at 13:03

GeertPt · Answer 4 · 2019-12-28T21:52:36.990

What happens in those 5-6 seconds? Does it do a computation using CPU, or is it sending data to somewhere else and waiting for it to return?

In the second case, you don't need to spin up 1000 threads to do 1000 queries in parallel, but you can use @Async if the other backend supports it. You would have only a small pool of input and output threads.

You can use Spring WebFlux for that. By default, however, WebFlux does not run on Tomcat but on an HTTP server built on Netty (Reactor Netty); see e.g. https://www.baeldung.com/spring-webflux.

This can only work if you can execute each step in a reactive way. In your case, make the SOAP call with the reactive WebClient so that the data is sent without blocking, and subscribe a second non-blocking step to the SOAP response that uploads the data to Google Cloud.
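The idea of chaining two non-blocking steps can be sketched without any framework. The example below does not use WebClient or a real SOAP service. It simulates both the SOAP call and the upload with timers (all names are hypothetical), only to show that a single thread can keep many slow requests in flight when nothing blocks while waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class NonBlockingPipeline {
    /** Simulated SOAP call: completes after delayMillis; no thread sleeps while it is pending. */
    static CompletableFuture<String> fakeSoapCall(ScheduledExecutorService timer, int id, long delayMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();
        timer.schedule(() -> response.complete("response-" + id), delayMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    /** Simulated upload of the processed response; also just a timer. */
    static CompletableFuture<Integer> fakeUpload(ScheduledExecutorService timer, String payload, long delayMillis) {
        CompletableFuture<Integer> stored = new CompletableFuture<>();
        timer.schedule(() -> stored.complete(payload.length()), delayMillis, TimeUnit.MILLISECONDS);
        return stored;
    }

    /** Starts all requests at once on one thread and returns how many pipelines completed normally. */
    static int runAll(int requests, long delayMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<Integer>> inFlight = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                // Step 2 is attached as a callback to step 1, so nothing waits in between.
                inFlight.add(fakeSoapCall(timer, i, delayMillis)
                        .thenCompose(body -> fakeUpload(timer, body, delayMillis)));
            }
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
            return (int) inFlight.stream()
                    .filter(f -> f.isDone() && !f.isCompletedExceptionally())
                    .count();
        } finally {
            timer.shutdown();
        }
    }
}
```

Calling `NonBlockingPipeline.runAll(1000, 50)` starts 1000 two-step pipelines on one timer thread. They overlap while waiting instead of running one after another, which is the property a reactive client would give you for real network calls.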

edited Dec 28 '19 at 21:52

answered Dec 20 '19 at 16:05

GeertPt

Hi Grey, I am calling a SOAP service and the response comes back in 3-4 seconds; then I manipulate the response and store it in Google Cloud. – Udayan Dec 22 '19 at 16:30
@Udayan your use case is perfect for WebFlux, assuming you can do the upload to google cloud using a non-blocking call, too. – GeertPt Dec 28 '19 at 21:53