How to achieve Spring boot p99 under 500 milliseconds for 1000 requests per second

Question

Our Spring Boot application (version 2.5.10) exposes a single API that reads from Redis 6 times and does some calculations. There are no other I/O operations such as database calls. It is deployed on Kubernetes with Helm. The actual API execution time is under 20 milliseconds, which I measured using Instant.now(). The current K6 load test reports a p99 of just under 3 seconds when run from an m4.4xlarge instance with 1000 VUs.

I have already tried the following Tomcat settings in the application properties file.

server.tomcat.threads.min-spare=100
server.tomcat.threads.max=400
server.tomcat.accept-count=200

Initially the deployment had just 0.5 CPU in Kubernetes. Once the requirements were shared, we increased CPU to 2000m and memory to 1 GB, and increased the replica count to 3.

While load testing I noticed that the runnable thread count is low. Can I change anything to increase the runnable count, or is this okay?

jvm_threads_states_threads{state="runnable",} 5.0
jvm_threads_states_threads{state="blocked",} 0.0
jvm_threads_states_threads{state="waiting",} 202.0
jvm_threads_states_threads{state="timed-waiting",} 3.0
jvm_threads_states_threads{state="new",} 0.0
jvm_threads_states_threads{state="terminated",} 0.0

What possible optimizations or configurations can be made to further improve the p99 latency to under 500ms?

score 0 · Accepted Answer · answered Jul 19 '23 at 21:07

Runnable is probably not the problem. You need to figure out what is going on with this:

jvm_threads_states_threads{state="waiting",} 202.0

You have 202 threads waiting on something. From the Java docs:

"A thread that is waiting indefinitely for another thread to perform a particular action is in this state"

Something is creating a bottleneck. For example, your threads could be waiting on I/O from Redis. Note that, depending on the client you are using, Redis access may go through a connection pool, so that's where I'd start. You need to add more detail to your question.
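
To see what the waiting threads are actually parked on, one option is to group them by the first non-JDK frame in their stack. The sketch below is only an illustration using the standard java.lang.management API; the class and method names are made up for this example, and you would have to call it from inside the running application (for instance from a temporary debug endpoint). An idle Tomcat worker would be expected to show a frame from Tomcat's own task queue, while a thread starved for a Redis connection would more likely show a frame from the connection pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spring Boot or Tomcat.
class WaitingThreadReport {

    // True for frames that belong to the JDK itself (park, wait, queue internals).
    static boolean isJdkFrame(StackTraceElement frame) {
        String c = frame.getClassName();
        return c.startsWith("java.") || c.startsWith("javax.")
                || c.startsWith("jdk.") || c.startsWith("sun.");
    }

    // Counts WAITING threads, keyed by the first non-JDK frame in each stack.
    static Map<String, Integer> waitingByFirstNonJdkFrame() {
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip locked monitor/synchronizer details to keep the dump cheap.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.WAITING) {
                continue;
            }
            String key = "<JDK frames only>";
            for (StackTraceElement frame : info.getStackTrace()) {
                if (!isJdkFrame(frame)) {
                    key = frame.getClassName() + "." + frame.getMethodName();
                    break;
                }
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        waitingByFirstNonJdkFrame()
                .forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

A plain thread dump (jstack or kill -3) gives the same information; this just aggregates it so 200 near-identical stacks collapse into a few lines.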

answered Jul 19 '23 at 21:07

Harry


We are using JedisClientConfigurationBuilder with JedisConnectionFactory, together with GenericObjectPoolConfig. One suggestion was to let the tests run for 10 to 20 minutes, and with that we did get p99 under 1 second. Earlier we ran the tests for only 1 minute. However, the runnable thread count did not increase while the test was running. I increased the Redis connection pool poolMinIdle to 2 (it was 1 earlier); max is 15. – Umang Desai Jul 20 '23 at 10:57
When benchmarking anything in Java you need to account for the just-in-time compiler kicking in, which does not happen early. Java is difficult to benchmark correctly. Have a look at JMH and the associated docs to see how involved it can get. – Harry Jul 20 '23 at 18:40
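
A rough way to see the warm-up effect described in the comment above is to time the same code in two consecutive batches and compare their p99. This is a toy sketch, not a substitute for JMH: work() is a made-up stand-in for request handling, the batch sizes are arbitrary, and the numbers it prints will vary from machine to machine.

```java
import java.util.Arrays;

// Hypothetical demo class; all names here are made up for illustration.
class WarmupP99 {

    // Stand-in for the per-request calculation.
    static long work(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            s += (s ^ i) * 31 + i;
        }
        return s;
    }

    // Nearest-rank percentile (p in 0..100) over a copy of the samples.
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    // Times `iterations` calls of work() and returns each duration in nanoseconds.
    static long[] measure(int iterations) {
        long[] nanos = new long[iterations];
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            sink += work(10_000);
            nanos[i] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println(sink); // keeps the result live so the loop is not optimized away
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] cold = measure(2_000); // includes interpreted and partially compiled runs
        long[] warm = measure(2_000); // same code after the JIT has had a chance to compile it
        System.out.println("cold p99 (ns): " + percentile(cold, 99));
        System.out.println("warm p99 (ns): " + percentile(warm, 99));
    }
}
```

The same reasoning applies to a load test: a 1-minute run spends a large share of its samples in the cold phase, so its p99 is dominated by warm-up rather than steady-state behaviour.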
Thanks @Harry. The problem for us was that JMeter was not able to send the required requests per second, and we were running the load test for only 1 minute. It was stuck at 10-15% of 1000 rps. With K6 we were able to reach ~750 rps cross-region. When we tested from the same region and ran the test for 10 minutes, the p99 value was within the desired limits and the rps was 1000. So when you said "You have 202 threads waiting on something", it was because the application was not receiving enough requests to trigger the use of those threads. – Umang Desai Jul 26 '23 at 11:59
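
The idle-thread explanation in the last comment is consistent with a back-of-the-envelope estimate using Little's law (requests in flight = arrival rate × time per request). The sketch below plugs in the figures from the question (1000 rps, under 20 ms per call, 3 replicas). It assumes the 20 ms is the whole time a Tomcat thread is occupied and that load is spread evenly across replicas, neither of which the thread confirms.

```java
// Hypothetical helper for a rough capacity estimate; names are illustrative only.
class InFlightEstimate {

    // Little's law: average requests in flight = arrival rate * time in system.
    static double inFlight(double requestsPerSecond, double secondsPerRequest) {
        return requestsPerSecond * secondsPerRequest;
    }

    public static void main(String[] args) {
        double total = inFlight(1000, 0.020); // figures taken from the question
        double perReplica = total / 3;        // assumes an even split over 3 replicas
        System.out.printf("in flight overall: %.1f%n", total);
        System.out.printf("in flight per replica: %.1f%n", perReplica);
    }
}
```

That works out to roughly 20 requests in flight overall, or about 7 per replica, so with a pool of a few hundred Tomcat threads most of them would be expected to sit in the waiting state even at the full 1000 rps.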
