
I am running a Kotlin Spring Boot based service in a Kubernetes cluster that connects to a PostgreSQL database. Each request makes around 3-5 database calls, which partially run in parallel via Kotlin coroutines (on a thread-pool-backed coroutine context).
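Roughly, the setup looks like the sketch below (illustrative only - the names, the pool size of 16 and the 3ms latency are assumptions, not our actual code):

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors

// A fixed thread pool backs the coroutine context; the per-request DB calls
// are partially fanned out in parallel via async/await.
val dbDispatcher = Executors.newFixedThreadPool(16).asCoroutineDispatcher()

// Stand-in for a repository call - the real call is blocking JDBC, so it
// occupies one pool thread for the whole round trip.
fun blockingDbCall(name: String): String {
    Thread.sleep(3) // ~3 ms round trip to PostgreSQL
    return "result of $name"
}

suspend fun handleRequest(): List<String> = withContext(dbDispatcher) {
    listOf("user", "orders", "settings")          // 3-5 calls per request
        .map { async { blockingDbCall(it) } }     // partially parallel
        .awaitAll()
}

fun main() = runBlocking {
    println(handleRequest())
}
```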

No matter the configuration, this service gets throttled heavily when it is hit by real traffic right after starting up. The slowness sometimes persists for 2-3 minutes and often only affects some fresh pods, but not all.

I am looking for new avenues to analyze the problem - here's a succinct list of circumstances / stuff I am already doing:

  • The usual response time of my service is around 7-20ms while serving 300-400 requests / second per pod
  • New / autoscaled instances warm themselves up by doing 15,000 HTTP requests against themselves. The readiness probe does not report "up" before this process finishes
  • We are currently setting a CPU request and limit of 2000m; changing this to 3000m does reduce the issue, but the latency still spikes to around 300-400ms, which is not acceptable (at most 100ms would be great, 50ms ideal)
  • The memory is set to 2 GB; changing this to 3 GB has no significant impact
  • The pods are allocating 200-300 MB/s during peak load, the GC activity does not seem abnormal to me
  • Switching between GCs (G1 and ZGC) has no impact
  • We are experiencing pod throttling of around 25-50% (calculated via Kubernetes metrics) while the pod CPU usage is around 40-50%
  • New pods struggle to take 200-300 requests / sec even though we warm up, and curiously enough some pods suffer for long periods. All external factors have been analyzed and disabling most baggage has no impact (this includes testing with tracing, metric collection and the Kafka integration disabled, as well as verifying that our database load is not maxing out - it sits at around 20-30% CPU usage while network and memory usage are far lower)
  • The throttling is observed in custom load tests which replicate the warmup requests described above
  • Connecting with VisualVM during the load tests and checking the CPU time spent yields no striking issues
  • This is all done on managed Kubernetes on AWS
  • All the nodes in our cluster are of the same type (AWS c5.2xlarge)

Any tools / avenues to investigate are appreciated - thank you! I am still puzzled why my service is getting throttled although its CPU usage is way below 100%. Our nodes are also not affected by the old kernel CFS throttling bug from before kernel 5.6 (not entirely sure in which version it got fixed; our nodes run a very recent kernel version though).

  • Maybe this is related to https://stackoverflow.com/questions/54099425/pod-cpu-throttling?rq=1, but I can't seem to grasp why this hits my service this much. I am also already fairly high with a CPU request & limit of 3000m. – roookeee Aug 24 '22 at 14:57
  • 1
    300 TPS is 18000 requests per minute, which is significant. One thing to note is that you [should not use cpu limits](https://home.robusta.dev/blog/stop-using-cpu-limits/). Spring being slow before warmup is a [common problem](https://stackoverflow.com/q/54759652/1032785). – jordanm Aug 24 '22 at 15:19
  • 1
    I will dig into the "not using limits" but we are explicitly doing warmup (15000 requests) before setting our readiness to up. JVM compiler thresholds are around 10000 method invocations, so 15k should suffice for this. Raising this warmup call count does not alleviate the issue. The monolith system metrics are showing 50-60k requests per minute at peak, so 18k is not enough (although one pod reaches 18k alone and multiple pods are enough, the latency is just bad when scaling). The biggest issue we face is huge surges that triple traffic over a time period of less than 10 minutes. – roookeee Aug 24 '22 at 15:32
  • Further analysis has shown that only specific pods (mostly exactly 1) suffer tremendous slowdown when receiving equal load. This is a very irritating issue as all of our nodes are of equal hardware without any "noisy neighbour" problems. If I ever fix this I will post an update (if I forget, just link me here). It will probably take some time though, as all metrics between a "good" and a "bad" pod are nearly indistinguishable although one pod performs 10x worse, sigh. – roookeee Aug 26 '22 at 08:20
  • 1
    Have you tried profiling? [async-profiler](https://github.com/jvm-profiling-tools/async-profiler/) with `jfr` output used to help me in finding a cause of cpu throttling. – apangin Aug 26 '22 at 16:13
  • I only have rudimentary knowledge of profiling with `visualvm`, which has not shown obvious issues (but to be honest: the whole reactive, coroutines + async intra-request execution makes it very hard to track the CPU time of a single request). I will probably have to dig deeper, so I will give async-profiler a shot (heard about it before). It's just bizarre that only one pod / instance is affected no matter how often we deploy or what node we get, really strange. Thank you for the idea! – roookeee Aug 27 '22 at 15:23
  • We have isolated the issue to only some instances being affected. Metrics are showing that our main thread pool has longer queues on the affected instances: A "healthy" instance has a max queue of 5-10 items while an unhealthy instance has 500-700 items, which means every new queued task has a latency of latency(current queue) + latency(new task) => quite high for 500-700 tasks. Now why is this? We don't know yet, the hardware is equal, all instances are not even close to maxing their CPU and aren't throttled - it's quite frustrating to analyze :) Any help is appreciated – roookeee Sep 03 '22 at 21:28
  • 1
    @roookeee did you find the culprit? Thanks! – flore77 Oct 06 '22 at 14:09
  • @flore77 yes, thank you for reminding me. I will post an answer soon (TM) – roookeee Oct 10 '22 at 08:11

1 Answer


In the end this all boiled down to missing one part of the equation: the I/O bound. Imagine one request takes 10 DB calls, each taking 3 milliseconds to fulfill (including network latency etc.). A single request then spends 10 * 3 = 30 milliseconds on I/O. A single request-handling thread can therefore serve at most 1000ms / 30ms = 33.33 requests / second. If one service instance uses 10 threads to handle requests, we get 333.3 requests / second as our upper bound of throughput. We can't get any faster than this because we are I/O bottlenecked with regard to our thread count.
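The same bound as a tiny back-of-the-envelope calculation (the numbers are the illustrative ones from above, not measured values):

```kotlin
fun main() {
    val dbCallsPerRequest = 10
    val dbCallLatencyMs = 3.0               // per call, including network latency
    val requestThreads = 10

    val ioPerRequestMs = dbCallsPerRequest * dbCallLatencyMs        // 30 ms of I/O per request
    val requestsPerSecondPerThread = 1000.0 / ioPerRequestMs        // ~33.3
    val maxThroughput = requestThreads * requestsPerSecondPerThread // ~333.3

    println("Throughput upper bound: %.1f requests/second".format(maxThroughput))
}
```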

And this leaves out multiple factors like:

  • thread pool size vs. db connection pool size
  • our service doing non-DB related tasks (actual logic, JSON serialization when the response is built)
  • database capacity (was not an issue for us)

TL;DR: You can't get faster when you are I/O bottlenecked, no matter how much CPU you provide. I/O has to be improved if you want a single service instance to have more throughput; this is mostly a matter of sizing the DB connection pool in relation to the thread pool size in relation to the number of DB calls per request. We missed this basic (and well known) relation between resources!
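For illustration, a minimal sketch of what keeping those sizes in relation can look like with HikariCP and a dedicated dispatcher - the numbers and the JDBC URL are placeholders, not the values we ended up with:

```kotlin
import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import kotlinx.coroutines.asCoroutineDispatcher
import java.util.concurrent.Executors

// Placeholder sizing - the point is only the relation between the numbers.
const val DB_THREADS = 20   // threads that may block on the DB at the same time

// Connection pool at least as large as the number of threads that can block on
// the database concurrently, so no thread queues for a connection on top of the
// unavoidable I/O latency.
fun dataSource(): HikariDataSource {
    val config = HikariConfig().apply {
        jdbcUrl = "jdbc:postgresql://db-host:5432/app"  // placeholder URL
        maximumPoolSize = DB_THREADS
    }
    return HikariDataSource(config)
}

// The coroutine dispatcher that executes the blocking DB calls, sized together
// with the connection pool above.
val dbDispatcher = Executors.newFixedThreadPool(DB_THREADS).asCoroutineDispatcher()
```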
