I have a Quarkus service running in a K8s cluster.
The microservice is currently running with Quarkus 3.1.1 (but it was also tested with 2.16.7).
The microservice exposes a reactive REST API with RESTEasy Reactive. Each call needs to query a remote LDAP server and send back a JSON response with the data received from it.
Calls to the LDAP server are made through an async API that lets me register a listener, which is invoked on a connection reader thread provided by the LDAP library. The glue between Mutiny and the LDAP API is done by creating a Uni from a CompletionStage that handles the asynchronous LDAP operation for each HTTP request.
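For reference, the glue looks roughly like this (simplified sketch; `asyncSearch` and `LdapUser` are placeholders for the real LDAP SDK call and result mapping):

```java
import io.smallrye.mutiny.Uni;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import java.util.concurrent.CompletionStage;

@Path("/users")
public class UserResource {

    @GET
    @Path("/{uid}")
    public Uni<LdapUser> get(@PathParam("uid") String uid) {
        // The CompletionStage is completed from the LDAP connection reader thread
        // once the async listener receives the search result.
        return Uni.createFrom().completionStage(() -> asyncSearch(uid));
    }

    // Placeholder: registers the async listener on the LDAP connection and
    // completes a CompletableFuture from it (details omitted).
    private CompletionStage<LdapUser> asyncSearch(String uid) {
        throw new UnsupportedOperationException("illustration only");
    }

    // Placeholder result type, serialized to JSON by RESTEasy Reactive + Jackson.
    public record LdapUser(String uid, String cn) { }
}
```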
It works very nicely and performance is really great compared with the blocking solution we usually implement in Spring Boot.
BUT
There is an issue when the microservice is booting under heavy traffic (e.g. a cold start during scale-out). In that case we observe the following:
- Latencies are terrible (many seconds), while metrics on the LDAP server show it is answering fast (~5 ms). It is true that the JVM runs C1/C2 compilation much more frequently while booting.
- Depending on the memory provided, the microservice gets OOM-killed.
- Readiness and liveness probes time out (the default timeout is 10 s).
My best guess is that the microservice builds up a huge backlog of ongoing operations and struggles. A couple of questions:
- Is there any way for this kind of implementation to limit the number of ongoing requests and reject the extra load (I played with quarkus.vertx.queue-size, but saw no difference)? Is there any protection at the Quarkus level to limit pending work, and if so, how is it activated and configured?
- Is there a way to prioritize critical calls (e.g. readiness/liveness probes)? Otherwise the microservice is basically killed by K8s.
In the imperative Spring Boot microservice we had two safeguards that made the boot more robust (roughly sketched below):
- Limit the pending job queue and discard requests when the queue is full.
- Use a separate queue for probes, so the probes can still be served even when there is a huge amount of pending work.
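Roughly, what we relied on in Spring Boot amounts to something like this (plain java.util.concurrent for illustration; names and sizes are placeholders):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BootSafeguards {

    // Business traffic: bounded queue + AbortPolicy, so extra work is rejected
    // instead of piling up in memory while the JIT is still warming up.
    static final ThreadPoolExecutor BUSINESS = new ThreadPoolExecutor(
            8, 8, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(500),
            new ThreadPoolExecutor.AbortPolicy());

    // Probes: a tiny, separate pool with its own queue, isolated from the
    // business backlog so readiness/liveness checks always get a thread.
    static final ThreadPoolExecutor PROBES = new ThreadPoolExecutor(
            1, 1, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(10),
            new ThreadPoolExecutor.AbortPolicy());
}
```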
I tried several things to solve the issue. The only one that really worked was to heavily increase the resources (CPU and memory) so the service could boot properly, but that is a waste of resources once the first ~3 minutes have passed and the service has warmed up.
Other ideas I have considered:
- Play with external balancers (e.g. the Istio warmup feature, which gradually ramps traffic to a new pod).
- Implement a rate limit on ongoing requests (maybe using a filter with a semaphore) and reject requests once the limit is exceeded, while making sure readiness/liveness calls are never rejected (see the sketch below).
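Something along these lines is what I have in mind (a minimal sketch using a plain JAX-RS filter; MAX_IN_FLIGHT and the health-path check are assumptions to be adjusted):

```java
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;
import java.util.concurrent.Semaphore;

@Provider
public class InFlightLimitFilter implements ContainerRequestFilter, ContainerResponseFilter {

    private static final int MAX_IN_FLIGHT = 200; // placeholder, to be tuned per service
    private static final String ACQUIRED = "inflight.acquired";
    private final Semaphore permits = new Semaphore(MAX_IN_FLIGHT);

    @Override
    public void filter(ContainerRequestContext requestContext) {
        // Defensive: never throttle health probes. In practice /q/health may not even
        // reach JAX-RS filters, since Quarkus serves it on the non-application router.
        String path = requestContext.getUriInfo().getPath();
        if (path.contains("q/health")) {
            return;
        }
        if (permits.tryAcquire()) {
            requestContext.setProperty(ACQUIRED, Boolean.TRUE);
        } else {
            // Shed load instead of queueing: fail fast with 503 so callers can back off.
            requestContext.abortWith(Response.status(Response.Status.SERVICE_UNAVAILABLE).build());
        }
    }

    @Override
    public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) {
        // Release only if this request actually took a permit
        // (aborted or probe requests never set the flag).
        if (Boolean.TRUE.equals(requestContext.getProperty(ACQUIRED))) {
            permits.release();
        }
    }
}
```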
Thanks in advance,
Evaristo