
I have a Quarkus service running in a K8s cluster.

The microservice currently runs Quarkus 3.1.1 (but I also tested with 2.16.7).

The microservice exposes a reactive REST API with RESTEasy Reactive. Each call needs to remotely access an LDAP server and sends back a JSON response with the data received from the LDAP server.

Calls to the LDAP server are made with an async API that lets you register a listener, which is invoked on a connection-reader thread provided by the LDAP API. The glue between Mutiny and the LDAP API is done by creating a Uni from a CompletionStage that handles the asynchronous LDAP operation for each HTTP request.
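A minimal sketch of that glue, assuming a Quarkus 3 / RESTEasy Reactive endpoint and a hypothetical async LDAP client (the `searchAsync` helper, the DTO, and the paths below are illustrative, not the actual code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

import io.smallrye.mutiny.Uni;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/users")
public class UserResource {

    // Hypothetical async LDAP call: the real code registers a listener that is
    // invoked on the LDAP SDK's connection-reader thread and completes the
    // CompletableFuture from there.
    private CompletionStage<UserDto> searchAsync(String uid) {
        CompletableFuture<UserDto> future = new CompletableFuture<>();
        // Real code would do something like:
        // ldapConnection.asyncSearch(request, result -> future.complete(mapToDto(result)));
        future.complete(new UserDto(uid, "placeholder")); // stand-in so the sketch is self-contained
        return future;
    }

    @GET
    @Path("/{uid}")
    @Produces(MediaType.APPLICATION_JSON)
    public Uni<UserDto> get(@PathParam("uid") String uid) {
        // Bridge the per-request CompletionStage into a Mutiny Uni;
        // RESTEasy Reactive subscribes to it and serializes the result as JSON.
        return Uni.createFrom().completionStage(() -> searchAsync(uid));
    }

    public record UserDto(String uid, String displayName) { }
}
```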

It works very nicely, and performance is really great compared with a blocking solution implemented in Spring Boot.

BUT

There is an issue when the microservice is booting under heavy traffic (e.g. a cold start during scale-out). In this case we observe the following:

  • Latencies are terrible (many seconds). Metrics on the LDAP server show that it is answering fast (5 ms). It is true that the JVM runs the C1/C2 JIT compilers more frequently while booting.
  • Depending on the memory provided, the microservice is OOM-killed.
  • Readiness and liveness probes are timing out (the default probe timeout is 10 seconds).

My best guess is that the microservice builds up a huge backlog of in-flight operations and struggles. A couple of questions:

  • Is there any way for this kind of implementation to limit the number of in-flight requests and reject the extra load (I was playing with quarkus.vertx.queue-size, but I saw no difference)? Is there any protection at the Quarkus level to limit the pending work, and how is it activated and configured?
  • Is there a way to prioritize critical calls (e.g. readiness/liveness probes)? Otherwise the microservice is basically killed by K8s.

In the imperative Spring Boot microservice we had two options that made the boot more robust:

  • Limit the pending job queue and discard messages when the queue is full.
  • Have a separate queue for probes, so even when there is a huge amount of pending work the probes can still be served (roughly as sketched below).
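For comparison, this is roughly the shape of that Spring Boot setup expressed with plain java.util.concurrent (the pool sizes, queue lengths, and rejection policy are assumptions, not the actual values we used):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BootFriendlyExecutors {

    // Business traffic: a bounded queue; when it is full, extra work is rejected
    // immediately (AbortPolicy throws RejectedExecutionException, which we would
    // translate into a 503) instead of piling up during boot.
    public static final ThreadPoolExecutor BUSINESS = new ThreadPoolExecutor(
            8, 8,
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(200),
            new ThreadPoolExecutor.AbortPolicy());

    // Probes: a tiny, dedicated executor so readiness/liveness checks are
    // never stuck behind the business backlog.
    public static final Executor PROBES = new ThreadPoolExecutor(
            1, 1,
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(10),
            new ThreadPoolExecutor.AbortPolicy());
}
```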

I tried several things to solve the issue. The only thing that really worked was to heavily increase the resources (CPU and memory) so the service could boot properly. That is a waste of resources once the first ~3 minutes of booting have passed and the service is running normally.

Other ideas I have considered:

  • Play with external balancers (e.g. the Istio warmup feature, which gradually ramps up traffic to a new pod).
  • Implement a limit on in-flight requests (maybe using a filter with a semaphore, as in the sketch below) and reject requests once the limit is exceeded. I would make sure that readiness/liveness probes are never rejected.
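A possible shape for that filter, as a hedged sketch: a plain JAX-RS request/response filter pair guarding a Semaphore. The limit of 100 and the /q/health path check are assumptions; in Quarkus the health endpoints are usually served outside JAX-RS anyway, so the path guard is only a safety net.

```java
import java.util.concurrent.Semaphore;

import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;

@Provider
public class ConcurrencyLimitFilter implements ContainerRequestFilter, ContainerResponseFilter {

    private static final String PERMIT_ACQUIRED = "concurrency.permit";
    private final Semaphore permits = new Semaphore(100); // illustrative limit

    @Override
    public void filter(ContainerRequestContext requestContext) {
        String path = requestContext.getUriInfo().getPath();
        // Never reject readiness/liveness probes.
        if (path.startsWith("q/health")) {
            return;
        }
        if (permits.tryAcquire()) {
            // Remember that this request holds a permit so the response filter releases it.
            requestContext.setProperty(PERMIT_ACQUIRED, Boolean.TRUE);
        } else {
            // Shed load: fail fast instead of queueing work during boot.
            requestContext.abortWith(
                    Response.status(Response.Status.SERVICE_UNAVAILABLE)
                            .entity("Too many in-flight requests")
                            .build());
        }
    }

    @Override
    public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) {
        if (Boolean.TRUE.equals(requestContext.getProperty(PERMIT_ACQUIRED))) {
            permits.release();
        }
    }
}
```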

Thanks in advance,

Evaristo

  • I believe you need some form of load shedding. It could be rate limiting or concurrency limiting, or something completely custom. SmallRye Fault Tolerance, which is the common fault tolerance library in Quarkus (see https://quarkus.io/guides/smallrye-fault-tolerance and https://smallrye.io/docs/smallrye-fault-tolerance/6.2.3/index.html), implements a statically defined concurrency limit and a few common forms of rate limiting, so you could use that (disclaimer: I'm a maintainer, so I'm biased :-) ). – Ladicek Jun 14 '23 at 07:34
  • Have you analyzed the memory dump from the OOM to see what is causing it? – geoand Jun 14 '23 at 08:15
  • Thanks for the answers. Ladicek, you are right! Your suggestion to use the SmallRye library is great for my use case. I tested annotating the REST endpoint with @Bulkhead and it does the job of rejecting requests over a certain concurrency level. The micro has several REST endpoints; I am not sure if there is a way to consolidate the concurrency level, or whether the granularity is per endpoint. geoand - the OOM analysis is still pending. For the moment I increased the resources in the pod and added extra space for non-heap memory. – Cheva Jun 14 '23 at 14:11
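For reference, a minimal sketch of what that last comment describes: an endpoint annotated with SmallRye Fault Tolerance's @Bulkhead, assuming the quarkus-smallrye-fault-tolerance extension is on the classpath (the path, the limit value of 50, and the return value are illustrative, not taken from the discussion):

```java
import org.eclipse.microprofile.faulttolerance.Bulkhead;

import io.smallrye.mutiny.Uni;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/directory")
public class DirectoryResource {

    // At most 50 concurrent executions of this endpoint; further calls are
    // rejected with a BulkheadException instead of piling up work during boot.
    @GET
    @Bulkhead(50)
    @Produces(MediaType.APPLICATION_JSON)
    public Uni<String> search() {
        // ... the async LDAP call bridged to a Uni, as described in the question
        return Uni.createFrom().item("{}");
    }
}
```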

0 Answers