As of today, our application has a health check endpoint that is used for liveness probe checks in k8s. Liberty is configured with 50 executor threads. If all the 50 executor threads are busy, the request would queue up and wait for an executor thread to be released.
At times, due to slowness with the database (or external integration end points), the transaction takes a while to complete. This will keep all the 50 executor threads busy and the health check request (from kubelet) also gets queued up. In such situations, the liveness probe fails (failureThreashold: 3
) and results in a container restart. Container restart further complicates the situation in such cases, as this would result in availability issues and the container takes 4m to 5m startup.
Is it a good idea to configure health check probes in a separate (dedicated) thread pool? This will eliminate executor thread contention issues and avoid unnecessary restarts. The downside is that real deadlock situations (that require a restart) will be missed. However, the health check API can be further enhanced to detect deadlocks (if its possible) and respond accordingly.