15

I have a spring boot application which I'm running inside docker containers in an openshift cluster. In steady state, there are N instances of the application (say N=5) and requests are load balanced to these N instances. Everything runs fine and response time is low (~5ms with total throughput of ~60k).

Whenever I add a new instance, response time goes up briefly (up to ~70ms) and then comes back to normal.

Is there anything I can do to avoid this kind of cold start? I tried pre-warming the app by making ~100 curl calls sequentially before sending traffic, but that did not help.

Do I need a better warmup script with higher concurrency? Is there a better way to handle this?

Thanks

Aritz
Vikk
  • This sounds more like an OpenShift configuration problem. – chrylis -cautiouslyoptimistic- Feb 19 '19 at 06:18
  • @chrylis Could you please elaborate? – Vikk Feb 19 '19 at 06:27
  • @Vikk, you should elaborate the question itself too. What does your application do? I would personally try to deploy a sample spring boot app and perform a test with it. If the same issue happens, then it's openshift related. – Aritz Feb 21 '19 at 08:32
  • It could be JVM/JIT warm-up time. I would guess lazy class loading – for which making a bunch of curl calls is a good start – or JIT optimizations, for which calling the critical code about 100 times isn't enough (default: `-XX:CompileThreshold=10000` afaik). _Or_, depending on what your application is actually doing, it could be necessary to warm up a cache – qutax Feb 21 '19 at 09:07
  • Are you using pre-deployed images? What's the memory consumption per instance? Can you increase the overall allocated memory per node and test? – hovanessyan Feb 22 '19 at 18:09
  • @Vikk All of your spring boot assets must be getting lazily loaded. Finding out the request that is taking the most time using a reverse proxy like Nginx and then maybe try to fire those particular curls. Also, if you have a distributed tracing mechanism like jaeger/zipkin in place, that will also help a lot. – Mukul Bansal Feb 24 '19 at 06:19
  • @qutax `-XX:CompileThreshold` defaults to 1500 I think. – Mukul Bansal Feb 24 '19 at 06:21
  • What kind of calls are you making? Have you checked the Keep-Alive configuration of your client calls, or any other caching in between that might cause responses to be returned faster? – fatcook Feb 26 '19 at 05:22

6 Answers

4

We faced a similar issue with our microservices. To warm them up, we added a component

ApplicationStartup implements ApplicationListener<ApplicationReadyEvent> 

within the application that calls the services right after application startup. This worked for us: it guarantees that every class used in your payload is loaded just after startup, on each instance you start, and you don't need an external script to make the calls. A further problem with an external script is that you cannot be sure the calls were handled by the new instance.

@Component
public class ApplicationStartup implements ApplicationListener<ApplicationReadyEvent> {

    @Autowired
    YourService yourService;

    @Override
    public void onApplicationEvent(final ApplicationReadyEvent event) {
        System.out.println("ApplicationReadyEvent: application is up");
        try {
            // some code to call yourService with property-driven or constant inputs
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
satyesht
  • I think the best time to warm up the JVM is on the **ContextRefreshedEvent**; if you use the event above, your microservice may receive HTTP requests before you finish warming up the JVM. – user3033075 Jan 15 '22 at 10:08
2

If your application is healthy when requests reach it but the first responses are still slow, try enabling tiered compilation:

-XX:+TieredCompilation -XX:CompileThreshold=<invocations>

(Note that `-XX:CompileThreshold` takes a numeric value, and tiered compilation is already on by default since Java 8.)

Normally, the VM uses the interpreter to collect profiling information on methods that are fed into the compiler. In the tiered scheme, in addition to the interpreter, the client compiler is used to generate compiled versions of methods that collect profiling information about themselves.

Since compiled code is substantially faster than interpreted code, the program executes with better performance during the profiling phase.
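For example, a launch command with the flags spelled out (the threshold value here is illustrative; check the defaults of your JVM version):

```
java -XX:+TieredCompilation -XX:CompileThreshold=1500 -jar app.jar
```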

Serge
  • 2,574
  • 18
  • 26
2

This problem can be approached from two directions. The first is to warm the instance up yourself before it serves traffic. The second is to have the outside world send fewer requests at the beginning, so that more computing resources are left for the JVM to finish its initialization (such as class loading). Either way, the root cause is that the JVM needs to warm up after startup; this follows from how it operates. In the HotSpot VM in particular, the execution engine has two parts: the interpreter and the just-in-time (JIT) compiler, and the JIT needs CPU resources to compile bytecode at runtime. In addition, lazy class loading adds time to the first run.

  1. JVM warm-up

JVM warm-up mainly addresses two problems: class loading and just-in-time compilation.

  • For class loading, just run the relevant code paths ahead of time.
  • For JIT, tiered compilation (C1/C2) is generally enabled in server mode. On JDK 8 and above it is on by default (older versions need the JVM flag -XX:+TieredCompilation). C1 and C2 have different compilation costs, with C2 costing more. The purpose of warm-up is to trigger C1/C2 compilation, so that by the time real requests come in, the hot code has already been compiled.

Of the two, class loading itself consumes more time, so warming it up yields a better return on effort.
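The JIT effect can be seen with a toy micro-demo (illustrative only, not a rigorous benchmark; the class and method names are made up):

```java
// Toy illustration: the first call of a method runs interpreted (and pays
// class-loading cost); after enough invocations the JIT compiles it and
// the same work runs faster.
public class WarmupDemo {

    // Hot method: sum of squares below n.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        work(1_000_000);                      // cold: interpreted
        long coldNanos = System.nanoTime() - t0;

        for (int i = 0; i < 20_000; i++) {    // cross the compile threshold
            work(1_000);
        }

        long t1 = System.nanoTime();
        work(1_000_000);                      // warm: JIT-compiled
        long warmNanos = System.nanoTime() - t1;

        System.out.println("cold=" + coldNanos + "ns warm=" + warmNanos + "ns");
    }
}
```

On a typical HotSpot JVM the second timing comes out noticeably lower, though the exact numbers vary by machine and JVM version.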

  2. Network layer warm-up

At the network level, send a certain amount of warm-up traffic first; it can be dedicated warm-up traffic or normal user requests.

This is typically done at the nginx layer with flow control: when a newly started node joins the upstream, give it a very low weight so that it receives only a small amount of traffic at first. This reserves enough computing resources for code warm-up, i.e. class loading and just-in-time compilation. If the service exposes only RPC rather than HTTP, the RPC framework layer can do the traffic warm-up; for example, frameworks such as Dubbo already ship a service warm-up feature. Again, warming up here means that a node receives only a small amount of traffic in the initial phase after startup.
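A sketch of that nginx configuration (addresses and weights are illustrative):

```nginx
upstream app_backend {
    server 10.0.0.11:8080 weight=100;   # warmed-up nodes
    server 10.0.0.12:8080 weight=100;
    server 10.0.0.13:8080 weight=5;     # freshly started node: only a trickle
}
```

Once the new node is hot, its weight can be raised back to match its peers; NGINX Plus automates this ramp-up with `slow_start`.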

All of the approaches above need the same computing resource for warm-up: CPU. If your service hosts have spare capacity, you can allocate more CPU to each node to speed up the warm-up phase and shorten its duration.

If the network layer, hardware resources, and RPC framework cannot be changed, we can warm up inside the Spring Boot service itself. The answers above already mention ApplicationReadyEvent; in practice the better choice is to listen for ContextRefreshedEvent, because by the time ApplicationReadyEvent fires, the HTTP port has already been initialized and exposed, so unexpected requests may come in before the warm-up completes.

@Component
public class StartWarmUpListener implements ApplicationListener<ContextRefreshedEvent> {
    /**
     * Handle an application event.
     *
     * @param event the event to respond to
     */
    @Override
    public void onApplicationEvent(ContextRefreshedEvent event) {
        // do something about warm-up here.....
    }
}

Note: the warm-up code above does not cover every code path. Some Controller-layer paths cannot actually execute until the HTTP server is ready, so we can only get code coverage at the service layer. In short, this is a compromise.

user3033075
1

In my scenario, I fire 100+ mock curl requests to initialize client pools, preload caches, and trigger other lazily loaded components.

I do this work in a WarmupHealthIndicator implements HealthIndicator, which backs a Spring Actuator health check endpoint.

This works: until the warmup has finished, any health check from Nginx (or another load balancer) gets a 5xx status code with the body below. Once the status flips to UP, no live traffic pays the cost of app initialization.

{
  "status": "DOWN",
  "details": {
    "warmup": {
      "status": "DOWN"
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 536608768000,
        "free": 395195826176,
        "threshold": 10485760
      }
    }
  }
}
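The gating logic can be sketched without the Spring wiring (the class and method names below are illustrative, not Spring Actuator's API; a real `HealthIndicator` would map the flag to `Health.up()`/`Health.down()`):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative warm-up gate (Spring wiring omitted; names are made up).
// While the flag is false the health endpoint reports DOWN, so the load
// balancer keeps traffic away from the cold instance.
public class WarmupGate {

    private final AtomicBoolean warmedUp = new AtomicBoolean(false);

    // Run from a background thread after startup; health flips to UP
    // only once the warm-up calls have finished.
    public void runWarmup(Runnable warmupCalls) {
        warmupCalls.run();   // e.g. replay ~100 representative requests
        warmedUp.set(true);
    }

    // What health() would report to the /actuator/health endpoint.
    public String healthStatus() {
        return warmedUp.get() ? "UP" : "DOWN";
    }
}
```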

In addition, NGINX Plus has a paid feature, slow_start, which can do the same thing if that is of interest.

suiwenfeng
0

First of all, I'd try enabling the Graal JIT compiler and comparing results. There's a good article on Baeldung comparing Graal's performance with the default C1 and C2 JIT compilers; you may want to run some tests against your workload. Basically, you need to set the following options when running your Java application:

-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler

Also, make sure you've configured a readiness probe in OpenShift using Spring Boot Actuator's health check URL (/actuator/health). Otherwise your container may receive traffic before it is ready to serve.

A readiness probe determines if a container is ready to service requests. If the readiness probe fails a container, the endpoints controller ensures the container has its IP address removed from the endpoints of all services. A readiness probe can be used to signal to the endpoints controller that even though a container is running, it should not receive any traffic from a proxy. Set a readiness check by configuring the template.spec.containers.readinessprobe stanza of a pod configuration.
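A pod-template sketch of such a probe (port, path, and timings are illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 20   # give the JVM time to start
  periodSeconds: 5
  failureThreshold: 3
```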

Finally, having your responses cached by NGINX or some other reverse proxy also helps.

Fabio Manzano
-1

When a Spring Boot application starts, the JVM needs to load many classes for initialization, resulting in long response times for the first HTTP requests.

If you want to warm up HTTP components, you can refer to:

@Component
public class WarmUpListener implements ApplicationListener<ApplicationReadyEvent> {
    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        // Warm up
    }
}

Or try this Spring Boot starter: warmup-spring-boot-starter, which preheats HTTP-related components before the application serves external traffic, thereby reducing HTTP request response times.

shelltea