
I have a Spring Boot service, using Undertow, whose primary clients are sensors at a client site (~250 such devices). These sensors send POSTs to the service every 10 seconds over the site WiFi - which is somewhat spotty in places. I am tracking the service in NewRelic and see occasional request-response times that are HOURS in length (typical response times are a few dozen millis). There is no processing on the service's controller - all payloads are cached off-thread and forwarded via a separate process. After about 15 hours or so, the service stops responding and needs to be restarted. I suspect these long-running requests are saturating the pool of threads used to handle requests from other sensors. NewRelic suggests that all errors encountered are much like the following:

I/O error while reading input message; nested exception is java.io.IOException: 
UT000128: Remote peer closed connection before all data could be read

A high percentage of these errors have messages suggesting exceptions in the Spring Boot JSON processor that complain of invalid/unexpected characters or closed inputs.

It seems as if some of the sensors are struggling to complete their POSTs. Is this a fair interpretation?

Is there a way that I can force my service to 'kill' these requests before they eat up all of my handler threads? I'm aware that a client-side circuit-breaker might be the best way to handle this, but I don't have a lot of control over that end of things just yet.

I'm also not wedded to Undertow as a Servlet container - Tomcat or Jetty would be just fine with me, if it makes skinning this cat a bit easier.

I have the following code in a @Configuration class:

@Bean
public ServletWebServerFactory servletWebServerFactory() {
    UndertowServletWebServerFactory factory = new UndertowServletWebServerFactory(contextPath, serverPort);

    factory.addBuilderCustomizers((builder) -> {
        ...
        builder.setServerOption(UndertowOptions.IDLE_TIMEOUT, 60000);
        ...
    });
    return factory;
}

But it does not seem to kill off the requests.

1 Answer


This error happens in the following scenario (on the B side):

A --call--> B

The error is almost always caused by heavy traffic on the B service.

So the workarounds here are:

  1. Raise the worker pool settings:

server.undertow.worker-threads=300

  2. Limit the max connections of Undertow; rejecting excess connections avoids further loss and protects your service (see the sketch after this list).

The connection limit is governed by two socket options: high_water (default 1000000) and low_water (default 1000000).
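
For reference, here is a minimal sketch of both workarounds applied through the same kind of builder customizer the question already uses. It assumes Spring Boot 2.x with the Undertow starter, and it maps the high_water/low_water options to org.xnio.Options.CONNECTION_HIGH_WATER and CONNECTION_LOW_WATER; the UndertowLimitsConfig class name and the 300/2000 values are illustrative only, not recommendations.

import org.springframework.boot.web.embedded.undertow.UndertowServletWebServerFactory;
import org.springframework.boot.web.servlet.server.ServletWebServerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.xnio.Options;

@Configuration
public class UndertowLimitsConfig {

    @Bean
    public ServletWebServerFactory servletWebServerFactory() {
        UndertowServletWebServerFactory factory = new UndertowServletWebServerFactory();

        factory.addBuilderCustomizers((builder) -> {
            // Workaround 1: larger worker pool
            // (same effect as server.undertow.worker-threads=300).
            builder.setWorkerThreads(300);

            // Workaround 2: cap concurrent connections. Undertow suspends
            // accepting once the high-water mark is reached and resumes when
            // the count drops below the low-water mark (both default to 1000000).
            builder.setSocketOption(Options.CONNECTION_HIGH_WATER, 2000);
            builder.setSocketOption(Options.CONNECTION_LOW_WATER, 2000);
        });
        return factory;
    }
}

With the cap in place, connection attempts beyond the high-water mark wait in the OS accept backlog (and are eventually refused once that fills) instead of tying up worker threads.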

suiwenfeng