
I have a component, deployed on an EC2 instance on AWS, which fails 'randomly' (~70% of the requests) when sending a larger payload (~40 KB).

The request is sent using Postman. Sending the same request over and over (with the same delay in between, or as quickly as I can), I get a few failures every time, then a success (sometimes two in a row, then some failures, and so on).

It's a Java Spring Boot application; here is the controller snippet:

    @PostMapping
    @RequestMapping("/some/url")
    ResponseEntity<MyClass> methodName(@RequestBody String data, @RequestHeader("content-length") String header) {

        log.info("Content-Length header was: "+header);
        log.info("Length of inputJson (@RequestBody) was: "+data.length());
        log.info(data);

For every request (failed or successful) I get the same value for the Content-Length header. The length of the data is either the same (successful call) or shorter (failed call).

I added the logging shown above, which seems to prove that the received data is actually truncated.
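One caveat when comparing these two numbers: `String.length()` counts UTF-16 characters, while the Content-Length header counts bytes, so for non-ASCII payloads they can differ even on a complete request. A hypothetical helper (not from the original post) that compares on byte length instead:

```java
import java.nio.charset.StandardCharsets;

public class TruncationCheck {

    // Returns true when fewer bytes arrived in the body than the client
    // declared in the Content-Length header.
    static boolean isTruncated(String contentLengthHeader, String body) {
        long declared = Long.parseLong(contentLengthHeader.trim());
        long received = body.getBytes(StandardCharsets.UTF_8).length;
        return received < declared;
    }

    public static void main(String[] args) {
        // Complete 40 KB ASCII body: byte count matches the header.
        System.out.println(isTruncated("40000", "x".repeat(40000))); // false
        // Truncated body: fewer bytes than declared.
        System.out.println(isTruncated("40000", "x".repeat(31000))); // true
    }
}
```

Logging this flag alongside the raw lengths makes the failed calls easy to pick out of the application log.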

I enabled the access log using

server.tomcat.accesslog.pattern=%h %l %u %t "%r" %s %b %{content-length}i %{Content-Length}i

The access log shows the same value as the Content-Length header.

Looking around, I found several articles, but none with an answer I could use. Considering the request also succeeds ~30% of the time (and always with smaller payloads), I figure the code itself is probably fine. I'm just out of ideas on what to change next to investigate this further.

Some related links which did not help with my problem (but might help others):

Bas E
  • Thank you! It took me ages to find this but it describes my issue to a T. Embedded Tomcat inconsistently drops the POST data. – Matt Lachman Jul 21 '22 at 17:56

2 Answers


In our case, we faced a similar issue where requests were getting truncated randomly (~10-20% of the time). The same application version was deployed across multiple environments, but the issue was present in only some of them. We tried to reproduce the issue by doing the following:

  • Changing the application API endpoint from HTTPS to HTTP: there wasn't any issue. We also changed the backend protocol of the ALB from HTTPS to HTTP.
  • Hitting the HTTPS endpoint from localhost: the issue was not present.

Later on, we checked the application server that was in place. We were using the embedded Tomcat in the Spring Boot application, version 9.0.31. This version of Tomcat is known to have issues with payloads over HTTPS. Changing the Tomcat version to 9.0.30 by explicitly defining it in the pom.xml resolved the issue for us.
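A minimal sketch of such an override, assuming the project inherits from `spring-boot-starter-parent` (which exposes the `tomcat.version` property for exactly this purpose):

```xml
<!-- Hypothetical pom.xml fragment: pins the embedded Tomcat version -->
<!-- so it no longer follows the version managed by Spring Boot.     -->
<properties>
    <tomcat.version>9.0.30</tomcat.version>
</properties>
```

After changing it, verify the effective version with `mvn dependency:tree` (look for `tomcat-embed-core`) or in the application startup log.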

Related links that may help others:

Mohammad Mirzaeyan

For those having a similar problem: the fix that helped us was a configuration change made by AWS. They used bigger nodes for the AWS Application Load Balancer. Rolling that change back re-introduced the issue.

This is a configuration change we could not make ourselves. AWS is still figuring out the root cause.

Some pointers that helped to reproduce / pinpoint the issue:

  • When making a (curl) request from the EC2 machine to itself, the request always passed.
  • When making a (curl) request from another EC2 machine towards the exposed endpoint (and by doing so hitting the load balancer), the failures occurred again.
  • This helped to prove the issue was within the AWS stack: not a code issue, and not a network issue on my client/network.
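The repro loop above can be sketched roughly as follows. The URL and payload size are placeholders, not from the original post; run it once against localhost on the instance and once against the load balancer DNS name from another machine, then compare the status codes:

```shell
URL="https://my-alb.example.com/some/url"            # hypothetical endpoint
head -c 40000 /dev/zero | tr '\0' 'x' > payload.json # ~40 KB dummy body

# Fire the same POST repeatedly and print only the HTTP status code per try.
for i in $(seq 1 5); do
  curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 \
       -X POST -H "Content-Type: application/json" \
       --data-binary @payload.json "$URL" || true
done
```

A run of mixed 200s and errors via the load balancer, versus consistent 200s against localhost, points the finger at the infrastructure rather than the application.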
Bas E