There's a certain condition where I'm unable to receive a response from an HTTP request to a particular service: the request is sent through a load balancer and the response is too big to fit into a single packet. Under those conditions, a response is never received. Neither is an error. The server sends a TCP ACK in response to the packet containing the HTTP request, but nothing comes after that. The connection just hangs waiting for something to happen. The behavior is observed whether the request is sent via CURL, node's HTTP library, or Postman.
What could cause that behavior? How can I debug what is causing the issue?
- I have verified that the service behind the load balancer is receiving and responding to the HTTP request, but the response is getting lost somewhere.
- We have other services with a similar setup that don't exhibit this issue.
- We tried setting a different instance of the load balancer but it had the same problem.
In tshark on the server, I see a handful of [TCP Retransmission]
entries after it receives one of the problematic requests through the load balancer (this doesn't happen when the same request is received directly). This seems to indicate that something between the client and the server is misdirecting TCP traffic (highly implicates the load balancer) but I can't understand what would cause that to occur only when HTTP responses are split into multiple packets, and only by this particular service.
Conditions that cause the failure:
Direct | Loadbalancer | |
---|---|---|
Single Packet | ✅ | ✅ |
Multiple Packets | ✅ |
Network topology:
Client -> Load Balancer -> Service