I've got a simple client-server setup where the TCP data I'm sending from the client seems not to be arriving at the server.
Normally everything works fine, but when I spin up 50 threads on the client to hit the server "simultaneously" with the same small data packet (only 39 bytes), on a random number of connections the server does not receive all the bytes. Even stranger, it is very consistent in how it fails: only 5 bytes are received.
I'm using tcpdump and tcpflow to capture what is going on at both ends (if you're not familiar with tcpflow, it strips the mass of TCP SYN/ACK/FIN/etc. noise out of the TCP stream and just shows the data sent in each direction).
On the client side, with 50 threads firing off the 39-byte packet, everything looks perfect. Specifically, tcpflow (which uses libpcap) shows me 50 identical data transfers:
07 B6 00 01 | 00 1E 00 00 | <etc>
As I understand it, libpcap/tcpdump capture data at a fairly low level (below the TCP stack), so I take this to mean that the data was sent OK, or at least was not stuck in the kernel buffers.
However, the server side is not so clean. A random number of connections fail, and it is a high percentage. For example, out of the 50 socket connections, 30 will work fine, but for the other 20 I get a protocol failure where the server's socket.recv times out waiting for bytes (the protocol specifies the exact packet length).
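Because TCP is a byte stream, a single recv() call may legitimately return fewer bytes than the peer passed to one sendall(), so a server that expects an exact length generally has to loop until it has that many bytes. A minimal sketch of such a read, assuming a hypothetical helper named recv_exact (not taken from the original server code) and a socket whose timeout has already been set with settimeout():

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a connected stream socket.

    A single recv() may return fewer bytes than the peer sent in one
    sendall(), so keep calling recv() until n bytes have arrived. If the
    socket has a timeout set and data stalls, recv() raises socket.timeout;
    if the peer closes early, a ConnectionError is raised.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # EOF: peer closed before sending n bytes
            raise ConnectionError(f"expected {n} bytes, got {n - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Demo over a local socket pair: the 5-byte header and a placeholder
    # 34-byte payload are sent separately but read back as one 39-byte message.
    a, b = socket.socketpair()
    b.settimeout(2.0)
    a.sendall(b"\x07\xb6\x00\x01\x00")
    a.sendall(bytes(34))
    print(len(recv_exact(b, 39)))  # 39
    a.close()
    b.close()
```

With this pattern, a connection that stalls after the header still ends in socket.timeout, which separates "the bytes never reached the socket" from "the server stopped reading after a short recv()".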
It is VERY consistent in how it fails. In the 30/20 case, 30 of the sockets receive the transmitted 39 bytes perfectly. The remaining 20 ALL receive this partial data, after which socket.recv times out:
07 B6 00 01 | 00
Only 5 bytes arrive on each of the 20 connections, and it seems to be happening at the kernel level, since tcpdump on the server also shows only 5 bytes arriving.
How can this happen?
This 5-byte boundary is not entirely a coincidence. The first 5 bytes are a header, and the 34-byte payload comes next but is not arriving. On the client side, the message is split like this:
```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((HOST, PORT))
sock.sendall(HEADER)   # 5 bytes
sock.sendall(PAYLOAD)  # 34 bytes
```
Both sock.sendall calls complete successfully in every thread, and the client-side TCP capture confirms that all 50 runs send all 39 bytes "out the door" perfectly.
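To check whether the failure reproduces outside the rest of the application, a self-contained harness along these lines could help. It is a sketch under stated assumptions: HEADER uses the 5 bytes seen on the failing connections, PAYLOAD is a hypothetical 34-byte placeholder (the real payload is not shown in full), the server runs in-process on 127.0.0.1 with an OS-assigned port, the 5-second timeout is an assumed value, and the names run_trial, serve, client and read_up_to are illustrative only. Each server-side connection records how many bytes arrived before EOF or the timeout.

```python
import socket
import threading
from collections import Counter

# HEADER matches the 5 bytes seen on the failing connections; PAYLOAD is a
# hypothetical 34-byte placeholder, since the real payload is not shown.
HEADER = b"\x07\xb6\x00\x01\x00"
PAYLOAD = bytes(34)
MESSAGE_LEN = len(HEADER) + len(PAYLOAD)  # 39 bytes


def read_up_to(conn: socket.socket, n: int) -> bytes:
    """Collect up to n bytes, stopping early on EOF or timeout."""
    buf = b""
    try:
        while len(buf) < n:
            chunk = conn.recv(n - len(buf))
            if not chunk:  # client closed the connection
                break
            buf += chunk
    except socket.timeout:
        pass  # keep whatever arrived before the timeout
    return buf


def serve(listener, n_clients, sizes, lock):
    """Accept n_clients connections and record how many bytes each delivered."""

    def handle(conn):
        with conn:
            conn.settimeout(5.0)  # assumed value; the real recv timeout is not stated
            data = read_up_to(conn, MESSAGE_LEN)
        with lock:
            sizes.append(len(data))

    handlers = []
    for _ in range(n_clients):
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            break  # a client never connected; stop waiting
        t = threading.Thread(target=handle, args=(conn,))
        t.start()
        handlers.append(t)
    for t in handlers:
        t.join()


def client(port):
    """Same send pattern as the question: header and payload in two sendall() calls."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(HEADER)
        sock.sendall(PAYLOAD)


def run_trial(n_clients=50):
    """Run one trial and return the byte count received on each connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(n_clients)
    listener.settimeout(10.0)  # don't hang forever if a client fails to connect
    port = listener.getsockname()[1]

    sizes, lock = [], threading.Lock()
    server = threading.Thread(target=serve, args=(listener, n_clients, sizes, lock))
    server.start()
    clients = [threading.Thread(target=client, args=(port,)) for _ in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    server.join()
    listener.close()
    return sizes


if __name__ == "__main__":
    # A fully successful trial shows only the value 39; a reproduction of the
    # problem would show some connections stuck at 5.
    print(Counter(run_trial()))
```

Running the server on a separate host (by replacing 127.0.0.1 with the real address) would be closer to the original setup, since loopback bypasses the physical network path entirely.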
Any ideas on the root cause of this? What am I missing?