
The scenario is the following: a WebSocket server and its clients exchange messages at a steady pace (roughly 40-50 messages per second). However, for one client, once every few minutes I observe a 5-second period (always the same span) during which TCP segments from the client (carrying both the client's WebSocket messages and ACKs for server messages) arrive in batches: each batch accumulates for a few hundred ms, producing successive bursts of client traffic. Schematically, the tcpdump output looks like this:

```
...
t0       S > C <server-message-1>
t0 + 20  S > C <server-message-2>
...
t0 + 100 S > C <server-message-n>
t0 + 105 C > S <client-message-1>
t0 + 105 C > S <client-message-2>
t0 + 105 C > S <server-message-1-ack>
...
t0 + 105 C > S <client-message-m>
t0 + 105 C > S <server-message-n-ack>
...
```

Among other things, Nagle's algorithm is disabled on the sockets (TCP_NODELAY), and net.ipv4.tcp_autocorking = 0. I have also tried various TCP congestion-control algorithms; none of this seems to change the situation.
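For completeness, this is roughly how I verify those settings on the server (a sketch; `<server-pid>` is a placeholder for the actual server process):

```
# Kernel-level settings
sysctl net.ipv4.tcp_autocorking        # expect: net.ipv4.tcp_autocorking = 0
sysctl net.ipv4.tcp_congestion_control

# Confirm the application really sets TCP_NODELAY on its sockets
strace -f -e trace=setsockopt -p <server-pid>
# expected among the output: setsockopt(.., SOL_TCP, TCP_NODELAY, [1], 4) = 0
```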

How can I tell whether this is a problem on the server, or something is wrong on the client's side? Any suggestions why this may happen and how to resolve, or at least pinpoint, the issue?
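The most promising idea I have so far for pinpointing the side at fault is to capture on both ends simultaneously and compare the timing of the same segments (a sketch; the interface and port are placeholders):

```
# On the server:
tcpdump -i eth0 -w server.pcap 'tcp port 8080'

# On the client, at the same time:
tcpdump -i eth0 -w client.pcap 'tcp port 8080'
```

If the segments leave the client evenly spaced in client.pcap but arrive in bursts in server.pcap, the batching happens in the network or on the server's receive path; if they already leave the client in bursts, the problem is on the client side.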

tonso

1 Answer


Take a look at receive- and send-side offloading, GRO/LRO and GSO/TSO: they combine TCP segments into larger buffers before they are delivered to the kernel (or handed to the NIC). That alone shouldn't introduce a 5-second delay, but a misconfigured offload could produce the batching you see.
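For instance, something like this shows the current state and lets you toggle the offloads (`eth0` is a placeholder for the actual interface):

```
# Show current offload settings
ethtool -k eth0 | grep offload

# Temporarily disable segmentation offload on the sending side...
ethtool -K eth0 gso off tso off
# ...and receive offload on the receiving side (not every NIC exposes lro)
ethtool -K eth0 gro off lro off
```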

Brennen Smith
  • Thanks, will try. It's not actually a solid 5-second interval, but a series of bursts, each containing client segments generated over up to a few hundred ms. – tonso Nov 11 '22 at 10:47
  • ^ That definitely sounds like GSO/GRO; try disabling them with `ethtool` and see if it improves the situation. It could be on either the sender or the receiver side. If you can access the network devices inline, you could take a `pcap` of the segments and check whether the segments are combined into a single Ethernet frame at that point; see the sketch below. – Brennen Smith Nov 13 '22 at 20:54
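A quick test along those lines (a sketch; `cap.pcap` is a placeholder): with a standard 1500-byte MTU, no frame longer than 1514 bytes (MTU plus the Ethernet header) can cross the wire in one piece, so anything larger in a capture taken on a host must have been coalesced by GRO/LRO before the capture point.

```
# List captured 'packets' that exceed the maximum on-wire frame size
tcpdump -r cap.pcap -nn 'len > 1514' | head
```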