Application A sends traffic to B via TCP. B forwards the traffic to C via multicast (UDP). Applications on D receive this data via TCP.
A -> (TCP) -> B -> (UDP) -> C -> (TCP) -> D
A - Windows Server
B - Linux VM on ESX host
C - Linux VM on another ESX host (no load - just this one VM)
D - Windows clients
When B sends traffic to C, drops occur at the NIC (ixgbe) level on C because of the bursty nature of the traffic (the rx_missed_errors counter corresponds directly to what the application on C observes).
Increasing the RX ring buffer size on C to its maximum (4096) makes things even worse.
All links are 10G and traffic does not exceed 2 Gbit/s even during bursts (checked using sar -n DEV 1).
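For reference, the commands behind the observations above are roughly the following (eth0 is a placeholder for the actual interface on C):

    # NIC-level drop counter on C that matches the application-level loss
    ethtool -S eth0 | grep rx_missed_errors

    # how the RX ring was raised to its maximum
    ethtool -g eth0               # show current/maximum ring sizes
    ethtool -G eth0 rx 4096       # set RX ring to 4096

    # per-interface throughput, 1-second resolution
    sar -n DEV 1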
Questions:
How can I measure bursts at intervals shorter than 1 second? (A sketch of what I have in mind follows the question list.)
Why can increasing the ring size make things worse?
Is there a way to slow down the traffic on B so that C can handle it without drops at the NIC level? (Traffic shaping, changing TCP window/buffer sizes? See the shaping sketch below the list.)
Why do the drops disappear entirely when I replace B with a Windows host, as if the UDP bursts were shaped into a more digestible pattern?
How else could I approach or analyze this problem?
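Regarding the burst measurement, a minimal sketch of what I have in mind is polling the kernel's interface counters faster than sar allows; eth0 and the 100 ms interval are assumptions:

    #!/bin/bash
    # sample RX bytes on eth0 every 100 ms and print an approximate bit rate
    IF=eth0
    prev=$(cat /sys/class/net/$IF/statistics/rx_bytes)
    while sleep 0.1; do
        cur=$(cat /sys/class/net/$IF/statistics/rx_bytes)
        # delta over 0.1 s -> bits per second: delta * 8 * 10
        echo "$(date +%T.%N)  $(( (cur - prev) * 80 )) bit/s"
        prev=$cur
    done

Per-packet timestamps in a tcpdump capture would presumably also show the burst structure, but the counter-polling approach seems lighter.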
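And for the shaping question, this is the kind of thing I was considering on B's egress interface; the interface name and the rate/burst/latency values are just placeholders I have not tested:

    # token-bucket shaping on B's outgoing interface to smooth the bursts
    tc qdisc add dev eth0 root tbf rate 2gbit burst 512kb latency 50ms

    # inspect statistics / remove the qdisc again
    tc -s qdisc show dev eth0
    tc qdisc del dev eth0 root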
Thanks