I have been debugging a scenario where a VMware ESX host is communicating with a virtual Linux machine via a virtual switch using TCP/IP. After a congestion event the ESX host is waiting for an ACK from the virtual machine while the virtual machine is waiting for more data from the host. Large receive offload (LRO) is turned on, and the problem seems to be that the VMware driver is not telling the kernel the size of the sub-segments it has aggregated (which it can do via the gso_size field of the sk_buff structure).
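For reference, a minimal sketch of what "setting gso_size" looks like on the driver side. The field names (gso_size, gso_segs, gso_type in struct skb_shared_info) are from the mainline kernel; the function name and the exact hook point in the VMware driver are my assumptions, not its actual code:

```c
/* Hypothetical flush path for an LRO-aggregated skb.  When several
 * TCP segments have been merged into one skb, the stack needs to
 * know the original segment size (the MSS) in order to re-segment
 * or account for the data correctly. */
static void lro_flush_skb(struct sk_buff *skb, u16 mss, u16 nsegs)
{
	if (nsegs > 1) {
		skb_shinfo(skb)->gso_size = mss;   /* size of each sub-segment */
		skb_shinfo(skb)->gso_segs = nsegs; /* how many were merged */
		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
	}
	netif_receive_skb(skb);
}
```

Without gso_size filled in, the merged skb looks to the stack like one giant segment, which is consistent with the stall I'm seeing.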
Using generic receive offload (GRO) instead of LRO makes the problem go away. So I have two possible solutions here:
1) fix the VMware driver so that it sets the gso_size field of the sk_buff passed to the kernel, or
2) turn off LRO (using ethtool -K) and use GRO instead.
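For option 2, this is roughly what I would run (the interface name eth0 is a placeholder for whatever the guest's NIC is called):

```shell
# Show the current offload settings for the interface
ethtool -k eth0 | grep -E 'large-receive-offload|generic-receive-offload'

# Disable LRO and make sure GRO is enabled
ethtool -K eth0 lro off gro on
```

Note that ethtool changes are not persistent across reboots, so this would also need to go into the distribution's network configuration.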
Searching the web for information about LRO and GRO, I'm finding only snippets and opinions, with no hard data or definitive references. I want to know the pros and cons of using LRO vs GRO.
From my searches on the web so far I believe that:
*) Both LRO and GRO could decrease the number of ACKs, which should reduce network traffic but presumably could also slow the growth of the congestion window (cwnd) during slow start or congestion avoidance.
*) Both LRO and GRO should reduce the number of interrupts and the number of times the kernel stack is traversed. Does GRO reduce interrupts more than LRO, since it uses the new API (NAPI)?
*) LRO can sometimes merge too many packets (in particular packets with dissimilar headers), breaking certain applications.
*) LRO only handles IPv4, whereas GRO can handle IPv6 as well.