I have been debugging a scenario where a VMware ESX host is communicating with a virtual Linux machine via a virtual switch using TCP/IP. After a congestion event the ESX host is waiting for an ACK from the virtual machine while the virtual machine is waiting for more data from the host. Large receive offload (LRO) is turned on, and the problem seems to be that the VMware driver is not telling the kernel the size of the sub-segments it has aggregated (which it can do via the gso_size field of the sk_buff structure).
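For reference, a minimal sketch of what "setting gso_size" looks like on the driver side. The field names (gso_size, gso_segs, gso_type in struct skb_shared_info) are from the mainline kernel; the function name and the exact hook point in the VMware driver are my assumptions, not its actual code:

```c
/* Hypothetical flush path for an LRO-aggregated skb.  When several
 * TCP segments have been merged into one skb, the stack needs to
 * know the original segment size (the MSS) in order to re-segment
 * or account for the data correctly. */
static void lro_flush_skb(struct sk_buff *skb, u16 mss, u16 nsegs)
{
	if (nsegs > 1) {
		skb_shinfo(skb)->gso_size = mss;   /* size of each sub-segment */
		skb_shinfo(skb)->gso_segs = nsegs; /* how many were merged */
		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
	}
	netif_receive_skb(skb);
}
```

Without gso_size filled in, the merged skb looks to the stack like one giant segment, which is consistent with the stall I'm seeing.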
Using generic receive offload (GRO) instead of LRO makes the problem go away. So I have two possible solutions here:
1) fix the VMware driver so that it sets the gso_size field of the sk_buff passed to the kernel, or
2) turn off LRO (using ethtool -K) and use GRO instead.
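For option 2, this is roughly what I would run (the interface name eth0 is a placeholder for whatever the guest's NIC is called):

```shell
# Show the current offload settings for the interface
ethtool -k eth0 | grep -E 'large-receive-offload|generic-receive-offload'

# Disable LRO and make sure GRO is enabled
ethtool -K eth0 lro off gro on
```

Note that ethtool changes are not persistent across reboots, so this would also need to go into the distribution's network configuration.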
Searching the web for information about LRO and GRO, I'm finding only snippets and opinions, with no hard data or definitive references. I want to know the pros and cons of using LRO vs GRO.
From my searches on the web so far I believe that:
*) Both LRO and GRO could decrease the number of ACKs, which should reduce network traffic but presumably could also slow the growth of the congestion window (cwnd) during slow start or congestion avoidance.
*) Both LRO and GRO should reduce the number of interrupts and the number of times the kernel stack is traversed. Does GRO reduce interrupts more than LRO, since it uses the new API (NAPI)?
*) LRO can sometimes merge too many packets (in particular packets with dissimilar headers), breaking certain applications.
*) LRO only handles IPv4, whereas GRO can handle IPv6 as well.