Why isn't increasing the networking buffer sizes reducing packet drops?

Question

Running Ubuntu 18.04.4 LTS

I have a high-bandwidth file transfer application (UDP) that I'm testing locally using the loopback interface.

With no simulated latency, I can transfer a 1GB file at maximum speed with <1% packet loss. To achieve this, I had to increase the networking buffer sizes from ~200KB to 8MB:

sudo sysctl -w net.core.rmem_max=8388608
sudo sysctl -w net.core.wmem_max=8388608
sudo sysctl -p

For additional testing, I wanted to add a simulated latency of 100ms. This is intended to simulate propagation delay, not queuing delay. I accomplished this using the Linux traffic control (tc) tool:

sudo tc qdisc add dev lo root netem delay 100ms

After adding the latency, packet loss for the 1GB transfer at maximum speed went from <1% to ~97%. In a real network, latency caused by propagation delay shouldn't cause packet loss, so I suspected that, to simulate latency, the kernel has to store packets in RAM while applying the delay. Since my buffers were only set to 8MB, it made sense that a significant number of packets would be dropped once simulated latency was added.

I increased my buffer sizes to 50MB:

sudo sysctl -w net.core.rmem_max=52428800
sudo sysctl -w net.core.wmem_max=52428800
sudo sysctl -p

However, there was no noticeable reduction in packet loss. I also attempted 1GB buffer sizes with similar results (my system has >90GB of RAM available).

Why did increasing system network buffer sizes not work in this case?

"I think the issue is that to simulate latency the kernel would have to store packets in RAM while applying the delay." No. UDP has no feedback, and it simply sends as fast as possible, so a congested interface will simply drop packets. – Ron Maupin, Jun 06 '20 at 06:01
How would tc simulate latency in that case? If 100ms of latency is being simulated locally, I don't see any way around the packets being held in the kernel's network buffers for longer, causing the buffers to quickly fill up. – tyler124, Jun 06 '20 at 11:13
Hi, it's been a month. Any update on this topic? What about my answer and suggestions? – bsaverino, Jul 07 '20 at 20:36

RlonRyan · Answer 1 · 2022-12-06T06:02:37.237

For some versions of tc, if you do not specify a buffer count limit, tc will default to 1000 buffers. You can check how many buffers tc is currently using by running:

tc -s qdisc ls dev <device>

For example on my system, where I’ve simulated a 0.1s delay on the eth0 interface I get:

$ tc -s qdisc ls dev eth0
qdisc netem 8024: root refcnt 2 limit 1000 delay 0.1s
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
    backlog 0b 0p requeues 0

This shows that I have a limit of 1000 buffers (netem counts this limit in packets) available to fill during my 0.1s delay period. If more packets than that arrive within the delay window, the system will start dropping packets. This gives a packets-per-second (pps) limit of:

pps = buffers / delay
pps = 1000 / 0.1
pps = 10000

If I go beyond this limit, the system will be forced to either drop the incoming packet right away or replace a queued packet, dropping it instead.
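
As a rough sketch, the limit and delay can also be pulled out of the tc output and turned into the pps ceiling automatically. The helper names parse_netem and max_pps are made up for illustration, and the parsing assumes the summary line looks like the sample above (a delay written with an s, ms, or us suffix); other tc versions may print it differently.

```python
import re

# Sample line copied from the `tc -s qdisc ls dev eth0` output above.
SAMPLE = "qdisc netem 8024: root refcnt 2 limit 1000 delay 0.1s"

# Time-unit suffixes handled by this sketch, converted to seconds.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


def parse_netem(line):
    """Return (limit_packets, delay_seconds) from a netem summary line."""
    limit = re.search(r"\blimit (\d+)", line)
    delay = re.search(r"\bdelay ([\d.]+)(us|ms|s)\b", line)
    if not limit or not delay:
        raise ValueError("not a netem line with both limit and delay")
    seconds = float(delay.group(1)) * _UNITS[delay.group(2)]
    return int(limit.group(1)), seconds


def max_pps(limit_packets, delay_seconds):
    """Packet rate at which the netem queue is exactly full: limit / delay."""
    return limit_packets / delay_seconds


limit, delay = parse_netem(SAMPLE)
print(f"limit={limit} packets, delay={delay} s, ceiling={max_pps(limit, delay):.0f} pps")
```

Anything the sender pushes above that ceiling has nowhere to wait out the delay, so it gets dropped.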

Since we don’t normally think of network flows in pps, it’s useful to convert from pps to bytes per second (Bps, KBps, MBps, or GBps). This can be done by multiplying by the network MTU (generally 1500 bytes), the buffer size (varies by system), or, ideally, the observed average number of bytes per packet seen by your system on the given interface. Since we don’t know the average bytes per packet or the buffer size of your system at the moment, we’ll fall back to using the typical MTU.

byte rate = pps * bytes per packet
byte rate = 10000pps * 1500 bytes per packet
byte rate = 15000000 Bytes per second
byte rate = 15 MBps
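
The same conversion as a small sketch, which also prints the figure in bits per second since tools often report that instead. The function name byte_rate and the 1500-byte default are assumptions for illustration; plug in your observed average packet size if you have it.

```python
def byte_rate(pps, bytes_per_packet=1500):
    """Convert a packet rate into bytes per second for a given packet size."""
    return pps * bytes_per_packet


rate = byte_rate(10_000)  # pps ceiling implied by limit 1000 and delay 0.1s
print(f"{rate} B/s = {rate / 1e6:.0f} MBps = {rate * 8 / 1e6:.0f} Mbps")
```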

If we are talking about a loopback interface that normally runs at an average of, say, ~5 GBps, such as what iperf3 reports for the loopback interface on this MacBook, we can see the problem right away: our tc limit of 15 MBps is far less than the interface’s practical limit of ~5 GBps.

So if we were transferring a 1GB file over the loopback interface of this system, it should take:

time = file size / byte rate
time = 1 GB / 5 GBps
time = 0.2 seconds

to transfer the file across the loopback interface. The loss, assuming packet size matches buffer size, would be:

packets lost = packets - ((packets that fit in buffers) + (drain rate of buffers * timeframe))
packets lost = (file size / MTU) - ((buffer count) + (drain rate * timeframe))
packets lost = (1 GB / 1500 bytes) - ((1000) + (10000 pps * 0.2 seconds))
packets lost = 663667

And that’s out of:

packets = (file size / MTU)
packets = (1 GB / 1500 bytes)
packets = 666667

So in all that would be a loss percentage of:

loss % = 100 * (lost) / (total)
loss % = 100 * 663667 / 666667
loss % = 99.55%

Which happens to be roughly in line with what you are seeing.
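
Here is the same back-of-the-envelope model as a sketch you can rerun with your own numbers. The function name estimate_loss and its parameters are made up for illustration, and the model itself is only the rough approximation used above (a queue of limit packets that drains at limit / delay while the sender blasts at link speed), not how netem actually schedules packets. The inputs below are the default limit of 1000 from the tc output, a 0.1s delay, 1500-byte packets, and a 5 GBps loopback.

```python
def estimate_loss(file_bytes, packet_bytes, limit_packets, delay_s, link_bytes_per_s):
    """Rough loss estimate for a sender that ignores the netem queue size.

    Returns (packets_lost, packets_total, loss_percent).
    """
    total = file_bytes / packet_bytes
    send_time = file_bytes / link_bytes_per_s    # how long the sender transmits
    drain_pps = limit_packets / delay_s          # rate the queue can release packets
    # Packets that survive: those queued, plus those drained while sending.
    delivered = min(total, limit_packets + drain_pps * send_time)
    lost = total - delivered
    return lost, total, 100.0 * lost / total


lost, total, pct = estimate_loss(1e9, 1500, 1000, 0.1, 5e9)
print(f"lost {lost:.0f} of {total:.0f} packets ({pct:.2f}%)")

# Same transfer with a much larger queue (limit 500000 is an example value).
print(f"{estimate_loss(1e9, 1500, 500_000, 0.1, 5e9)[2]:.2f}% loss")
```

Under this model the loss only goes away once the queue can hold roughly everything that is in flight during the delay.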

So why didn’t increasing the system buffer size impact your losses? After all, the buffer size is part of the computation.

The answer is that the method you are using to transmit your file is likely chunking according to its best guess at the MTU (likely 1500 bytes), so each packet occupies one netem buffer regardless of how large the system buffers are. The net.core.rmem_max and net.core.wmem_max settings only cap per-socket buffer sizes in bytes; the netem queue is sized separately by its limit, which is counted in packets.

Thus the solution should probably be to increase the number of buffers available to tc instead of increasing the system buffer size. But how many buffers do you need for this link? Based on this answer, the recommendation is to use 150% of the expected number of packets for your delay, so that’s:

buffers = (network rate / avg packet size) * delay * 150%
buffers = (5GBps / 1500B) * 0.1s * 150%
buffers = 333333 * 150%
buffers = 500000

You can see right away that that’s 500 times as many buffers as tc tries to use by default, or to put it another way, you only had 0.2% of the buffers you needed, which is why nearly every packet was dropped.
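
If you want to redo the sizing for a different rate or delay, something like this sketch works. The name netem_limit and the headroom parameter are made up for illustration; the 1.5 factor is just the 150% rule of thumb mentioned above, and the 5 GBps and 1500-byte figures are the same assumptions as before.

```python
def netem_limit(link_bytes_per_s, avg_packet_bytes, delay_s, headroom=1.5):
    """Packets in flight during the delay, padded by a headroom factor."""
    in_flight = link_bytes_per_s / avg_packet_bytes * delay_s
    return round(in_flight * headroom)


# 5 GBps loopback, 1500-byte packets, 100 ms delay: roughly 500000 packets.
print(netem_limit(5e9, 1500, 0.1))
```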

Thus to fix your problem, try changing your tc command from something like:

sudo tc qdisc add dev <device> root netem delay 0.1s

To something like:

sudo tc qdisc add dev <device> root netem delay 0.1s limit 500000

bsaverino · Answer 2 · 2022-07-20T23:17:21.517


To my knowledge, even though it's not what you are trying to achieve, you should probably throttle the speed at which you are sending UDP packets because, as pointed out by @user3878723, buffers will quickly fill up and packets will be lost. Said differently, much like @Ron Maupin noted, the interface gets congested when the delay is applied. I don't think the emitting process is aware of the 100ms delay, so it might quickly overwhelm all available resources.

Instead you may have to tweak something like a Token Bucket Filter (TBF) if you want to go further with your use case. Also consider "Rate control".

UPDATE

It could be worth modifying these parameters and making them persistent:

net.core.rmem_default
net.core.wmem_default

And/or make sure you are correctly using these options in your emitter/receiver:

SO_SNDBUF
SO_RCVBUF

So that the whole chain has enough buffer.
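
A minimal sketch of how to check what the socket really gets, assuming a Linux host and a Python emitter/receiver (the 8 MB request just mirrors the value from the question). On Linux the kernel silently caps the request at net.core.wmem_max / net.core.rmem_max and reports back double the stored value to account for bookkeeping overhead, so it is worth reading the options back instead of trusting the request.

```python
import socket

REQUESTED = 8 * 1024 * 1024  # 8 MB, same figure as rmem_max/wmem_max in the question

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Ask for larger buffers; the kernel may grant less than requested.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)

# Read back what was actually granted.
granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {REQUESTED}, SO_SNDBUF={granted_snd}, SO_RCVBUF={granted_rcv}")

sock.close()
```

If the reported values are far below what you asked for, the rmem_max/wmem_max caps are still too low for that socket.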

edited Jul 20 '22 at 23:17

answered Jun 12 '20 at 22:02


Thanks for the response. The sending process purposely sends UDP packets as fast as possible and does not attempt rate control. I'm attempting to simulate propagation delay latency, which to simulate accurately I assumed would entail ensuring my system's buffer sizes are sufficient to ensure packets don't get dropped while the simulator is applying the delay. – tyler124 Jun 13 '20 at 03:34
I updated my answer. It would be interesting to verify these parameters and restart by transferring a small payload first, then increase it incrementally. Introducing a 100ms delay may indeed have more impact than we think, especially if you are trying to send a very large file at very high speed. In the worst case, if the emitter is not throttling and you decide to introduce a 100x delay (if the base was, say, 1ms), then your buffers might need to be 100x larger...? – bsaverino Jun 14 '20 at 14:19
