For some versions of tc, if you do not specify a buffer count limit, tc will default to 1000 buffers.
You can check how many buffers tc is currently using by running:
tc -s qdisc ls dev <device>
For example, on my system, where I’ve simulated a 0.1s delay on the eth0 interface, I get:
$ tc -s qdisc ls dev eth0
qdisc netem 8024: root refcnt 2 limit 1000 delay 0.1s
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
This shows that I have limit 1000 buffers available to fill during my 0.1s delay period. If I go over this many buffers in my delay timeframe, the system will start dropping packets. This gives me a packets per second (pps) limit of:
pps = buffers / delay
pps = 1000 / 0.1
pps = 10000
If I go beyond this limit, the system will be forced to either drop the incoming packet right away or replace a queued packet, dropping it instead.
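If you want to script that check, here is a minimal Python sketch that pulls the limit and delay out of a netem line and applies pps = buffers / delay. The function names are my own, and the set of delay suffixes it understands (us, ms, s) is an assumption; your tc version may print the delay differently.

```python
import re

# Delay suffixes assumed for this sketch; tc may print other forms.
_UNITS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}

def parse_netem(line):
    """Return (limit_packets, delay_seconds) from a netem qdisc line."""
    limit = re.search(r"\blimit (\d+)", line)
    delay = re.search(r"\bdelay ([\d.]+)(us|ms|s)\b", line)
    if not limit or not delay:
        raise ValueError("no netem limit/delay found in line")
    return int(limit.group(1)), float(delay.group(1)) * _UNITS[delay.group(2)]

def pps_ceiling(limit_packets, delay_s):
    """Packets per second the queue can sustain: pps = buffers / delay."""
    return limit_packets / delay_s

limit, delay = parse_netem("qdisc netem 8024: root refcnt 2 limit 1000 delay 0.1s")
print(limit, delay, round(pps_ceiling(limit, delay)))  # 1000 0.1 10000
```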
Since we don’t normally think of network flows in pps, it’s useful to convert from pps to Bps, KBps, or GBps. This can be done by multiplying by either the network MTU (generally 1500 bytes), the buffer size (varies by system), or ideally by the observed average number of bytes per packet seen by your system on the given interface. Since we don’t know the average bytes per packet or the buffer size of your system at the moment, we’ll fall back to using the typical MTU.
byte rate = pps * bytes per packet
byte rate = 10000pps * 1500 bytes per packet
byte rate = 15000000 Bytes per second
byte rate = 15 MBps
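The same conversion as a small Python sketch, in case you want to plug in your own average packet size. The helper names are hypothetical, and the 1500 byte default is just the typical MTU fallback used above.

```python
def byte_rate(pps, bytes_per_packet=1500):
    """Bytes per second for a packet rate; 1500 is the typical-MTU fallback."""
    return pps * bytes_per_packet

def human(bytes_per_s):
    """Format a byte rate using decimal units (1 MBps = 1e6 bytes/s)."""
    for unit, scale in (("GBps", 1e9), ("MBps", 1e6), ("KBps", 1e3)):
        if bytes_per_s >= scale:
            return f"{bytes_per_s / scale:g} {unit}"
    return f"{bytes_per_s:g} Bps"

print(human(byte_rate(10000)))  # 15 MBps
```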
If we are talking about a loopback interface that normally runs at an average of say ~5 GBps, such as what iperf3 reports for the loopback interface on this MacBook, we can see the problem right away: our tc limit of 15 MBps is far less than the interface’s practical limit of ~5 GBps.
So if we were transferring a 1GB file over the loopback interface of this system, it should take:
time = file size / byte rate
time = 1 GB / 5 GBps
time = 0.2 seconds
That is how long the transfer across the loopback interface would take. The loss over that time, assuming packet size matches buffer size, would be:
packets lost = packets - ((packets that fit in buffers) + (drain rate of buffers * timeframe))
packets lost = (file size / MTU) - ((buffer count) + (drain rate * timeframe))
packets lost = (1 GB / 1500 bytes) - ((1000) + (10000Hz * 0.2 seconds))
packets lost = 663667
And that’s out of:
packets = (file size / MTU)
packets = (1 GB / 1500 bytes)
packets = 666667
So in all that would be a loss percentage of:
loss % = 100 * (lost) / (total)
loss % = 100 * 663667 / 666667
loss % = 99.55%
Which happens to be roughly in line with what you are seeing.
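For anyone who wants to rerun this estimate with their own numbers, here is a rough Python sketch of the same bookkeeping. It assumes, as above, that every packet is one MTU in size, that the sender pushes at the full link rate, and that the queue drains at buffers / delay packets per second. The function and parameter names are mine.

```python
import math

def estimate_loss(file_bytes, packet_bytes, limit_packets, delay_s, link_bytes_per_s):
    """Return (lost, total, loss_percent) for a burst sent at full link rate."""
    total = math.ceil(file_bytes / packet_bytes)
    transfer_s = file_bytes / link_bytes_per_s   # time = file size / byte rate
    drain_pps = limit_packets / delay_s          # pps = buffers / delay
    # Packets that survive: those queued plus those drained during the transfer.
    delivered = min(total, limit_packets + drain_pps * transfer_s)
    lost = total - delivered
    return lost, total, 100 * lost / total

lost, total, pct = estimate_loss(1e9, 1500, 1000, 0.1, 5e9)
print(f"{lost:.0f} of {total} packets lost ({pct:.2f}%)")
```

This is a back-of-the-envelope model only: it ignores retransmissions and any sender backoff.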
So why didn’t increasing the system buffer size impact your losses? After all, the buffer size is part of the computation.
The answer is that the method you are using to transmit your file is likely chunking according to its best guess at the MTU (likely 1500 bytes), so the packets only make use of the first 1500 bytes of your extra-large buffers.
Thus the solution should probably be to increase the number of buffers available to tc instead of increasing the system buffer size. But how many buffers do you need for this link? Based on this answer, the recommendation is to use 150% of the expected number of packets for your delay, so that’s:
buffers = (network rate / avg packet size) * delay * 150%
buffers = (5GBps / 1500B) * 0.1s * 150%
buffers = 333333 * 150%
buffers = 500000
You can see right away that that’s 500 times as many buffers as tc tries to use by default, or to put it another way, you only had 0.2% of the buffers you needed, which is why you saw such heavy loss.
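If your link rate, packet size, or delay differ from mine, a small Python sketch like the following can redo the sizing. The function name and the headroom parameter are my own. The 1.5 default is the 150% rule of thumb mentioned above.

```python
def netem_limit(link_bytes_per_s, avg_packet_bytes, delay_s, headroom=1.5):
    """Packets in flight during the delay, padded by a headroom factor."""
    in_flight = link_bytes_per_s / avg_packet_bytes * delay_s
    return round(in_flight * headroom)

limit = netem_limit(5e9, 1500, 0.1)
# Prints a command line to paste; <device> is left for you to fill in.
print(f"sudo tc qdisc add dev <device> root netem delay 0.1s limit {limit}")
```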
Thus to fix your problem, try changing your tc command from something like:
sudo tc qdisc add dev <device> root netem delay 0.1s
To something like:
sudo tc qdisc add dev <device> root netem delay 0.1s limit 500000