
I’m trying to debug a low TCP transfer rate on a host (“lowHost”).
As a reference, I’m comparing it against a second host (“highHost”).

I’m making two rate measurements:

  1. Downloading a big file using curl from a webserver.
  2. Using iperf

lowHost goes through an extra network hop (“router”) compared to highHost.

lowHost  ---(router)----->(fw)----> internet ----> webserver
highHost ---------------->(fw)----> internet ----> webserver

“router” is a Linux host which NATs (iptables) and routes traffic between a group of network interfaces.
I’m trying to understand if this router is the cause of the low TCP rate.

My best guess is that the low rate is caused by spikes in latency, which restrict the TCP cwnd.

                         |  lowHost                     |  highHost
--------------------------------------------------------------------------------
curl rate                |  40Mbps                      |  400Mbps
curl wshark “rtt”        |  ~100 pkts with rtt>100ms    |  ~5 pkts with rtt>100ms
curl rwnd                |  3MB                         |  3MB
curl cwnd                |  0.2MB                       |  2.5MB
--------------------------------------------------------------------------------
iperf tcp rate           |  100Mbps                     |  700Mbps
iperf tcp wshark “rtt”   |  0 pkts with rtt>100ms       |
iperf tcp cwnd           |  0.5MB                       |
--------------------------------------------------------------------------------
iperf udp rate           |  350Mbps                     |

ping stats:
    lowHost --> webserver
         79 packets transmitted, 79 received, 0% packet loss, time 78077ms
         rtt min/avg/max/mdev = 33.285/33.664/38.176/0.800 ms

    highHost --> webserver
         108 packets transmitted, 108 received, 0% packet loss, time 106949ms
         rtt min/avg/max/mdev = 32.684/32.855/39.706/0.689 ms
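
As a rough sanity check on the cwnd guess, the figures above can be plugged into the window-limited bound rate ≈ cwnd / RTT. The sketch below is only back-of-the-envelope arithmetic: it assumes 1MB = 10^6 bytes, uses the average ping RTT as the connection RTT, and the function name is made up for illustration.

```python
def window_limited_rate_mbps(cwnd_bytes: float, rtt_ms: float) -> float:
    """Upper bound on throughput when cwnd is the only limit: cwnd / RTT."""
    bits_per_window = cwnd_bytes * 8
    rtt_seconds = rtt_ms / 1000.0
    return bits_per_window / rtt_seconds / 1e6


# (cwnd in bytes, avg ping RTT in ms), taken from the table and ping stats above.
# Assumption: "MB" in the table means 10^6 bytes.
measurements = {
    "lowHost curl": (0.2e6, 33.664),
    "highHost curl": (2.5e6, 32.855),
    "lowHost iperf": (0.5e6, 33.664),
}

for name, (cwnd, rtt) in measurements.items():
    print(f"{name}: at most ~{window_limited_rate_mbps(cwnd, rtt):.0f} Mbps")
```

The lowHost bounds come out close to the measured 40Mbps (curl) and 100Mbps (iperf), which seems consistent with the transfers being cwnd-limited rather than rwnd-limited. It does not by itself say why cwnd stays small.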

I’ve used systemtap to measure the duration of ip_forward calls in the kernel, to see if they are the source of the latency spikes.
Sampled over 60sec, the max ip_forward call duration is 8ms, not nearly enough to account for the latency seen in the wireshark “rtt”.

If it’s not the router, what else could it be?
If it is the router, and it’s not latency, what else could it be?
Are there any socket statistics (e.g. from /bin/ss) that can point me to the issue?
Why is the iperf rate more than double the curl rate?
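
On the socket-statistics question, one possible starting point is the per-connection TCP info that `ss -ti` prints on the sending side (rtt, cwnd in segments, mss, retransmit counters). The sketch below parses one such line; the sample line is invented for illustration and was not captured from lowHost or the webserver, and `parse_ss_info` is a hypothetical helper, not part of any tool.

```python
# Illustrative line in the style of `ss -ti` output (assumption: NOT real data).
SAMPLE = ("cubic wscale:7,7 rto:236 rtt:33.6/0.8 mss:1448 "
          "cwnd:138 ssthresh:100 retrans:0/12")


def parse_ss_info(line: str) -> dict:
    """Pull rtt, cwnd and retransmit counters out of one ss TCP-info line."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:  # keep only key:value tokens, skip bare words like "cubic"
            fields[key] = value

    info = {}
    if "rtt" in fields:
        avg, _, var = fields["rtt"].partition("/")
        info["rtt_ms"] = float(avg)
        info["rttvar_ms"] = float(var) if var else None
    if "cwnd" in fields and "mss" in fields:
        # ss reports cwnd in segments; multiply by mss to get bytes.
        info["cwnd_bytes"] = int(fields["cwnd"]) * int(fields["mss"])
    if "retrans" in fields:
        # Format is current/total retransmitted segments.
        current, _, total = fields["retrans"].partition("/")
        info["retrans_total"] = int(total) if total else int(current)
    return info


print(parse_ss_info(SAMPLE))
```

If the total retransmit count keeps growing during a transfer, that might suggest loss or reordering on the path rather than latency alone, since both can hold cwnd down.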

wireshark throughput/rtt/rwnd/cwnd graphs:

Tomer
  • Hi, do you double NAT behind that second router, or is it a second IP scope? – yagmoth555 Jun 29 '20 at 14:08
  • yes there is a double nat here. – Tomer Jun 29 '20 at 15:51
  • your latency is really high? if this really is a gigabit connection, that's a really high ping RTT. How many miles away is this server? Many ISPs will rate limit certain ports, to stop DDOS/DOS attacks. What is the remote web server doing? are you the only user? is it under load? Monitor everything in the path, not only at the network level, but cpu/memory/disk io. MTR is also a great tool for seeing packet loss along the route. Do you have transparent HTTP proxies in the way? they can interfere with results. Iperf is a direct TCP connection, and could be avoiding these proxies. – The Unix Janitor Jul 01 '20 at 17:18
  • @TheUnixJanitor distance = ~1000 miles. latency/RTT should affect low/highHost the same way (barring extra "router" latency). ISP rate limiting on either end should affect low/highHost the same way. webserver is serving a simple static file, i'm the only user, no cpu/mem load. webserver cpu/mem load should affect low/highHost the same way. no cpu/mem load on "router". no http proxies. – Tomer Jul 01 '20 at 19:15
  • are low hosts and high host identical, in OS version/CPU/Memory/network? – The Unix Janitor Jul 01 '20 at 21:12
  • can you try iperf with multiple stream -P ? – EchoMike444 Jul 02 '20 at 04:32
  • what i don't understand is the burst for the first 5 secs? are you sure that the router doesn't do some qos? bandwidth limitations? – EchoMike444 Jul 02 '20 at 04:33
  • having a value of 100mbs for lowhost is strange ... – EchoMike444 Jul 02 '20 at 04:36
  • @TheUnixJanitor both hosts are the same os - ubuntu18, same cpu/mem/network. – Tomer Jul 02 '20 at 08:28
  • @EchoMike444 iperf3 -P4 looks the same as single stream. the rates i write are 10sec avg for iperf, and 30-60sec avg for curl. for curl the initial rate is higher and it drops quickly over the first few seconds (see link to wireshark throughput graphs). i'm fairly sure the router doesn't do any qos. – Tomer Jul 02 '20 at 08:57
  • what is the model name of router ? and version of os on it ? – EchoMike444 Jul 02 '20 at 17:30
  • it's actually not an off-the-shelf router, it's an ubuntu16 machine with a custom iptables/ip rule/ip route configuration. – Tomer Jul 02 '20 at 21:22
