
I have a really strange networking problem. The actual network configuration is quite complex, because I am using OpenStack and Docker to build a virtual network. However, the problem does not seem to be there: I am capturing on my host's interface and I see all the packets arriving correctly. Yet, for some reason I do not know, TCP seems to ignore them even though they have been received: it doesn't send ACKs for them and it doesn't deliver the data to the application.

In my tests, I sent an HTTP GET request for an HTML page to a Jetty server (IP 192.168.4.3) from a host (192.168.4.100).

What I see when capturing on 192.168.4.100 with Wireshark is:

192.168.4.100 -> SYN -> 192.168.4.3
192.168.4.3 -> SYN, ACK -> 192.168.4.100
192.168.4.100 -> ACK -> 192.168.4.3

192.168.4.100 -> GET / HTTP/1.1 -> 192.168.4.3
192.168.4.3 -> ACK -> 192.168.4.100
192.168.4.3 -> Fragment 1 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Fragment 2 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100

192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 1 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 1 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 1 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 1 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 1 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 1 of HTTP 200 OK response -> 192.168.4.100

192.168.4.100 -> ACK of Fragment 1 -> 192.168.4.3

192.168.4.3 -> Retransmission of Fragment 2 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 2 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 2 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 2 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 2 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 2 of HTTP 200 OK response -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 2 of HTTP 200 OK response -> 192.168.4.100

192.168.4.100 -> ACK of Fragment 2 -> 192.168.4.3

192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100
192.168.4.3 -> Retransmission of Fragment 3 of HTTP 200 OK response (PSH) -> 192.168.4.100

192.168.4.100 -> ACK of Fragment 3 -> 192.168.4.3

This is a big problem, because about 40 seconds pass between the GET request and the last ACK, which coincides with the moment the application (telnet in this case) gets the data.

I have checked all the checksums and they are correct...

So I don't know why this happens or what to do! I have tried different OSes on the host side (a Windows 8 mobile phone, Mac OS X, Ubuntu 14.04, ...), but nothing changes. If I send the same request from another Docker container in the virtual network, everything works fine.

Any idea what the problem could be?

Thanks!

PS here you can see a screenshot of the capture:

[Image: screenshot of the Wireshark capture]

Update

One thing that may be interesting: I made an analogous capture, this time with an HTTP request sent from 192.168.4.3 to 192.168.4.100. The capture is again taken on the 192.168.4.100 interface, and again it seems that 192.168.4.100 ignores the packets it receives (look at the three-way handshake, for example). Once more, I found no reason for this.

[Image: screenshot of the second capture]

mgaido
  • 1. Where are you capturing? At the server or the client? 2. What's the MTU between the machines? 3. What are the sizes of each of the fragments? 4. What is the size of the response? – Malt Nov 03 '14 at 14:57
  • 1 - As I said, I am capturing on the interface of the host 192.168.4.100; 2 - the MTU is 1454 because there is a GRE tunnel between them; 3 - the fragments are 1454 bytes long, except the last, which is 327 bytes long. – mgaido Nov 03 '14 at 15:05
  • Could you post the entire capture someplace? I'd like to look at the timings and some TCP/IP headers... You can remove the HTTP payload if it contains anything proprietary. It's just that the capture looks weird if it really was captured on the client side. – Malt Nov 03 '14 at 15:11
  • I have edited my post with a screenshot of the capture – mgaido Nov 03 '14 at 15:18
  • I guess 192.168.4.3 is the IP addr of a Docker container, how do you run your Docker containers? Do you use the `--net` option of the `docker run` command? – Thomasleveil Nov 03 '14 at 15:29
  • You guessed right; I run them with OpenStack and nova-docker. The actual network configuration is rather complex, but I can ping it without problems, and the captures show that everything going out from the Docker container is received on my host interface; that is what you see in the screenshot I posted. – mgaido Nov 03 '14 at 15:32
  • Could you post a screenshot with the TCP sequence numbers (of the original packets, not the retransmissions)? Or even better, post the actual pcap file? It's curious that the first retransmission is of the last segment sent (the small one), 200ms before the retransmissions of the first (large) segment. – Malt Nov 03 '14 at 16:00
  • Sorry, but that capture contains other traffic too, so I can't share it. However, the Seq and Ack fields have the values you can see in the screenshot (the original packets have the same values as their retransmissions). I looked at the Seq numbers very carefully and saw nothing wrong with them. – mgaido Nov 03 '14 at 16:17
  • Could be reverse path filtering? – Bryan Nov 05 '14 at 16:53
  • Sorry, what is reverse path filtering? How can I test if this is the solution? Thanks. – mgaido Nov 05 '14 at 16:54
  • For example http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html You would test this by turning it off, which is a setting in /proc/sys/net/ipv4/conf/*/rp_filter - there is one file for each network interface and one called 'all' So, suppose your network interface is called "foo", then you would write a '0' into /proc/sys/net/ipv4/conf/foo/rp_filter and into /proc/sys/net/ipv4/conf/all/rp_filter – Bryan Nov 05 '14 at 17:07
  • I have Mac OS X, so I can't run that command: it says that those files and directories don't exist. Once I understand how to do this, I will try. Thanks. But it sounds strange to me, because I have only one interface set up with an IP address... – mgaido Nov 05 '14 at 17:14
  • If you are running Docker on the Mac, then presumably you are running Boot2Docker, which runs Docker on a Linux VM using VirtualBox? So do `boot2docker ssh` then proceed as above. If you are running Docker on some other machine, please give more details. – Bryan Nov 05 '14 at 17:46
  • No, the Docker container is running on an Ubuntu 14.04 machine (and it is the host 192.168.4.3), while the other endpoint I am using (192.168.4.100) is a Mac OS X machine. I am connecting via WiFi through an access point to a host that is connected by Ethernet cable to the host on which the Docker containers are running. The actual structure is rather complex, but I don't think it's worth explaining, because the packets are indeed received by my Mac (or the other devices I tried). It simply ignores some of them. – mgaido Nov 05 '14 at 18:08
  • Ah, sorry, I thought it was the other way round. The analogous functionality on OSX is the Packet Filter - see man pfctl and pf.conf, although I believe it is off by default. – Bryan Nov 06 '14 at 09:57
  • I looked at them but I have found nothing about Reverse Packer Filtering... I think it does not exist or can't be configured on Mac OSX. – mgaido Nov 06 '14 at 10:03
  • It's called "reverse path filtering". Check under "urpf-failed". – Bryan Nov 06 '14 at 11:54
  • Sorry, I mistyped it. However, I am always capturing on the same interface. This issue seems to concern packets that arrive on an interface different from the expected one. But I have only one interface for both outgoing and incoming traffic, and I am capturing on it, so packets arrive on the same interface they leave from. So RPF should not be the issue, should it? – mgaido Nov 06 '14 at 12:06
  • However I have no `urpf-failed` drop rule on my Mac OS X. – mgaido Nov 06 '14 at 13:18
  • One reason could be that the Docker bridge is not forwarding the data properly. You can check the MTU of the Docker bridge, and check that the MAC addresses of each endpoint are properly handled in the bridge: brctl showmacs br0 – Jon Ander Ortiz Durántez Nov 12 '14 at 08:22
  • Sorry, but I don't understand what you mean: the capture I posted was taken on the host that ignores the packets, so it actually receives all of them. And all of them have the MAC address of the Docker container that generates them. There is no MTU issue, because if there were, the packets would not be received, but they are. – mgaido Nov 12 '14 at 08:30

1 Answer

I managed to solve my problem. I am posting the solution here in case it is useful to someone with the same problem.

The problem was that I had disabled TSO (TCP segmentation offload) on the virtual bridge to which my Docker containers are attached, with the command:

ethtool -K IFACE_NAME tso off

This turns off only TSO, while checksum offloading remains on. Evidently this causes a problem: although Wireshark showed the TCP checksum as OK, it actually wasn't, so the host ignored the packets because of the bad TCP checksum.

To turn off checksumming as well as TSO, I used the command:

ethtool --offload IFACE_NAME rx off tx off

And now everything works.
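
To see which offloads are actually enabled before and after such a change, `ethtool -k` (lowercase) lists the feature states of an interface. Below is a minimal sketch, assuming the usual `name: on|off` output format; `feature_state` is a hypothetical helper name and `IFACE` is a placeholder environment variable, neither of which comes from the commands above.

```shell
#!/bin/sh
# feature_state (hypothetical helper): read `ethtool -k` output on stdin and
# print the state ("on" or "off") of the feature named in $1.
feature_state() {
    awk -v feat="$1" -F':' '{
        name = $1
        sub(/^[ \t]+/, "", name)      # sub-features are indented
        if (name == feat) {
            split($2, v, " ")         # drops leading spaces and a trailing "[fixed]"
            print v[1]
        }
    }'
}

# Query a real interface only when IFACE is set (placeholder, e.g. the bridge name).
if [ -n "${IFACE:-}" ]; then
    for feat in tcp-segmentation-offload tx-checksumming rx-checksumming; do
        printf '%s: %s\n' "$feat" "$(ethtool -k "$IFACE" | feature_state "$feat")"
    done
fi
```

Saved as a script (the file name is up to you) and run with `IFACE` set to the bridge name, this prints one line per feature, which makes it easy to confirm whether checksum offloading is still on after TSO has been switched off.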

mgaido