
Up until now I have been using a CentOS 5.x PC with two NICs running iptables as a router for my network. It worked great, but recently I decided to get a DreamPlug, which runs Debian 5.0.3/kernel 2.6.33.6, and use it to replace my CentOS router. I copied over my iptables config, set up the interfaces the same way, and then swapped the two machines.

Everything seemed to work just fine, but then I noticed my TCP sessions were consistently hanging anywhere between 1 and 10 seconds after the connection was established. Websites that couldn't load almost immediately got stuck loading. File downloads ran for a couple of seconds and then halted indefinitely. On a couple of occasions the transfers resumed, but only for another couple of seconds before stalling again.

At this point I replaced my iptables config with a new bare-bones NAT config ( http://pastebin.com/raw.php?i=bhLHk2wh ) to rule out any firewall configuration issues. I did a test with wget on dozens of different websites (GET /) and also downloading iso files from a handful of different mirrors. The issue was consistently reproducible no matter where I was downloading from. I captured a tcpdump for each of the three interfaces the data passed through before leaving my network: internal host NIC, Firewall Internal NIC, Firewall External NIC. There were not any differences between the packets on each interface (that I could tell). None of the packets had been blocked by the firewall, verified by iptables logs.

Here is the wget output for this test: http://pastebin.com/raw.php?i=qyXtE2rJ

I'm not a TCP expert, so my analysis may prove to be elementary, but I found that the TCP session is set up properly. Some packets with the P (push) flag get sent and acknowledged, and then all of a sudden packets start going missing.

Here is the dump for the external interface (eth0) on the firewall: http://pastebin.com/raw.php?i=q73b1rXZ

There are a few duplicate ACKs for seq 3655108323 while the remote host appears to still be sending data that isn't acknowledged. Then a packet with the R (reset) flag is sent and the connection hangs at 16:30:32.310469 for five minutes, until I terminate the session by interrupting wget. It is also worth noting that during my testing I saw this hang start in two different ways.

  1. A packet with the R flag would be sent from the firewall, and then no further packets from the remote host were received.
  2. A flagless ACK would be sent from the firewall, and then no further packets from the remote host were received.

The only other potential issue I see is that 272 packets were dropped by the external firewall interface. I'm a bit puzzled by this because the firewall has downloaded files at 100 Mbps (internally) without breaking a sweat. These small connections are trivial, so there shouldn't be any dropped packets. Additionally, I can download files very quickly when running wget from the firewall itself; I get consistent speeds of over 1 MB/s (over the internet). Here is a snipped version of the dump: http://pastebin.com/raw.php?i=Fb9zhqh4

Here is the dump for the internal interface (eth1) on the firewall: http://pastebin.com/raw.php?i=TuM4sTxB

Nothing appears to be different on the internal interface, and there are no dropped packets.

Here is the dump for the interface (en0) on my internal host (OS X): http://pastebin.com/raw.php?i=SSXHFqVf

Something I've noticed on this host is that the checksums for outgoing frames are almost always wrong. This happened before with the CentOS router but since it didn't seem to be negatively affecting anything I figured it must be incorrectly checking the sums. If anyone knows what is causing this, I am interested in finding out.

In conclusion, it seems like there is some packet loss going on, but I can't pinpoint the cause. The external firewall dump makes me think that the problem exists on eth0, but then I can wget files just fine on the firewall itself (that traffic only goes through eth0), so that can't be it. Does anyone have any suggestions for additional troubleshooting steps I can take to narrow down the potential causes here?

polynomial
Vye
  • Regarding the interface drops, I'm assuming the external interface is seeing traffic for MACs it isn't responsible for. Capturing all traffic on it with tcpdump -e would confirm that. Also, if you swap the physical interfaces on the router and run wget tests to the router itself, is everything still OK? (That would verify that both interfaces are fine.) – polynomial Oct 05 '11 at 03:47
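
One possible way to act on that suggestion is sketched below. The helper name foreign_frames is made up for illustration, and it assumes the usual "TIME SRC > DST, ethertype ..." line layout that tcpdump -e prints. It only filters out frames addressed to the router's own MAC or to broadcast, so multicast frames will still show up.

```shell
# Hypothetical helper, not part of the original comment.
# Reads `tcpdump -e -n` output on stdin and prints only the frames whose
# destination MAC is neither ours nor the broadcast address.
# $1 = our own MAC address, lowercase (tcpdump prints MACs in lowercase).
foreign_frames() {
    awk -v me="$1" '
        $3 == ">" {
            dst = $4
            sub(/,$/, "", dst)    # strip the comma that follows the MAC
            if (dst != me && dst != "ff:ff:ff:ff:ff:ff") print
        }'
}
```

Assuming eth0 is the external interface, it could be used as: tcpdump -e -n -i eth0 -c 500 | foreign_frames "$(cat /sys/class/net/eth0/address)". If that prints a lot of unicast frames, the interface is receiving traffic meant for other hosts.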

1 Answer


TCP connections that hang may mean that you have an MTU issue along the way. Most likely there is a device somewhere on your path that blocks ICMP "fragmentation needed" packets.

You can find instructions on how to work around this here: http://www.netfilter.org/documentation/HOWTO/netfilter-extensions-HOWTO-4.html#ss4.7.

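I would recommend first determining the largest packet that gets through without being fragmented:

ping -M do -s SIZE REMOTE_IP

(-s sets the ICMP payload size, so the packet on the wire is SIZE + 28 bytes. Start at 1472, which corresponds to an MTU of 1500, and work downwards until you find a value that works.) Then clamp the MSS to the MTU you found minus 40 bytes for the IP and TCP headers:

iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss MSS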
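
The relationship between the MTU, the ping payload size and the MSS can be sketched roughly as follows. The function names are made up for illustration, and the 28 and 40 byte overheads assume IPv4 with no IP or TCP options. The search walks down in steps of 8 bytes, so the result may be a few bytes below the real path MTU, which errs on the safe side.

```shell
# Hypothetical helpers, not part of the original answer.

# ICMP payload size that produces an IPv4 packet of exactly $1 bytes
# (20-byte IP header + 8-byte ICMP header).
ping_size_for_mtu() {
    echo $(( $1 - 28 ))
}

# TCP MSS that fits into an IPv4 packet of $1 bytes
# (20-byte IP header + 20-byte TCP header).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Send one ping with "don't fragment" set.
# $1 = remote IP, $2 = ICMP payload size. Succeeds if a reply comes back.
probe() {
    ping -M do -c 1 -W 2 -s "$2" "$1" >/dev/null 2>&1
}

# Walk the MTU down from 1500 until a probe gets through.
# $1 = remote IP. Prints the first MTU that works, or fails below 576.
find_path_mtu() {
    mtu=1500
    while [ "$mtu" -ge 576 ]; do
        if probe "$1" "$(ping_size_for_mtu "$mtu")"; then
            echo "$mtu"
            return 0
        fi
        mtu=$(( mtu - 8 ))
    done
    return 1
}
```

For example, if find_path_mtu REMOTE_IP prints 1492, then mss_for_mtu 1492 gives 1452, which is the value to pass to --set-mss.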
ciupinet