Up until now I have been using a CentOS 5.x PC with two NICs running iptables as a router for my network. It worked great, but recently I decided to get a DreamPlug, which runs Debian 5.0.3 with kernel 2.6.33.6, and use it to replace my CentOS router. I copied over my iptables config, set up the interfaces the same way, and then switched them out.
Everything seemed to work just fine, but then I noticed my TCP sessions were consistently hanging between 1 and 10 seconds after the connection was established. This caused websites that couldn't load almost immediately to get stuck loading. File downloads ran for a couple of seconds and then halted indefinitely. On a couple of occasions the transfers resumed, but only for another couple of seconds before stalling again.
At this point I replaced my iptables config with a new bare-bones NAT config (http://pastebin.com/raw.php?i=bhLHk2wh) to rule out any firewall configuration issues. I tested with wget against dozens of different websites (GET /) and also downloaded ISO files from a handful of different mirrors. The issue was consistently reproducible no matter where I was downloading from. I captured a tcpdump on each of the three interfaces the data passes through before leaving my network: the internal host NIC, the firewall's internal NIC, and the firewall's external NIC. As far as I could tell, there were no differences between the packets on each interface. None of the packets were blocked by the firewall, which I verified in the iptables logs.
Here is the wget output for this test: http://pastebin.com/raw.php?i=qyXtE2rJ
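To make the stalls easier to compare between runs, the wget test could also be timed from a script. The following is only a minimal Python sketch, not what I actually ran: `find_stalls`, `timed_download`, the 2-second threshold, and the chunk size are all names and values made up for illustration.

```python
import time
import urllib.request


def find_stalls(arrivals, threshold=2.0):
    """Return (start_time, gap) pairs where the gap between chunks exceeds threshold.

    arrivals is a list of (timestamp_seconds, nbytes) tuples in arrival order.
    """
    stalls = []
    for (prev_t, _), (cur_t, _) in zip(arrivals, arrivals[1:]):
        gap = cur_t - prev_t
        if gap > threshold:
            stalls.append((prev_t, gap))
    return stalls


def timed_download(url, chunk_size=16384, timeout=30):
    """Download url and record when each chunk arrives (hypothetical helper)."""
    # Seed with the start time so a stall before the first byte is also visible.
    arrivals = [(time.monotonic(), 0)]
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            arrivals.append((time.monotonic(), len(chunk)))
    return arrivals
```

Calling `find_stalls(timed_download(url))` against a few mirrors would list where each transfer paused. If a transfer hangs for longer than `timeout`, `urlopen` or `read` raises an exception instead of returning, which is itself a useful data point.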
I'm not a TCP expert, so my analysis may prove to be elementary, but as far as I can tell the TCP session is set up properly. Some packets with the P (push) flag get sent and acknowledged, and then all of a sudden packets start going missing.
Here is the dump for the external interface (eth0) on the firewall: http://pastebin.com/raw.php?i=q73b1rXZ
There are a few duplicate ACKs for seq 3655108323 while the remote host appears to still be sending data that isn't acknowledged. Then a packet with the R (reset) flag is sent and the connection hangs for five minutes, starting at 16:30:32.310469, until I terminate the session by interrupting wget. It is also worth noting that during my testing I saw this hang begin in two different ways:
- A packet with the R flag would be sent from the firewall, and then no further packets from the remote host were received.
- An ACK with no other flags would be sent from the firewall, and then no further packets from the remote host were received.
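Spotting the duplicate ACKs by eye in a long dump is error-prone, so here is a rough Python sketch that counts repeated ack numbers per direction in tcpdump's text output. It assumes the one-line-per-packet layout starting with `timestamp IP src > dst:`, which may not match every tcpdump version, and the sample lines use documentation addresses and invented numbers rather than anything from my captures. It also does not distinguish pure ACKs from data segments that happen to carry the same ack number.

```python
import re
from collections import Counter

# Matches both the older layout ("...: . ack 1000 win 65535") and the newer
# one ("...: Flags [.], ack 1000, win 65535"). The layout is an assumption.
LINE_RE = re.compile(r"^\S+ IP (\S+) > (\S+): .*?\back (\d+)")


def count_duplicate_acks(lines):
    """Return {(src, dst, ack): repeats} for ack numbers seen more than once."""
    seen = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            src, dst, ack = match.group(1), match.group(2), int(match.group(3))
            seen[(src, dst, ack)] += 1
    # A count of n means the first ACK plus n - 1 duplicates.
    return {key: n - 1 for key, n in seen.items() if n > 1}


# Invented sample lines for illustration only.
SAMPLE = [
    "16:30:01.000001 IP 192.0.2.10.50000 > 198.51.100.7.80: . ack 1000 win 65535",
    "16:30:01.000002 IP 192.0.2.10.50000 > 198.51.100.7.80: . ack 1000 win 65535",
    "16:30:01.000003 IP 192.0.2.10.50000 > 198.51.100.7.80: Flags [.], ack 1000, win 65535, length 0",
    "16:30:01.000004 IP 198.51.100.7.80 > 192.0.2.10.50000: . 1000:2460(1460) ack 1 win 5840",
]
```

Running `count_duplicate_acks` over the lines of each of the three dumps and comparing the results should show whether the duplicates appear on every interface or only on some of them. Capturing with `tcpdump -S` keeps the sequence numbers absolute so they can be matched across captures.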
The only other potential issue I see is that 272 packets were dropped by the external firewall interface. I'm a bit puzzled by this because the firewall has downloaded files at 100 Mbps (internally) without breaking a sweat. These small connections are trivial; there shouldn't be any dropped packets. Additionally, I can download files very quickly when running wget from the firewall itself, with consistent speeds of over 1 MB/s (over the internet). Here is a snipped version of the dump: http://pastebin.com/raw.php?i=Fb9zhqh4
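One way to cross-check the drop count would be to read the kernel's own per-interface counters before and after a test download. The sketch below is a minimal Python example that assumes a Linux `/proc/net/dev` with the usual sixteen counters per interface; the function names are hypothetical and the sample text is invented, not taken from my firewall.

```python
def parse_net_dev(text):
    """Return {interface: {"rx_drop": int, "tx_drop": int}} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 16:
            continue
        counters[name.strip()] = {
            "rx_drop": int(values[3]),   # 4th receive column is "drop"
            "tx_drop": int(values[11]),  # 4th transmit column is "drop"
        }
    return counters


def read_drop_counters(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_net_dev(handle.read())


# Invented sample for illustration only.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    0    7    0     0          0         0   300000    2500    0    0    0     0       0          0
"""
```

If `rx_drop` for eth0 grows during a stalled transfer, that would point at the interface or driver. If it stays flat, the drops reported during the capture are probably being counted somewhere else.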
Here is the dump for the internal interface (eth1) on the firewall: http://pastebin.com/raw.php?i=TuM4sTxB
Nothing appears to be different on the internal interface, and there are no dropped packets.
Here is the dump for the interface (en0) on my internal host (OS X): http://pastebin.com/raw.php?i=SSXHFqVf
Something I've noticed on this host is that the checksums for outgoing frames are almost always reported as wrong. This also happened with the CentOS router, but since it didn't seem to be negatively affecting anything, I assumed the sums were simply being checked incorrectly. If anyone knows what is causing this, I am interested in finding out.
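For reference, the checksum used in IP, TCP, and UDP headers is the 16-bit ones' complement sum described in RFC 1071. Here is a minimal Python sketch of that calculation. The function name is my own, and it checksums only the bytes it is given: the TCP checksum additionally covers a pseudo-header, which this sketch does not build.

```python
def internet_checksum(data: bytes) -> int:
    """Compute the RFC 1071 Internet checksum over data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Example bytes from RFC 1071, section 3.
EXAMPLE = bytes.fromhex("0001f203f4f5f6f7")
CHECKSUM = internet_checksum(EXAMPLE)
```

A block of data with its correct checksum appended checksums to zero, which is how a receiver verifies it: `internet_checksum(EXAMPLE + CHECKSUM.to_bytes(2, "big"))` returns 0.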
In conclusion, it seems like there is some packet loss going on, but I can't pinpoint the cause. The external firewall dump makes me think the problem is on eth0, but I can wget files just fine from the firewall itself (where traffic only goes through eth0), so that can't be it. Does anyone have any suggestions for additional troubleshooting steps I can take to narrow down the potential causes here?