
My website loads more slowly than I think it should, because a few of the assets take an absurdly long time to download from the server. I've been trying to track down the cause. Based on the tests I've done (see below), I'm about 95% sure it is a networking issue, not an Apache issue.

Here's a screenshot from Firefox's network inspector. Note that the stuck assets are usually some of these images, but the problem has also occurred on other assets, such as JavaScript files.

Hypothesis and Question

My current theory is that our colo's bandwidth limit is causing packet loss when the browser downloads resources in parallel, momentarily pushing the connection above that limit. Is this a sensible theory? Apart from requesting more bandwidth (even though we don't use most of it most of the time), is there anything we can change?
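For reference, one quick way to check whether retransmissions are actually piling up on the server is to watch the kernel's TCP counters while reloading the page. This is just a sketch, not one of the tests listed below, and it assumes a Linux box with net-tools installed:

    # Snapshot of TCP retransmission counters
    netstat -s | grep -i retrans

    # Refresh every second while a browser reloads the page
    watch -n 1 "netstat -s | grep -i retrans"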

Or, is there some other avenue I need to be researching?

Configuration

  • Apache 2.4.3 on Fedora 18, with plenty of spare CPU and memory.
  • Gigabit Ethernet from the server to a switch, then a 4 or 5 Mbps uplink via the colocation facility.
  • It isn't a very high-traffic site; rarely more than a couple of visitors at once.

Tests I've Done

  1. traceroute to the server is fine. traceroute from the server back to, say, our office stops after 8 or so hops. I'm hypothesizing that this is due to traceroute traffic getting blocked somewhere (since things like wget (see below) and ssh seem to largely work fine), but I can provide more details if this is pertinent.
  2. strace on Apache showed the server serving up the entire image immediately, with no delay on its end.
  3. tcpdump/wireshark showed that the image data was sent immediately, but that some packets were later retransmitted. One trace in particular showed that the final packet of the asset was transmitted immediately by the server and then retransmitted several times, yet the copy the browser finally received was the original packet.
  4. While I could sometimes reproduce the problem by downloading the page with wget, it didn't happen as regularly as it did in the browser. My hypothesis is that this is because wget doesn't parallelize downloads.
  5. Testing with iperf was interesting. In iperf's UDP mode, I saw next to no packet loss at speeds up to about 4 Mbps; above that, I began seeing ~10% packet loss. Similarly, in TCP mode, small numbers of parallel connections split the bandwidth sensibly between them, but with 6 or more parallel connections I got a "sawtooth" bandwidth pattern, where a connection would sometimes have bandwidth and sometimes not. (Roughly the commands I used for tests 3–5 are sketched just after this list.)
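These are roughly the invocations I used for tests 3–5, reconstructed from memory; the interface name, hostnames, and exact bandwidth figure are placeholders rather than our real values:

    # Test 3: capture traffic on the server for later inspection in Wireshark
    tcpdump -i eth0 -s 0 -w slow-assets.pcap port 80

    # Test 4: fetch the page plus its images/CSS/JS with wget (sequential downloads)
    wget --page-requisites http://example.com/

    # Test 5a: UDP packet-loss test at a target bandwidth (server side: iperf -s -u)
    iperf -c server.example.com -u -b 5M

    # Test 5b: TCP test with 6 parallel connections (server side: iperf -s)
    iperf -c server.example.com -P 6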

I'd be happy to provide more details on any of these, but I didn't want to crowd the post with details that aren't pertinent; I'm hardly knowledgeable enough in networking to know what information is useful and what isn't. :-D Any pointers to other good network-troubleshooting tools would be swell.

EDIT 1: Clarified my near-certainty that Apache isn't to blame, but rather networking something-or-other.

EDIT 2: I tried iperf between this server and another of ours on the same gigabit switch and got a pretty consistent 940+ Mbits/s. I think that rules out most hardware problems or duplex mismatches on our end, except perhaps on the uplink.
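For completeness, the negotiated link settings can also be checked directly on the server (eth0 here is a placeholder for the actual interface):

    # Confirm negotiated speed/duplex on the server's NIC (requires root)
    ethtool eth0
    # Look for "Speed: 1000Mb/s" and "Duplex: Full" in the output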

EDIT 3: While the specifics are very different, this post about a TCP incast problem sounds very similar, in terms of having high-bandwidth traffic shuffled down a narrow pipe in small bursts and losing packets. I need to read it in more detail to see if any of the specifics apply to our situation.

2 Answers


Have you tried putting a caching proxy in front of Apache? A popular solution for this is Nginx, which would listen on, say, port 80 (which means you'd have to change Apache's listen port if it uses 80 as well).

Just set it up so Nginx serves static content like JS, CSS, images, etc., and passes everything else off to Apache via proxy.
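Something along these lines would do it. This is just a sketch; the server name, document root, backend port 8080, and extension list are assumptions you'd adjust for your own setup:

    server {
        listen 80;
        server_name example.com;

        # Serve static assets directly from disk
        location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
            root /var/www/html;
            expires 30d;
        }

        # Everything else goes to Apache, now listening on port 8080
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }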

I noticed when I did this for my site that it improved performance quite a bit, as Nginx was built to be a proxy or standalone server IIRC, while Apache was more of a fork of an earlier web server from back when proxying wasn't very popular (if it was even thought of).

ehansen
  • Thanks, but I'm pretty sure `strace` demonstrated conclusively that Apache wasn't the bottleneck. If we had higher traffic, this would be the way to go, for sure. – David Alan Hjelle Jan 21 '14 at 00:28
  • Thing is, though, I didn't say Apache was the bottleneck, just that this would help take some stress off of Apache. – ehansen Jan 21 '14 at 04:25
  • Fair enough. ;-) Really, I'd jump for nginx or some other cache in a heartbeat if Apache was under any sort of load at all. But we're only getting ~1,000 visits a day, which Apache should handle without breaking a sweat. Thanks again, though! – David Alan Hjelle Jan 21 '14 at 14:10

Finally found the culprit. Our colo was doing traffic shaping on our connection—when they turned it off, the problem disappeared completely. I expect we will be doing further work to narrow things down in their configuration, but it was, happily, not our configuration.
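For anyone curious, this is roughly what it takes to emulate a similar shaper on a test machine with tc; the 5 Mbps rate, small burst size, and eth0 are illustrative assumptions, not the colo's actual settings:

    # Emulate a ~5 Mbps shaper with a small buffer on a test interface
    tc qdisc add dev eth0 root tbf rate 5mbit burst 32kbit latency 400ms

    # Remove it again when done
    tc qdisc del dev eth0 root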