1

My site has been largely unresponsive for three days due to a large amount of traffic.

A JavaScript element on the page regularly requested images from the server, and the number of connections became a problem as people left their browser windows open (and therefore never stopped requesting the images).

I redesigned the site to use a single sprite and load it only once; however, there is now a massive number of 404 errors as open browser windows keep trying to load the old content. The site is on a VPS, and it's unusable due to the latency.

To make matters worse, I had initially assumed the latency was due to a lack of caching. I added a directive to the .htaccess file for visitors to cache resources (the old, inefficient code included).

My host has been unable to rectify the problem. What can be done to force the persistent connections to stop trying to load obsolete content?

Peter

3 Answers

3

You can't really stop someone from making a request to a non-existent resource (e.g. anyone can make up a URL for a page that doesn't exist and get a 404). However, there may be some things you can do to improve the situation.

Firstly, change the filename of your new content. If you still reference a JavaScript file in your HTML, ensure it has a different name from the one that caused the problem, so that the browser will not use the cached copy.
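One way to guarantee a new name on every change is to embed a short hash of the file's contents in the filename. The following is a minimal sketch of that idea, not something the site necessarily does: the function name `versioned_name`, the example filename `slideshow.js`, and the 8-character digest length are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Return a bare filename with a content hash inserted before the extension.

    Hypothetical helper: 'slideshow.js' becomes 'slideshow.<hash>.js'.
    Any directory part of `filename` is dropped; only the name is returned.
    """
    # The digest changes whenever the content changes, so a browser holding
    # the old cached file will never be asked for the same URL again.
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = Path(filename)
    return f"{path.stem}.{digest}{path.suffix}"
```

The HTML would then reference the returned name instead of the original one, and each deployment that changes the script produces a URL that no browser has cached.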

Secondly, make your 404 page as simple as possible (definitely go for a static page, not something dynamic, and very small).

Thirdly, Apache is not that efficient at handling large numbers of concurrent requests without a lot of available memory (in its common prefork configuration, each concurrent connection occupies a separate worker process). Consider (at least temporarily) adding another server in front of Apache that will handle the 404 requests more efficiently. Some examples might include:

  • nginx - have it serve the 404 requests (and possibly all static files), and proxy_pass other requests back to Apache (and it can also cache proxied requests)
  • Varnish - it can cache the 404 response and serve it directly from memory, reducing the load on Apache
cyberx86
  • Would banning the offending IP addresses on the server's firewall make a difference? – Peter Dec 22 '11 at 17:37
  • You could block IP addresses, but how would you know which ones? iptables doesn't process the HTTP request in any way; it only looks at the port, source, and destination. You can't block 'just the 404 requests' with iptables: you would need to know the IPs, and if you block an IP, you block all requests from it (you can set up some limits, but you will lose a lot of legitimate traffic). The firewall layer doesn't speak HTTP, so for an effective solution you need to work at the layer that does. – cyberx86 Dec 22 '11 at 21:09
1

I would start by immediately creating a zero-byte file at each old URL to get rid of the 404s and minimize the cost of each retrieval.
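A minimal sketch of that step, assuming you have a list of the old image paths relative to the document root. The function name `create_placeholders` and the example paths in the usage note are hypothetical; adjust them to your own layout.

```python
from __future__ import annotations

from pathlib import Path


def create_placeholders(doc_root: Path, old_paths: list[str]) -> list[Path]:
    """Create an empty file for every old URL path that no longer exists.

    Returns the list of files that were actually created.
    """
    root = doc_root.resolve()
    created = []
    for rel in old_paths:
        target = (root / rel.lstrip("/")).resolve()
        # Refuse anything that would land outside the document root.
        if root not in target.parents:
            raise ValueError(f"path escapes document root: {rel}")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Never overwrite a file that is already being served.
        if not target.exists():
            target.touch()
            created.append(target)
    return created
```

For example, `create_placeholders(Path("/var/www/html"), ["/images/old1.png"])` would leave an empty `old1.png` in place, so the server answers with a tiny 200 response instead of rendering an error page. Both the document root and the image path here are made up for the example.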

David Schwartz
0

A system administrator at my host solved it with a script:

Any IP address that tries to access the old content (by nature, such a user is continually making requests and consuming connections) is immediately added to the server's firewall. So far, just a few hundred (of many thousands of visitors) have been detected and blocked. The solution is working very well.
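I don't have the administrator's actual script, so the following is only a sketch of the general approach. It assumes the access log is in Common Log Format and that the old content lives under a known URL prefix; the function names, the `/old/` prefix in the usage note, and the choice of a plain `iptables` DROP rule are all assumptions. It only builds the commands as strings and does not run them.

```python
import re

# Start of a Common Log Format line: client IP, two ignored fields,
# a bracketed timestamp, then the quoted request line ("METHOD path ...").
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def offending_ips(log_lines, old_prefix):
    """Return the set of client IPs that requested a path under old_prefix."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        # Lines that do not parse are skipped rather than treated as offenders.
        if match and match.group(2).startswith(old_prefix):
            ips.add(match.group(1))
    return ips


def drop_rules(ips):
    """Build one iptables command per IP (returned as text, not executed)."""
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in sorted(ips)]
```

Feeding it the lines of the access log, as in `drop_rules(offending_ips(open(log_path), "/old/"))`, yields the commands to review before applying them. As noted in the comments on the first answer, this blocks every request from those addresses, not just the obsolete ones.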

Peter
  • Seems a bit drastic, like cutting off your nose to spite your face. Couldn't you just redirect the users to a 404 page explaining the new layout? Also, you really shouldn't change URLs: http://www.w3.org/Provider/Style/URI.html – dalore Feb 12 '15 at 14:57