
The company I work for has an event coming up that we believe will generate roughly 3-4x more traffic than anything we have dealt with in the past, and I am unsure of the best way to handle such a large, sudden increase. Several months ago we had an event that generated about 15,000 sessions in a day and our systems essentially buckled; it took our customer service team two weeks to sort out the mess from all of the orders that had been dropped or improperly completed.

We have increased our hardware specs significantly since then, but I would be impressed if we could handle 25,000 sessions without any issues. We've already maxed out our SQL server with our cloud computing host, and we plan to spin up an overkill number of web servers. Given that we are expecting 45,000-60,000 sessions, I am wondering if I should start planning for the worst.

My thinking is to cap the number of connections at a lower figure I know we can handle, then ramp it up from there until I start to see cracks, and hold it steady at that point. I thought about doing this through our load balancer, but it doesn't appear to support it, which means I would have to configure it on the individual servers.

TLDR:

Here are the essential questions

  1. Is limiting traffic to prevent a meltdown even a valid idea? Would any good admin do this?
  2. Can a load balancer limit connections to each server? If not, is the best way to do this to limit it on each web server?
  3. If I limit each web server to 200 connections, what happens when the load balancer tries to send the 201st connection? Does it get dropped, or redirected to another server with fewer connections? (See the sketch below for the behavior I have in mind.)
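
For reference, here is a minimal sketch of how a load balancer that does support per-server caps would handle the overflow. We are not running HAProxy, so this is purely illustrative; the addresses, limits, and timeouts are made up:

```
# Hypothetical HAProxy backend, shown only to illustrate per-server connection caps.
# "maxconn 200" caps concurrent connections to each web server; requests beyond the
# cap wait in the backend queue (up to "timeout queue") instead of being dropped,
# and "leastconn" sends new requests to whichever server has free slots.
backend web_pool
    balance leastconn
    timeout queue 30s
    server web1 10.0.0.11:80 check maxconn 200
    server web2 10.0.0.12:80 check maxconn 200
```

In other words, with a setup like this the 201st connection isn't rejected outright; it is queued or sent to a less-loaded server, which is roughly the behavior I would like to approximate.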
    Questions 2 and 3 are predicated on the answer to question 1. Is this traffic related to a product launch of some type? If so, are you expecting customers to access the website during this product launch in order to purchase the product? If so, then limiting traffic to the web servers seems like a good way to alienate your potential customers and drive them somewhere else, which is to say that this doesn't sound like a very good idea to me. – joeqwerty Sep 29 '17 at 22:35
  • Consumers (myself included) are impatient, finicky, and fickle. If your website doesn't load in about 1.5 seconds I'm already navigating away from it in search of another source for what I'm looking for. Artificially and intentionally limiting traffic to your site, thereby impacting your site load times, seems like a very good way to drive your potential customers into the arms and to the websites of your competitors. – joeqwerty Sep 29 '17 at 22:38
  • I agree with you, this isn't something I want to do. It just seems to me that I can provide a good user experience for 1/3 of them, or I can let them all on and they can get timeout errors for two hours. I guess it boils down to this: we CANNOT support as many people as are going to try to connect, so all we are left with are less-than-ideal options. If I could support all of them I would, trust me. – Colton Williams Sep 29 '17 at 22:47
  • I've seen big enterprises show a "too busy" error from time to time when there are too many visitors at once, but that means you need to know your actual limit rather than setting a guessed one. – yagmoth555 Sep 29 '17 at 23:47
  • I am really naive, but isn't this what Amazon is for, scaling anything you can in an instant? That is, offload anything you can to AWS to limit your server load, or, as mentioned, max out what you can on a CDN. Also, is setting up some temporary dedicated hosted servers an option to add to your balancing pool? That would increase bandwidth as well if you could balance to resources outside of your network. Or a combination of these... Again, sorry if I am being too naive! – Damon Sep 30 '17 at 05:16
  • @Damon our SQL server is the bottleneck and it's a rack-mounted box, so unless we want to go with sharding (which I have zero experience with) we only have the one server to work with. We will be spinning up probably a dozen or so extra IIS servers though. – Colton Williams Oct 01 '17 at 18:38

1 Answer


First, I'd find out exactly what your server limits are so you have hard figures to work with. Apache provides a benchmarking tool, ab, for stress testing (I'm assuming you're using Apache, of course, but other web servers have similar utilities): https://httpd.apache.org/docs/2.4/programs/ab.html
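
For example, something along these lines would simulate a couple hundred concurrent users; the URL, request count, and concurrency level are placeholders to adjust for your setup:

```
# 10,000 requests total, 200 at a time, against a representative page
ab -n 10000 -c 200 https://www.example.com/checkout
```

Ramp the concurrency up in steps and watch response times and error rates; the point where they degrade is your real ceiling.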

To help your systems overall, I would recommend making as much use of a CDN as you can. This can greatly reduce the total number of connections by offloading repeat requests for static elements to other servers.
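
For the CDN to actually take load off you, the origin needs to mark static assets as cacheable. A minimal sketch in Apache terms (again assuming Apache, with mod_headers enabled; the file extensions and max-age are just examples):

```
# Let the CDN and browsers cache static assets for a day
<FilesMatch "\.(css|js|png|jpe?g|gif|svg|woff2?)$">
    Header set Cache-Control "public, max-age=86400"
</FilesMatch>
```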

Then, optimize the heck out of your server-side caching. Varnish, perhaps, though it depends a great deal on the stack you are working with.
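
As a very rough sketch of the idea (Varnish 4 syntax; the backend address, port, file extensions, and TTL are placeholders):

```
vcl 4.0;

# Varnish sits in front of the web server and answers repeat requests from cache
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Cache static assets for an hour and strip cookies so they stay cacheable
    if (bereq.url ~ "\.(css|js|png|jpg|gif|svg)$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;
    }
}
```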

– Adam
  • A CDN does help with network congestion, but it does not help with server performance. In many cases network congestion isn't the actual issue, so a CDN isn't strictly needed; although if the site has a global audience, a CDN does help by serving static resources from servers closer to the visitors. – Tero Kilkanen Sep 30 '17 at 01:31
  • I think you might be right. My only option is going to be to look for all of the areas where caching is either missing or poorly implemented and fix them. I don't think it will double our capacity, but it certainly could help. I'll also look into benchmarking tools for MS SQL Server to see if I can get a hard number to work with, thanks. – Colton Williams Oct 01 '17 at 18:42