
I'm trying to estimate the number of concurrent users that a WordPress setup on AWS can withstand (it needs to be highly available and support huge loads). I've been asked for a loose range that we can guarantee (they asked the guy who's new to DevOps...).

The architecture looks as follows:

  • Two RDS r5.2xlarge instances (Main + Read replica) for the DB.
  • An autoscaling group managing 1 to 25 t2.2xlarge EC2 instances.
  • CloudFront as a CDN for the content.

Some conditions about the application:

  • There are around 1300 publications.
  • Each page weighs between 200 KB and 3 MB (the latter is very rare), with most around 500 KB.

Obviously the aim is to have "reasonable" response times, although I haven't been told of a range.

By concurrent users I mean concurrent requests or hits. I would love to actually test it, but unfortunately that would need a lot of paid resources.

I don't really know what the right reasoning is here. The most helpful related thing I've found so far is a blog post; however, its conditions are quite different and we can't really take the numbers and scale them linearly. The author of the post measured that 18 t2.medium instances running at 60% utilization can serve WP at 90 RPS while keeping response times at approximately 350 ms. I don't see how to extend this conclusion to my architecture.

Ideally, besides an answer to my problem, I'd like to get a method for coming to valid answers to these questions.
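For reference, here is a minimal sketch of what naive linear scaling of the post's numbers would look like. It assumes throughput scales with vCPU count alone (2 vCPUs per t2.medium, 8 per t2.2xlarge), which ignores the database, caching and the CDN, so it is at best an optimistic ceiling and not a measurement. It also uses Little's law to turn requests per second into requests in flight, since "concurrent" is what is being asked for. All function and constant names are made up for illustration.

```python
# Back-of-envelope extrapolation from the post's measurement.
# ASSUMPTION: throughput scales linearly with vCPU count and nothing else
# (database, cache hit ratio, network) becomes the bottleneck first.

REF_INSTANCES = 18       # t2.medium instances in the post
REF_VCPUS_EACH = 2       # vCPUs per t2.medium
REF_RPS = 90             # requests per second measured in the post
REF_LATENCY_S = 0.350    # response time measured in the post, in seconds

TARGET_INSTANCES = 25    # upper limit of the autoscaling group
TARGET_VCPUS_EACH = 8    # vCPUs per t2.2xlarge


def rps_per_vcpu(rps, instances, vcpus_each):
    """Requests per second handled by one vCPU in the reference setup."""
    return rps / (instances * vcpus_each)


def linear_estimate(per_vcpu, instances, vcpus_each):
    """Scale the per-vCPU rate to another fleet (naive, optimistic)."""
    return per_vcpu * instances * vcpus_each


def in_flight(rps, latency_s):
    """Little's law: average requests in flight = arrival rate * time in system."""
    return rps * latency_s


per_vcpu = rps_per_vcpu(REF_RPS, REF_INSTANCES, REF_VCPUS_EACH)
estimate = linear_estimate(per_vcpu, TARGET_INSTANCES, TARGET_VCPUS_EACH)

print(f"per-vCPU rate in the post: {per_vcpu:.2f} RPS")
print(f"naive ceiling for 25 x t2.2xlarge: {estimate:.0f} RPS")
print(f"requests in flight at that rate: {in_flight(estimate, REF_LATENCY_S):.0f}")
```

The result only holds at the same utilization and response time as the reference setup, and only for requests that actually reach PHP; anything served from CloudFront or a page cache is not covered by it.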

  • You can't estimate this, you have to benchmark it. Note that PHP / WordPress is quite CPU intensive. If you can cache your pages for anonymous users, even for a short time, in such a way that requests never hit WordPress, you should significantly reduce CPU requirements. I use Nginx page caching, but you can use any solution. 25 t2.large is pretty big; it sounds like thousands of pages per second. You'll want to look at your architecture, and perhaps distribute across two regions to get really high availability. WordPress needs a shared file system for wp-content. – Tim Oct 07 '18 at 23:03

1 Answer

Usually we don't get asked how many users a given architecture can withstand; instead we get asked what architecture we need to withstand a given number of users. Isn't that a more important question?

Anyway a couple of notes:

  • If you design your architecture to be truly scalable at all the layers - content delivery (CloudFront), web servers (stateless, disposable), file storage (EFS, S3), database (e.g. Aurora, read-only replicas) - then you don't have to care too much about how many users a particular configuration can support. If the demand is lower or higher, the architecture will simply scale to meet it.
  • Your architecture seems to be on the right path for scalability, so I guess the best way is to stand up a proof of concept and run a professional load test. There are companies that can do that from geographically dispersed locations. That will show you how your design performs, and from there you will be able to interpolate the configurations needed for various concurrent user numbers.

  • A word of warning about T2 instance types - they use so-called CPU credits, which let them run fast for a short period of time before they slow down. When they're idle they accumulate these credits again and can then run faster for some time. That's great for a workload that comes in spikes, but for a sustained load you'll be better off with e.g. M5 instance types (e.g. m5.large) - these offer consistent performance.

  • It's better to have a larger number of smaller instances (e.g. 20x m5.large) than a smaller number of big instances (e.g. 5x m5.2xlarge) - scaling in and out is smoother, disk performance is better, failure of a single node doesn't have such a big impact, etc.

  • Consider using Spot instances and Spot fleets for extra savings on your instance costs.

  • You mentioned you serve some publications - if these are static PDF documents you'll be much better off storing them in S3 and having CloudFront read them directly from S3 without going through WordPress at all. If they are not public and need e.g. a subscription, look at CloudFront / S3 signed URLs. That will significantly reduce the load on your WordPress servers.
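To put rough numbers on the T2 CPU-credit warning above, here is a minimal sketch. The t2.2xlarge figures (8 vCPUs, 81 credits earned per hour, maximum balance of 1944 credits) are quoted from memory of the AWS documentation and should be treated as assumptions to verify. The model is simplified: it covers standard mode only, starts from a full credit balance, and ignores launch credits; in unlimited mode the instance would keep bursting and the surplus would be billed instead.

```python
# Rough CPU-credit model for one burstable instance in standard mode.
# ASSUMED t2.2xlarge figures; check them against the current AWS docs.
VCPUS = 8
CREDITS_EARNED_PER_HOUR = 81
MAX_CREDIT_BALANCE = 1944


def credits_spent_per_hour(vcpus, utilization):
    """One CPU credit = one vCPU running at 100% for one minute."""
    return vcpus * utilization * 60


def hours_until_throttled(utilization, balance=MAX_CREDIT_BALANCE):
    """Hours of sustained load before the credit balance runs out.

    `utilization` is the average utilization across all vCPUs (0.0 to 1.0).
    Returns infinity when the instance earns credits at least as fast as
    it spends them.
    """
    net_drain = credits_spent_per_hour(VCPUS, utilization) - CREDITS_EARNED_PER_HOUR
    if net_drain <= 0:
        return float("inf")
    return balance / net_drain


# Utilization the instance can sustain indefinitely (its baseline).
baseline_utilization = CREDITS_EARNED_PER_HOUR / (VCPUS * 60)

print(f"baseline: {baseline_utilization:.1%} average utilization per vCPU")
for load in (0.10, 0.30, 0.60, 1.00):
    print(f"{load:.0%} sustained load -> throttled after {hours_until_throttled(load):.1f} h")
```

Under these assumptions, an instance held at 60% across all vCPUs would exhaust a full credit balance in a little over nine hours and then drop towards its baseline, which is why any RPS figure measured during a short burst should not be quoted as sustained capacity.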

Hope that helps :)

MLu
  • Thank you for your detailed explanation. I should have made clear that in this case I am not in charge of designing the architecture; it is already in place and I will maintain it in the future. For now, I am asked to give an approximate range of concurrent RPS we can support - an "informed opinion". The best I have so far is an attempt at drawing a parallel with the cases mentioned in the blog post I reference in my question, considering how many vCPUs are available, usage %, number of instances... Averaging everything, I'd give a range of 600-800 RPS. As I said, it's the best I have so far. – Alexander George Oct 07 '18 at 23:54
  • As for the word _publications_, I meant _posts_. A tiny false friend from my mother tongue, I apologize :) – Alexander George Oct 08 '18 at 00:02
  • @AlexanderGeorge if it's already existing and in use you probably have some performance data at hand to support your estimates, don't you? Perhaps get the load from CloudWatch monitoring and the number of concurrent users from ELB access logs and correlate these two to make some pretty charts for the management? :) – MLu Oct 08 '18 at 00:03
  • I'm still waiting to get access. Meanwhile, I just thought of something. My proposed 600-800 RPS, taken at the 700 RPS midpoint and sustained for a 30-day month, equals a theoretical 1,814,400,000 requests per month. To me, this sounds absolutely insane. Do you have any opinion on this? – Alexander George Oct 08 '18 at 00:27
  • You probably shouldn't multiply the peak maximum by 24x7 to get the monthly visitors. More likely than not you won't see sustained traffic all the time every day. – MLu Oct 08 '18 at 00:32
  • Explain to the management that without access to the performance metrics you are unable to comment on the number of users the system can sustain. And that once you've got access you'll provide a better informed estimate. And that from what you could learn so far it looks like the system scales well and should be able to adapt to increasing traffic in the future. Otherwise you're just making up pointless numbers ;) – MLu Oct 08 '18 at 00:36
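The monthly figure discussed in the comments can be reproduced with a few lines of arithmetic. This is a minimal sketch: the 700 RPS value is the midpoint of the 600-800 range, and the peak-to-average ratio of 5 is purely hypothetical and only illustrates why a peak rate should not be multiplied by 24x7. The real ratio has to come from the site's own traffic data.

```python
SECONDS_PER_DAY = 86_400


def monthly_requests(average_rps, days=30):
    """Total requests in a month for a given *average* request rate."""
    return average_rps * SECONDS_PER_DAY * days


def average_from_peak(peak_rps, peak_to_average_ratio):
    """Average rate implied by a peak rate and a peak-to-average ratio."""
    return peak_rps / peak_to_average_ratio


PEAK_RPS = 700                        # midpoint of the 600-800 RPS range
HYPOTHETICAL_PEAK_TO_AVERAGE = 5      # ASSUMPTION, not a measured value

# Treating the peak as if it were sustained all month.
sustained = monthly_requests(PEAK_RPS)
print(f"peak sustained 24x7: {sustained:,} requests/month")  # 1,814,400,000

# Same peak, but with traffic averaging one fifth of it.
average_rps = average_from_peak(PEAK_RPS, HYPOTHETICAL_PEAK_TO_AVERAGE)
print(f"with a 5:1 peak-to-average ratio: {monthly_requests(average_rps):,.0f} requests/month")
```

The capacity question is about the peak rate the system survives; the monthly total depends on the average rate, and the two are only equal if traffic never dips.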