
I am a server administrator for a small start-up as a side venture (meaning I am by no means a seasoned expert on the subject), and I recently helped move the site from a single Windows machine to a cluster of machines on Rackspace Cloud.

Currently the site benchmarks at about 600 requests/second, but given the amount of resources we have allocated to it, I feel it could be much higher.

Right now we are using the Rackspace Cloud Load Balancer (Apache Zeus) in front of 8 web servers. Each web server is running Linux on a 512MB cloud instance and the content is being served by Varnish with an Apache 2 backend.

The web application itself is PHP. Apache is running with mpm-worker, and PHP is running via FastCGI. PHP APC is enabled as well.

As for the database backend, I have two 4 GB server instances serving MySQL in a master-master replication setup, with half of the web servers pointing at each server. The application is quite database intensive, which is why so many resources are dedicated to the database.

Performance is usually fine. However, we have had some load spikes that the existing infrastructure was unable to handle, so I dynamically increased the size of the nodes. This worked out great, but I feel that under the specific load conditions we had, I had to throw a lot more resources at the infrastructure than I had anticipated to keep the site up and fast. In my research it seems that we are using a very uncommon setup in having so many separate instances of Varnish, and I might need to explore the option of a caching layer.

An overview of the current architecture is drawn here (google docs link)

The pricing model of Rackspace Cloud is pretty linear, meaning a 1024 MB server instance is exactly double the cost of a 512 MB instance. As such, I am looking to maximize my performance while working within the same amount of resources (cost).

My initial thought is to remove the Rackspace load balancer in favor of a single instance of Varnish in front of the Apache backends, and perhaps run the Apache backend as 4x 1 GB instances rather than 8x 512 MB instances. The load balancer is extremely cheap, so in order to justify replacing it with another dedicated server the performance gain would need to be large.

I have toyed around with the idea of HAProxy and Nginx as well, but I do not want to start blindly experimenting on a production site.

My goal is to be able to serve as close to 2000 req/s as possible on roughly the same hardware allocation.

Edit: I had mod_pagespeed working for a while which put me up by about 100 req/s but I seemed to have a lot of issues with how it interacted with varnish.

Edit: Varnish VCL. Disk is the Rackspace Cloud default (non-SAN, guessing SATA). Database is approximately 1.5 GB currently. No swapping to disk under normal conditions. Apache processes are about 20 MB each. php-cgi processes tend to chew up a lot more resources.

WerkkreW
  • We're going to need some more information to help you out. Off the top of my head: How big is your MySQL DB? What is your disk layout on the DB servers? How much memory do your worker processes use? What are your configuration settings for Varnish? Are your web and DB servers swapping to disk? Are there any errors in the log files? – Zypher Sep 09 '11 at 18:34
  • Database is about 1.5GB currently and growing. Disk layout is the default by rackspace cloud (I am unsure of what it is on backend but I assume something slow like SATA). Under normal conditions nothing swaps to disk, but under load spikes I have seen the servers thrash a bit. No errors. I will post an edit with the Varnish VCL. – WerkkreW Sep 09 '11 at 18:39
  • Have you analyzed where your bottleneck was? What servers were at limit? – Silent-Bob Sep 09 '11 at 18:44
  • When the infrastructure was under extreme load, the web servers would spike to a point where the only recourse was to reboot them and allocate more resources. Varnish itself kept responding to requests, but with 503 errors (no backend). So the bottleneck appears to have been Apache (more specifically PHP). – WerkkreW Sep 09 '11 at 18:49

2 Answers


OP, you can use http://blitz.io for some free benchmarking. Also, look into 'ab' and 'httperf', which are command-line benchmarking tools.
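For a quick sanity check before reaching for those tools, the idea behind a throughput benchmark can be sketched in Python. This is only an illustration and not a substitute for a real load tester: the names `http_fetch` and `measure_throughput` and the default request counts are made up for this example, and it should be pointed at a staging copy rather than the production site.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url, timeout=10):
    """Fetch one URL and return its HTTP status code.

    urlopen raises on 4xx/5xx responses, so a 503 from the cache
    is counted as a failure by measure_throughput below.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def measure_throughput(fetch, total_requests=200, concurrency=20):
    """Call `fetch` total_requests times from a pool of worker threads.

    Returns (requests_per_second, failure_count).
    """
    def one_call(_):
        try:
            fetch()
            return True
        except Exception:
            return False

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(total_requests)))
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_requests / elapsed, results.count(False)
```

A call such as `measure_throughput(lambda: http_fetch("http://staging.example.com/"))` (a hypothetical URL) returns the observed requests per second and the number of failed requests. A single client machine often becomes the bottleneck itself, which is one reason to prefer the dedicated tools for real numbers.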

Varnish can be used with great success with minimal configuration. Also, if you run PHP-heavy apps, I recommend installing APC.

Ramon Long

I would go with one high-RAM Varnish instance (check Varnish's RAM usage with the varnish tools and increase it until it is sufficient) and no load balancer (or two Varnish instances behind a load balancer if you want high availability), plus as many Apache servers as you need. Whether to add more servers (if your app is CPU bound) or use servers with more memory (if it is RAM bound) is up to you.
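One rough way to reason about the RAM-bound case is to estimate how many PHP processes fit in an instance before it starts swapping. The sketch below is arithmetic only. The 192 MB reserved for the OS and other daemons and the 50 MB per php-cgi process are hypothetical placeholders, not measurements from this site, so substitute figures taken from your own servers.

```python
def max_processes(total_mb, reserved_mb, per_process_mb):
    """Number of fixed-size processes that fit in RAM without swapping."""
    usable_mb = total_mb - reserved_mb
    if usable_mb <= 0 or per_process_mb <= 0:
        return 0
    return usable_mb // per_process_mb


def fleet_capacity(instances, total_mb, reserved_mb, per_process_mb):
    """Total process slots across a group of identical instances."""
    return instances * max_processes(total_mb, reserved_mb, per_process_mb)


# Hypothetical figures: adjust to what `ps` or `top` reports on your servers.
RESERVED_MB = 192   # assumed: OS, Apache workers, other daemons
PHP_CGI_MB = 50     # assumed: resident size of one php-cgi process

small_fleet = fleet_capacity(8, 512, RESERVED_MB, PHP_CGI_MB)
large_fleet = fleet_capacity(4, 1024, RESERVED_MB, PHP_CGI_MB)
print("8 x 512 MB:", small_fleet, "php-cgi processes")
print("4 x 1024 MB:", large_fleet, "php-cgi processes")
```

With those assumed figures, eight 512 MB instances fit 48 php-cgi processes in total and four 1024 MB instances fit 64, because the fixed per-instance overhead is paid four times instead of eight. Whether that matters in practice depends on the real process sizes.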

Also, playing with the cache settings (what can be cached, and for how long) will help.
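To see what the application currently allows to be cached, it can help to look at the response headers it sends. The following is a simplified sketch of how a shared cache might read them, not a description of what Varnish actually does: the real behavior is governed by your VCL, and the function name `cache_ttl_seconds` and the rule that any `Set-Cookie` header blocks caching are assumptions made for this example.

```python
def cache_ttl_seconds(headers):
    """Estimate the shared-cache lifetime implied by response headers.

    Returns 0 if the response looks uncacheable, a positive integer if
    s-maxage or max-age is given, and None if no lifetime is stated.
    """
    normalized = {name.lower(): value for name, value in headers.items()}

    # Assumption: a response that sets a cookie (e.g. a PHP session)
    # is treated as per-user and therefore not cached.
    if "set-cookie" in normalized:
        return 0

    # Split "public, max-age=300" into {"public": "", "max-age": "300"}.
    directives = {}
    for part in normalized.get("cache-control", "").split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')

    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0

    # s-maxage targets shared caches, so it takes priority over max-age.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return None
```

Running this over the headers of the most frequently requested pages gives a rough picture of which ones could be served from cache and for how long, and which ones are being passed to the backend on every request.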

Silent-Bob
  • Thanks for the answer. It is along the lines of what I'm considering doing, but I don't want to go into it blindly without knowing exactly what problem I am trying to solve. Sadly, though, I do not know Apache/Varnish at an advanced enough level to tell where my bottleneck currently lies. – WerkkreW Sep 12 '11 at 14:59