
The need has arisen for a clustered server setup (is that what it's called?) in my company. Our hosting is rented abroad, so we have limited access to the actual hardware, but we have total freedom otherwise and are not constrained financially (provided we avoid overkill, of course - no need for 300 servers if 3 can handle things).

We are an international online publisher serving books that can be read online for free. This means we have a ton of static content - primarily many, many gigabytes of Flash documents. We recently upgraded the server OS to CentOS x64 and changed the server software from Apache alone to Nginx (for static content) + Apache. There were some problems, however, and we faced unexpected downtime, which damaged us pretty severely even though it lasted only a couple of hours.

My thoughts on a cluster setup were as follows:
- server 1: our current MySQL database.
- servers 2, 3, 4: our application, i.e. our PHP code on Apache.
- server 5: static content only (images from 5 kB to 3 MB, PDFs from 5 MB to 100 MB, Flash files from 200 kB to 20 MB, etc.), powered by Cherokee.

I believe this setup would help us avoid downtime should one of the three application servers fail, in addition to spreading the load across several machines - unlike now, when everything (static + DB + application) sits on one machine.

What I would like from you veterans are some helpful links about server load sharing, plus hints and tips regarding this issue and my proposed setup above. As a PHP developer I have limited experience with Apache and not much more, so if anyone can offer valuable insight into their setups or experiences with different hardware/software, I would be much obliged.

Also, what is the correct terminology - cloud? Cluster? Are there any other terms I should be aware of? Please be gentle, I'm only beginning to tread into the server world.

Thank you

Edit: new plan is as follows, please let me know what you think:

Application Cluster:

  • 3 servers running Nginx (or Cherokee) and Apache with PHP. Nginx would handle requests for static content hosted on the same server (CSS, JS, thumbnails, sprites, images); a configuration sketch follows this list.
  • Since we currently have two web sites with fairly heavy traffic (one heavy on DB updates, the other on serving static content), we were thinking of putting both on this application cluster.
  • The two applications would sit behind two load balancers that distribute traffic among the three servers. The servers would be identical clones, and easily scalable later on.
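
For what it's worth, here is a minimal sketch of the per-server Nginx configuration I have in mind - Nginx serves the small static assets itself and hands everything else to a local Apache instance. The domain, port and paths below are placeholders, not our real setup:

    # /etc/nginx/conf.d/app.conf - per-application-server sketch
    server {
        listen 80;
        server_name example.com;        # placeholder domain
        root /var/www/app/public;       # placeholder docroot

        # Small static assets are served by Nginx directly
        location ~* \.(css|js|png|jpg|gif)$ {
            expires 7d;
            access_log off;
        }

        # Everything else (PHP) goes to Apache listening locally on port 8080
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }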

Database Cluster

  • Two servers running MySQL, clones of each other (i.e. replication), behind a load balancer. Backups would be kept on the servers themselves, as it is highly unlikely both would die at the same time. Both applications on the app cluster would use this database cluster - one with an average read load, the other with a high read-write load. A configuration sketch follows this list.
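
To illustrate what "clones" would mean in practice, a minimal sketch of the replication-related my.cnf settings, assuming master-master replication (the server IDs and log name are just examples):

    # /etc/my.cnf on the first MySQL box ([mysqld] section)
    server-id                = 1
    log_bin                  = mysql-bin   # enable the binary log for replication
    auto_increment_increment = 2           # avoid key collisions between two masters
    auto_increment_offset    = 1           # the second box would use offset = 2

    # On the second box: server-id = 2, auto_increment_offset = 2.
    # Each server is then pointed at the other with CHANGE MASTER TO ... in the MySQL shell.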

Static Cluster

  • Two servers holding static content exclusively - basically just storage for thousands of PDFs, Zips and Flash files. No separate backup, since backing up this much data efficiently is impractical; the servers act as each other's backup. This static cluster would serve the larger static content for both applications on the app cluster. A serving sketch follows this list.
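
A sketch of how one of the static boxes might be configured under Nginx, tuned for large files that never change once published (the hostname and path are placeholders):

    # Static-content server sketch
    server {
        listen 80;
        server_name static.example.com;   # placeholder
        root /srv/static;                 # placeholder storage path

        sendfile   on;                    # let the kernel push file data directly
        tcp_nopush on;                    # fill packets before sending (with sendfile)

        location ~* \.(pdf|zip|swf)$ {
            expires max;                  # files are immutable, cache them forever
            access_log off;               # logging every download adds up at this volume
        }
    }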

Is this realistic? What would you advise against, if anything? What would you add?

Swader

2 Answers


A few general things that I've learned over the years:

  • See this question for a list of good books on the subject of performance, scaling and high-availability sites.
  • "Cluster" is the correct term. You're using multiple machines to serve one site in an attempt to increase availability. You can also use cluster to refer to specific portions of your setup: for example servers 2+3+4 would be your application cluster.
  • Is there any reason why you only have redundancy at the application level? What about MySQL and the static content? Especially since your static content is relatively large, look at how much bandwidth you would need to serve N concurrent users (for example, 100 users each pulling a large file at 10 Mbit/s already saturates a 1 Gbit/s uplink). What happens if the MySQL server fails, or if the static box (server 5) has a bad disk?
  • If you're moving everything from one machine, start off small unless you don't mind spending more than you need. For example, in a similar situation I found a larger-than-expected performance gain just moving from 1 to 3 servers. After you split into multiple servers you may find the new bottleneck is in a different area.
  • As you plan for scaling now, don't completely forget about possible future scaling. A little forward thought and design now can save you time later. For example, you have one static server now, but what if you want multiple in a year, or several servers spread out geographically?
  • Consider creating scripts to help set up specific types of servers (see the sketch after this list) - doing it manually each time gets old, and you always forget one step. I did this recently and wish I had done it from the start. Running one script that performs 50 install steps automatically in a few minutes saves you a lot of time in the long run.
  • As you get more servers, the likelihood of experiencing some sort of hardware failure grows. Plan for this and play the what-if game: What if the hard drive failed on server X? What would we lose? How long would the site be down? How long would it take to fix? And so on.
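
To illustrate the scripted-setup point above, here is a toy sketch of what such a script might look like on a CentOS box like yours - every package name and path here is just an example, not a prescription:

    #!/bin/bash
    # setup-appserver.sh - illustrative application-server build script (CentOS)
    set -e                                    # stop at the first failed step

    # Install the web stack for an app node (nginx may need the EPEL repo)
    yum -y install nginx httpd php php-mysql

    # Copy version-controlled configs instead of editing files by hand
    cp configs/nginx-app.conf /etc/nginx/conf.d/app.conf
    cp configs/httpd-app.conf /etc/httpd/conf.d/app.conf

    # Start on boot and right now (CentOS 5/6 style service management)
    chkconfig nginx on && chkconfig httpd on
    service nginx start
    service httpd start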
uesp
  • Great stuff, thank you! The MySQL book from that list has been ordered, and the advice is top notch. We'll keep future scaling in mind and will analyse bottlenecks thoroughly, yes. Scripts for adding nodes will be done as well; our server admin has a habit of writing them. As per point 3, it's not redundancy so much as avoiding downtime; load balancing was meant to be an incidental perk, but your advice has set some additional research in motion on my end. I'll post back with some findings and opinions :) – Swader Mar 22 '11 at 08:18
  • @uesp I added a new clustering plan into my original question. Would you mind taking a look and letting me know how feasible it is? – Swader Mar 22 '11 at 08:51
  • Note that cluster design will be different for everyone depending on their application and traffic, so it's hard or impossible to say whether N or M servers of type X are better. I assume by MySQL "clones" you mean replication. Make sure you understand the difference between master-slave and master-master. I would also consider some sort of dedicated backup, preferably in a different location than the 7 main servers. Consider the worst case of a fire or disaster in the data center... unlikely, but good to consider. – uesp Mar 22 '11 at 11:45
  • Yes, I meant replication. I have basic knowledge of master-slave terminology in clustering, but the true benefits and drawbacks elude me. I will read on. We will not be setting up a geographically removed backup just yet, but you do raise a valid point and it is something to consider eventually. How about having 2 applications on one app cluster? Is that feasible or impractical? The servers will be strong and the apps are more or less optimized, so there should be no struggle for resources. – Swader Mar 22 '11 at 11:55
  • There's nothing technically wrong with running 2 or 100 apps on the same server/cluster; it all comes down to how many resources the apps need. The nice thing about having a load-balanced cluster is that whenever you need more capacity you just add another server... problem solved. Most of the work is the initial planning/design/setup of the cluster. One thing I didn't mention, but DerfK touched on, is monitoring/benchmarking. It's always good to know your servers' current usage and capabilities for planning. – uesp Mar 22 '11 at 12:02
  • Understood. Thanks for all your help, on both my questions, lots of stuff is much clearer now. – Swader Mar 22 '11 at 16:42

I think uesp covered the general stuff pretty well. To decide what you are going to do for your case, there are a couple of things you need to sit down and think about:

  1. What is the current load on each of these components? What is the projected future load?
  2. What are the failure scenarios you want to deal with? What caused your last failure?

The first questions tell you the minimum number of servers you'll need at each level in order to have a running site.

The second questions tell you how much hardware you'll actually want in order to ensure your site keeps running. As you lay out the failure modes, you'll find that you need to consider more than just servers: firewalls, upstream internet connections, generators, physical locations and more. You'll also need to address things like having administrators on call to deal with servers crashing at 3 AM, and the monitoring needed to wake the administrator and tell them something has crashed (a toy example follows below). If your previous failure was due to a configuration or programming error, consider a staging environment between development and production, where testing takes place after the programmers finish their changes and before those changes go live.
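
As a toy illustration of the monitoring point (real monitoring systems like Nagios do far more than this - the URL and address are placeholders):

    #!/bin/bash
    # check-site.sh - minimal availability check, run from cron every minute
    URL="http://example.com/"
    if ! curl -sf --max-time 10 "$URL" > /dev/null; then
        echo "$URL failed at $(date)" | mail -s "SITE DOWN" oncall@example.com
    fi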

DerfK
  • Our last failure was caused by the hosting company, which decided to swap out server hardware in the middle of the working day without telling anyone. We had no backup, no message for our users, nothing. Admins on call are handled, as are warnings and problems. As for future load, though, we're not sure: the company has grown exponentially in the last year, and we can't come close to predicting the amount of traffic we'll have in another year's time. – Swader Mar 22 '11 at 08:21